In the third part of this series, I look at what is, from my point of view, the most important question of my market value data journey: does the team market value hold any predictive power? If so, I could use it as an additional feature for my predictive models.
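A simple first check of predictive power is the correlation between a market value feature and the match outcome. The sketch below is a minimal illustration, not the method used in this series: it assumes a hypothetical feature (home minus away team market value) and uses goal difference as the outcome, with made-up toy numbers. In R, the built-in `cor()` returns the same Pearson coefficient.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical toy data: market value difference (home minus away, in EUR m)
# and the resulting goal difference (home minus away) for each match.
mv_diff = [120.0, -40.0, 15.0, 300.0, -150.0, 60.0]
goal_diff = [2, -1, 0, 3, -2, 1]
r = pearson(mv_diff, goal_diff)
print(f"r = {r:.3f}")
```

A correlation close to zero would suggest little linear predictive power; a clearly positive value only justifies testing the feature in a model, it does not prove it improves predictions.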
The first and second parts of this series explained some basic methods for optimising the regression models for the GS & PPG match ratings. You now have a set of three different regression models (linear, polynomial, and polynomial without outliers) for each predictor variable. These models now have to compete not only against each other but, of course, also against the bookie odds and the Poisson prediction model.
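To make the linear vs. polynomial comparison concrete, here is a minimal sketch that fits both a straight line and a quadratic by least squares and compares their R². The data are hypothetical, and degree 2 is an assumption; the series may use a different polynomial degree. In R, the equivalent fits would be `lm(y ~ x)` and `lm(y ~ poly(x, 2))`.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.
    Returns coefficients [c0, c1, ..., c_degree] of c0 + c1*x + c2*x^2 + ..."""
    m = degree + 1
    # Normal equations: A @ coef = b with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            f = a[row][col] / a[col][col]
            for c in range(col, m):
                a[row][c] -= f * a[col][c]
            b[row] -= f * b[col]
    # Back substitution.
    coef = [0.0] * m
    for i in range(m - 1, -1, -1):
        s = b[i] - sum(a[i][j] * coef[j] for j in range(i + 1, m))
        coef[i] = s / a[i][i]
    return coef


def predict(coef, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coef))


def r_squared(xs, ys, coef):
    """Coefficient of determination of the fit on the given data."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - predict(coef, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical toy data: match rating vs. observed home-win share.
ratings = [-10, -6, -3, 0, 3, 6, 10]
home_share = [0.15, 0.25, 0.35, 0.45, 0.55, 0.62, 0.68]
lin = polyfit(ratings, home_share, 1)
quad = polyfit(ratings, home_share, 2)
r2_lin = r_squared(ratings, home_share, lin)
r2_quad = r_squared(ratings, home_share, quad)
print(f"linear R^2 = {r2_lin:.4f}, quadratic R^2 = {r2_quad:.4f}")
```

In-sample R² can never decrease when a polynomial term is added, so a fair comparison of the models with each other, with the bookie odds and with the Poisson model has to be made on matches the models were not fitted on.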
The first part of this series looked at the GS match rating model and described how to identify a non-linear relationship between the predictor variables and the outcome variable. The same methods will now be applied to the PPG match rating model, so that we can compare the two polynomial regression models. On top of that, I want to show how you can find out whether outliers in your data influence your regression model.
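One common way to check whether individual observations have a strong influence on a regression is Cook's distance, which R returns for a fitted `lm` model via `cooks.distance()`. The sketch below computes it by hand for a simple linear regression on hypothetical data; the 4/n cut-off is one widely used rule of thumb, not a fixed rule, and not necessarily the criterion used in this series.

```python
def cooks_distance(xs, ys):
    """Cook's distance for each point of a simple linear regression y ~ x.
    D_i = e_i^2 / (p * MSE) * h_ii / (1 - h_ii)^2 with p = 2 parameters."""
    n = len(xs)
    p = 2  # intercept + slope
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    mse = sum(e ** 2 for e in resid) / (n - p)
    dists = []
    for x, e in zip(xs, resid):
        h = 1 / n + (x - mx) ** 2 / sxx  # leverage of this observation
        dists.append(e ** 2 / (p * mse) * h / (1 - h) ** 2)
    return dists


# Hypothetical data: a clear linear trend plus one far-away, off-trend point.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0, 10.0]
d = cooks_distance(xs, ys)
threshold = 4 / len(xs)  # rule-of-thumb cut-off
flagged = [i for i, di in enumerate(d) if di > threshold]
print("influential points:", flagged)
```

Refitting the model without the flagged points and comparing the coefficients shows how much those observations actually drive the fit.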
In the last post I described how the features for the GS & PPG match rating models are calculated. Based on these features, I will now describe how to build and optimise a linear regression model with R. The first part will describe the optimisation of the linear regression model for the GS match rating in detail, the second part will cover the PPG match rating model, and the third and final part will compare the prediction performance of the different models.
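For comparing prediction performance, one commonly used metric for home/draw/away probabilities is the multi-class Brier score: the mean squared difference between the predicted probabilities and the actual result encoded as 0/1. The sketch below uses hypothetical predictions from two hypothetical models; the series may well use a different metric.

```python
def brier_score(predictions, outcomes):
    """predictions: list of dicts {'H': p, 'D': p, 'A': p}; outcomes: 'H', 'D' or 'A'.
    Returns the mean multi-class Brier score (0 is perfect, lower is better)."""
    if len(predictions) != len(outcomes) or not outcomes:
        raise ValueError("need one prediction per outcome and at least one match")
    total = 0.0
    for probs, result in zip(predictions, outcomes):
        # Squared error against the one-hot encoded actual result.
        total += sum((probs[o] - (1.0 if o == result else 0.0)) ** 2 for o in "HDA")
    return total / len(outcomes)


# Hypothetical results and predictions of two models for the same matches.
outcomes = ["H", "D", "A", "H"]
model_a = [
    {"H": 0.60, "D": 0.25, "A": 0.15},
    {"H": 0.40, "D": 0.30, "A": 0.30},
    {"H": 0.30, "D": 0.30, "A": 0.40},
    {"H": 0.55, "D": 0.25, "A": 0.20},
]
model_b = [{"H": 0.45, "D": 0.27, "A": 0.28}] * 4
print("model A:", round(brier_score(model_a, outcomes), 4))
print("model B:", round(brier_score(model_b, outcomes), 4))
```

The same function can score probabilities derived from bookie odds, which makes all models comparable on one scale.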
In the last post I described the predictive models that will be covered in this series. Following the development process for predictive models, the next step is to supply the raw data for these models. Fortunately, football-data.co.uk already offers all the data these models need. So this post explains how to implement the features for the GS and PPG match rating models based on the existing Raw Data Vault model.
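The feature logic itself can be sketched independently of the Raw Data Vault. This minimal sketch assumes a football-data.co.uk-style goal superiority definition for GS (goals scored minus goals conceded over each team's last six matches, match rating = home value minus away value). The window size, the function name `match_ratings`, and the PPG variant (difference in points per game over the same window) are assumptions for illustration; in the series the features are implemented in the DWH, not in Python.

```python
from collections import defaultdict, deque

WINDOW = 6  # assumption: number of previous matches considered per team


def match_ratings(matches, window=WINDOW):
    """matches: chronologically ordered (home, away, home_goals, away_goals).
    Returns one (gs_rating, ppg_rating) per match, computed only from matches
    played before it, or None if either team has fewer than `window` matches."""
    # team -> last `window` entries of (goal difference, points)
    history = defaultdict(lambda: deque(maxlen=window))
    ratings = []
    for home, away, hg, ag in matches:
        h_hist, a_hist = history[home], history[away]
        if len(h_hist) == window and len(a_hist) == window:
            gs = sum(g for g, _ in h_hist) - sum(g for g, _ in a_hist)
            ppg = (sum(p for _, p in h_hist) - sum(p for _, p in a_hist)) / window
            ratings.append((gs, ppg))
        else:
            ratings.append(None)
        # Update the histories only after the pre-match rating has been computed.
        h_pts, a_pts = (3, 0) if hg > ag else (0, 3) if hg < ag else (1, 1)
        h_hist.append((hg - ag, h_pts))
        a_hist.append((ag - hg, a_pts))
    return ratings


# Hypothetical usage with a window of one match to keep the example short.
matches = [
    ("A", "B", 2, 0),
    ("A", "B", 1, 1),
    ("B", "A", 0, 1),
]
print(match_ratings(matches, window=1))
```

Computing each rating before adding the current result to the history avoids leaking the match outcome into its own feature.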
This post starts a new series in which I explain how to implement another predictive model in the TripleA DWH architecture. When I started developing predictive models with R, I was a little overwhelmed by the many plots R provides for analysing and optimising a predictive model. That's why I wanted to learn and understand the whole optimisation process in R using a simple predictive model. Football-data.co.uk explains a small rating system that uses linear regression to predict the probability of a home win, draw, or away win. I chose this model because linear regression is a frequently used and easy-to-understand predictive method. With linear regression you can investigate the relationship between the variable to be predicted and one or more features.
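As a rough sketch of how such a rating system can turn a match rating into outcome probabilities: group historical matches by match rating, compute the observed share of home wins, draws, and away wins per rating, and fit one straight line per outcome. This is a simplified illustration on hypothetical toy data, not a reproduction of the football-data.co.uk article or of the R code in this series.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (needs >= 2 distinct x)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def outcome_lines(ratings, outcomes):
    """ratings: match rating per match; outcomes: 'H', 'D' or 'A'.
    Returns {outcome: (intercept, slope)} fitted to the observed share per rating."""
    counts = defaultdict(lambda: {"H": 0, "D": 0, "A": 0})
    for r, o in zip(ratings, outcomes):
        counts[r][o] += 1
    xs = sorted(counts)
    lines = {}
    for o in "HDA":
        shares = [counts[r][o] / sum(counts[r].values()) for r in xs]
        lines[o] = fit_line(xs, shares)
    return lines


def predict(lines, rating):
    """Predicted probability per outcome, clipped to [0, 1]."""
    return {o: min(1.0, max(0.0, a + b * rating)) for o, (a, b) in lines.items()}


# Hypothetical historical matches: match rating and result.
lines = outcome_lines([-4, -4, 0, 0, 4, 4], ["A", "D", "H", "D", "H", "H"])
print(predict(lines, 2))
```

Because least-squares coefficients are linear in the target and the three shares sum to 1 for every rating, the three fitted lines sum to exactly 1 at every rating; clipping at extreme ratings can break that, so those predictions may need renormalising.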