“Goals are the only statistic that decides a match.” I came across sentences like this more than once while reading Twitter discussions about the latest xG statistics of single matches. Even though xG is used more and more by sports journalists and during broadcasts, the meaning and importance of the statistic are not yet widely understood. This might be caused by the use of xG for single matches or single shots: the final result of a match and the corresponding xG values can differ a lot. Over the long term, however, xG tells us far more about a football team than goals and shots alone. To prove this, this post takes a look at the predictive power of xG in comparison to goals. The more information a statistic contains, the more it should help us predict the results of future matches.
In the third part of this series, I take a look at what is, from my point of view, the most important part of my market value data journey: does the team market value hold any predictive power? If so, I could use it as another feature for my predictive models.
Part one defined the basic architecture of the Team Strength MLP (multilayer perceptron). Part two explained the training process and how to monitor it via TensorBoard. Now it is time to take a look at the prediction of football matches. This primarily consists of the following steps:
- Load the prediction data set
- Rebuild the neural network architecture and load the pre-trained weights
- Execute the prediction
The Bundesliga season 2017/18 will be the test case for this example. The seasons 2008 to 2016 were used to train the model.
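The three steps above can be sketched in plain Python. This is only an illustration, not the TensorFlow code from the post: the layer sizes, the four made-up input features per match, the JSON weight format and the names `load_prediction_data` and `build_network` are all assumptions chosen to keep the example self-contained.

```python
import json
import math

def load_prediction_data():
    """Step 1: load the prediction data set.
    Hypothetical stand-in: one row of four made-up features per match."""
    return [[1.8, 1.1, 0.9, 1.5], [1.0, 1.4, 1.6, 0.8]]

def build_network(weights):
    """Step 2: rebuild the architecture around stored weights.
    Assumed architecture: one hidden ReLU layer and a softmax output
    with three classes (home win, draw, away win)."""
    def predict(x):
        # Hidden layer: weighted sum plus bias, then ReLU
        hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(weights["w1"], weights["b1"])]
        # Output layer: weighted sum plus bias
        logits = [sum(w * h for w, h in zip(row, hidden)) + b
                  for row, b in zip(weights["w2"], weights["b2"])]
        # Numerically stable softmax
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]
    return predict

# Stand-in for pre-trained weights; in practice these would be read
# from whatever file the training run saved (values here are invented).
stored_weights = json.dumps({
    "w1": [[0.4, -0.3, -0.2, 0.1], [-0.1, 0.2, 0.3, -0.4], [0.2, 0.2, -0.1, -0.1]],
    "b1": [0.0, 0.1, -0.1],
    "w2": [[0.5, -0.4, 0.1], [0.0, 0.1, 0.2], [-0.5, 0.4, 0.1]],
    "b2": [0.1, 0.0, -0.1],
})

predict = build_network(json.loads(stored_weights))

# Step 3: execute the prediction for every match in the data set
for features in load_prediction_data():
    probabilities = predict(features)
    print([round(p, 3) for p in probabilities])
```

The point of the sketch is the order of operations: the architecture has to be recreated with the same layer shapes as during training before the stored weights can be plugged in, and only then can the prediction be run.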
The first part of this series covered the definition of the network architecture for my Team Strength MLP. This neural network must now be trained. To explain and visualize the training process, TensorFlow offers the web frontend TensorBoard. This post explains how to use TensorBoard and describes some basic indicators of a well-trained model.
It is time to build and test my first predictive model with TensorFlow! As I am currently completely inexperienced in creating and optimizing neural networks, I will start with a very simple one, which uses only the predictive variables of the Poisson model. This lets me compare the resulting network with the Poisson model. I am excited to see whether TensorFlow is able to outperform this statistical model with such a low number of predictive variables. In this series I will provide some basic information on how to build a simple multilayer perceptron (MLP) with TensorFlow, supervise the training process with TensorBoard and use the trained neural network to predict the outcomes of matches.
Some days ago I read an interesting article about how bookies distribute their margin across the possible outcomes. All bookies of course keep this secret, as it gives them a range in which they can shorten or lengthen the odds depending on the amounts of the bets placed. But I need this information, because I simulate my prediction models for back and lay markets with only the odds of the back markets. This post explains how you should calculate the bookie margin, and how you should not. I handled this topic a little naively during the development of my Poisson model, which causes some problems.
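As a starting point, the total margin in a set of decimal odds can be read off the implied probabilities. The sketch below shows that calculation together with the simplest way of removing the margin, a proportional split over all outcomes. The excerpt does not say which allocation the post recommends, so the proportional split is shown only as the basic variant, and the odds and function names are made up for illustration.

```python
def implied_probabilities(odds):
    """Convert decimal odds into raw implied probabilities.
    These still include the bookie margin, so they sum to more than 1."""
    return [1.0 / o for o in odds]

def bookie_margin(odds):
    """Total margin (overround): the amount by which the implied
    probabilities exceed 1."""
    return sum(implied_probabilities(odds)) - 1.0

def remove_margin_proportionally(odds):
    """Simplest margin removal: scale all implied probabilities so that
    they sum to 1. This assumes the bookie applies the same relative
    margin to every outcome, which may not hold in practice."""
    raw = implied_probabilities(odds)
    total = sum(raw)
    return [p / total for p in raw]

# Hypothetical home / draw / away odds, for illustration only
odds = [2.10, 3.40, 3.60]
print(f"margin: {bookie_margin(odds):.4f}")
print("probabilities without margin:",
      [round(p, 4) for p in remove_margin_proportionally(odds)])
```

If the bookie loads more of the margin onto some outcomes than others, for example onto long shots, the proportional split will misjudge the underlying probabilities, which is why the way the margin is arranged matters when only back odds are available.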
The first and second parts of this series explained some basic methods to optimise the regression models for the GS & PPG match rating. You now have a set of three different regression models (linear, polynomial and polynomial without outliers) for each predictive variable. These models now have to compete not only against each other, but of course also against the bookie odds and the Poisson prediction model.
The first part of this series took a look at the GS match rating model. The post described how to identify a non-linear relationship between the predictor variables and the outcome variable. The same methods will now be applied to the PPG match rating model, so that we can compare the two different polynomial regression models. On top of that, I want to show how to figure out whether outliers in your data have an influence on your regression model.
In the last post I described how the features for the GS & PPG match rating models are calculated. Based on these features, I will now describe how to build and optimise a linear regression model with R. The first part will describe the optimisation of the linear regression model for the GS match rating model in detail. The second part will cover the PPG match rating model. The third and final part will compare the prediction performance of the different models.
In the first part of this post I described how a Poisson distribution can be used to predict football scores and why it is not sufficient to beat the bookie. The second part now explains how I balanced the disadvantages of the Poisson distribution. This turned the model into an efficient predictive model, which can be used to make a profit against the bookie.
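For readers who have not seen the first part, the plain Poisson approach can be sketched as follows. This is a minimal version under simplifying assumptions: home and away goals are treated as independent, the expected goal rates are made-up inputs, and none of the adjustments that balance the model's disadvantages are included. The function names are hypothetical.

```python
import math

def poisson_pmf(k, rate):
    """Probability of exactly k goals given an expected goal rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

def match_probabilities(home_rate, away_rate, max_goals=10):
    """Sum the probabilities of all scorelines up to max_goals per team
    into home win, draw and away win. Assumes the two teams' goal counts
    are independent Poisson variables."""
    home_win = draw = away_win = 0.0
    for home_goals in range(max_goals + 1):
        for away_goals in range(max_goals + 1):
            p = poisson_pmf(home_goals, home_rate) * poisson_pmf(away_goals, away_rate)
            if home_goals > away_goals:
                home_win += p
            elif home_goals == away_goals:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win

# Made-up expected goal rates for the home and away team
home_win, draw, away_win = match_probabilities(1.6, 1.1)
print(f"home win: {home_win:.3f}, draw: {draw:.3f}, away win: {away_win:.3f}")
```

Taking the reciprocal of each probability gives the odds the model considers fair, which can then be compared with the bookie odds.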