In the first part of this data journey, I took a look at the general definition of expected goals (xG) and the usage of this metric. In the next step in the process of testing the predictive power of xG, I need to get some data. This part will focus on getting the team expected goals statistics. In one of the following parts, I will also take a look on getting the player expected goals statistics as this of course offers even deeper insights.
When a rich club in Germany goes through a bad performance phase or loses an important match, we like to use the phrase “Geld schießt eben keine Tore”. What means more or less, that big money doesn’t ensure goals. But the overall acceptance is of course, that richer clubs are expected to win more often as they got the money to buy the best players. This inspired me to start a data journey about market values in the big 5 European leagues: What do the market values tell about the development in the different leagues? How do teams perform in relation to the money they spent? Does the market value of a team has a predictive significance?
The hardest part of sports analytics is getting data! Not for nothing there are companies, which earn their money just with sports data. But if you are not able or do not want to pay such amounts of money, you got just one possibility: scraping the data from the Web. In an older post, I described a R web scraper. As this one was no longer working, I needed a new one. What brings us to this post. This time I will describe, how to create a web scrapper for static HTML sites with Python and how you are able to implement such a web scrapper as a User Defined Function (UDF) in Exasol.
There is one big reason, why I have chosen Exasol as a database for my football analytics and predictions: Exasol is capable of executing Python and R code inside the database. Your are able to put your statistical calculations and predictive models to your data. The feature User Defined Functions (UDFs) provides the possibility to implement every logic which you normally code in Python or R. This is a really efficient way to extent plain SQL with some predictive functionality like the execution of TensorFlow models.
In this blog post I will explain, how you extend the Exasol community edition with all needed Python3 packages to execute Tensorflow models. Additionally with the latest update I also added the packages and description needed for all my web scrapping scripts.
Part one defined the basic architecture of the Team Strength MLP (multi layer perceptron). The training process and its monitoring via Tensorboard was explained in part two. Now it is time to take a look at the prediction of football matches. Primarily this consists of following steps:
- Load the prediction data set
- Re-build neural network architecture and load pre-trained weights
- Execute prediction
The Bundesliga season 2017/18 will be the test case for this example. The season 2008 – 2016 were used to train the mode.
The first part of this series covered the definition of the network architecture for my Team Strength MLP. This neural network must now be trained. To explain and visualize the training process, Tensorflow offers the web frontend TensorBoard. This post will explain, how you use TensorBoard and what are some basic indicators for a well-trained model.
It is time to build and test my first predictive model with Tensorflow! As I am currently totally unexperienced in creating and optimizing neural networks, I will start with a very simple one, which just uses the predictive variables of the Poisson model. By doing this, I will be able to compare the resulting network with the Poisson model. I am excited to see, whether Tensorflow is able to outperform this statistical model with such a low number of predictive variables. In this series I will provide some basic information, how you are able to build a simple multilayer perceptron (MLP) with Tensorflow, supervise the training process with Tensorboard and use the trained neural network to predict the outcomes of the matches.
As mentioned in the last post, I am now going to use TensorFlow to build my first own predictive model. But before, there are several small steps, which need to be taken. At first I want to explain, how your able to read and write data via a Python script into Exasol. This is needed to read the different predictive variables and write back results of a prediction into the database when developing models.
After gaining much experience over a complete season, it is time to set myself some new goals. Until now I just used or tested predictive models, which were invented or described by other people. Now I want to try something new. I would like to create my first own predictive model, which should of course provide a better performance as the current Poisson model. This is where Tensorflow comes into play.
The first 10 matchdays of the current season in the Bundesliga revealed some clear disadvantages of my Poisson model. The predictor variables attack and defence strength respond too slowly to performance changes of single teams. This was clearly shown by the loss produced by the poorly performing FC Cologne. A normal SMA (simple moving average) does not use a weight. So latest results, which represent the current form, have not a higher priority over older results. As I looked for solution for this problem I stumbled over the EMA (exponential moving average). This post will explain the use of the EMA and how you can implement it inside the Exasol, so that it is usable as an analytical function for the predictor variables. On top I will show you, how you can analyse the team performance with help of MACD (Moving Average Convergence/Divergence oscillator ).