When a rich club in Germany goes through a bad performance phase or loses an important match, we like to use the phrase “Geld schießt eben keine Tore”. What means more or less, that big money doesn’t ensure goals. But the overall acceptance is of course, that richer clubs are expected to win more often as they got the money to buy the best players. This inspired me to start a data journey about market values in the big 5 European leagues: What do the market values tell about the development in the different leagues? How do teams perform in relation to the money they spent? Does the market value of a team has a predictive significance?
As Exasol changed the way, how to build a customer Python3 docker container, this instructions no longer work. I have to update the single steps.
There is one big reason, why I have chosen Exasol as a database for my football analytics and predictions: Exasol is capable of executing Python and R code inside the database. Your are able to put your statistical calculations and predictive models to your data. The feature User Defined Functions (UDFs) provides the possibility to implement every logic which you normally code in Python or R. This is a really efficient way to extent plain SQL with some predictive functionality like the execution of TensorFlow models.
In this blog post I will explain, how you extend the Exasol community edition with all needed Python3 packages to execute Tensorflow models.
As mentioned in the last post, I am now going to use TensorFlow to build my first own predictive model. But before, there are several small steps, which need to be taken. At first I want to explain, how your able to read and write data via a Python script into Exasol. This is needed to read the different predictive variables and write back results of a prediction into the database when developing models.
The first 10 matchdays of the current season in the Bundesliga revealed some clear disadvantages of my Poisson model. The predictor variables attack and defence strength respond too slowly to performance changes of single teams. This was clearly shown by the loss produced by the poorly performing FC Cologne. A normal SMA (simple moving average) does not use a weight. So latest results, which represent the current form, have not a higher priority over older results. As I looked for solution for this problem I stumbled over the EMA (exponential moving average). This post will explain the use of the EMA and how you can implement it inside the Exasol, so that it is usable as an analytical function for the predictor variables. On top I will show you, how you can analyse the team performance with help of MACD (Moving Average Convergence/Divergence oscillator ).
In the first part Prepare data: football-data.co.uk (part 1) I described how the Data Vault model for the data of football-data.co.uk looks like. In the second part I will now focus on loading data into the Data Vault model. With the overall analytical architecture in mind this equates the data integration process between the stage layer and the raw data layer.
In the post Gather data: football-data.co.uk I described, how you can load CSV data into the Exasol database. As the data is now available at the Stage Layer in the database, I must now prepare the data and persist it at the Raw Data Layer, so that I can easily use it for building predictive models.
With part 1 of this post I want to explain, what Data Vault modeling is and how the Data Vault model for the data structure of football-data.co.uk looks like. With part 2 I will explain, how you load data into the developed Data Vault model.
When I started this project, my biggest problem was to find a source for historic football statistics and historic football odds. Fortunately, I found Joseph Buchdahl’s website football-data.co.uk. This website is just great! He offers CSV files for 22 football leagues and about 19 seasons. He updates the data mostly two times a week. So I used this data as the starting point for my analytical system.
As described in How to beat the bookie: Value Betting I want to use Value Betting to beat the bookie. To identify value, I have to be able to calculate the probability of a specific sports event (e.g. Home-Win for Team A) as accurately as possible. Therefore, I want to develop, test, simulate and process different predictive models. As a DWH architect I know, that a good system architecture helps a lot to support such a developing process. That’s why I formed the concept of the TripleA DWH – the Advanced Agile Analytical Data Warehouse.