When a rich club in Germany goes through a bad phase or loses an important match, we like to use the phrase “Geld schießt eben keine Tore”, which roughly means that money doesn’t score goals. But the general assumption is, of course, that richer clubs win more often because they have the money to buy the best players. This inspired me to start a data journey about market values in the big 5 European leagues: What do market values tell us about the development of the different leagues? How do teams perform relative to the money they spent? Does a team’s market value have predictive significance?
During my first investigations into predicting football scores I came across the predictive models of Maher and Dixon / Coles. Maher modelled the numbers of goals scored by the home team and by the away team in a match as two independent Poisson-distributed variables. He assumed that each team has an attack strength and a defence strength. Dixon / Coles extended this model by correcting a weakness of the independence assumption for low-scoring results (0-0, 1-0, 0-1, 1-1) and by making the attack and defence strengths time-dependent. Both papers form the basis of my first predictive model.
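To make the two models a bit more concrete, here is a minimal Python sketch of how match outcome probabilities follow from them. It assumes the common multiplicative parametrisation, in which the home team’s expected goals are its attack parameter times the away team’s defence parameter times a home-advantage factor, and a higher defence parameter means a weaker defence. The function names and the `max_goals` cut-off are illustrative choices, not taken from the original papers or from my own model. With `rho = 0` the sketch is Maher’s independent Poisson model; a non-zero `rho` applies the Dixon / Coles correction factor to the four low-scoring results.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def dixon_coles_tau(x: int, y: int, lam: float, mu: float, rho: float) -> float:
    """Dixon/Coles correction for the results 0-0, 0-1, 1-0 and 1-1.

    lam = expected home goals, mu = expected away goals.
    With rho = 0 every factor is 1, i.e. Maher's independent Poisson model.
    """
    if x == 0 and y == 0:
        return 1.0 - lam * mu * rho
    if x == 0 and y == 1:
        return 1.0 + lam * rho
    if x == 1 and y == 0:
        return 1.0 + mu * rho
    if x == 1 and y == 1:
        return 1.0 - rho
    return 1.0


def score_matrix(attack_home: float, defence_home: float,
                 attack_away: float, defence_away: float,
                 home_adv: float, rho: float = 0.0,
                 max_goals: int = 10) -> list[list[float]]:
    """m[x][y] = P(home team scores x, away team scores y), truncated at max_goals."""
    lam = attack_home * defence_away * home_adv  # expected home goals
    mu = attack_away * defence_home              # expected away goals
    return [[dixon_coles_tau(x, y, lam, mu, rho) * poisson_pmf(x, lam) * poisson_pmf(y, mu)
             for y in range(max_goals + 1)]
            for x in range(max_goals + 1)]


def outcome_probabilities(matrix: list[list[float]]) -> tuple[float, float, float]:
    """Collapse the score matrix into (home win, draw, away win) probabilities."""
    home = sum(p for x, row in enumerate(matrix) for y, p in enumerate(row) if x > y)
    draw = sum(matrix[i][i] for i in range(len(matrix)))
    away = sum(p for x, row in enumerate(matrix) for y, p in enumerate(row) if x < y)
    return home, draw, away


# Hypothetical parameters, for illustration only.
m = score_matrix(attack_home=1.3, defence_home=0.8,
                 attack_away=1.1, defence_away=0.9,
                 home_adv=1.25, rho=-0.1)
print(outcome_probabilities(m))
```

The correction factors are chosen so that the adjusted probabilities still sum to one, which is why the Dixon / Coles extension can be applied on top of the Maher matrix without renormalising.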
In this post I want to describe how the attack and defence strengths are calculated and how to add this calculation to the existing Data Vault model. The predictive model itself will be explained in another post.
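One way to calculate the strengths is maximum likelihood estimation over past results. The following Python sketch fits Maher’s multiplicative model (home goals ~ Poisson(attack[home] · defence[away] · home_adv), away goals ~ Poisson(attack[away] · defence[home])) with a simple fixed-point iteration: each group of parameters is set to the value that solves its own likelihood equation while the others are held fixed. Attack values are rescaled to a mean of 1 because the model is otherwise only identified up to a constant factor. This is an illustrative technique, not necessarily the calculation I describe in this post; the function name `fit_maher`, the input format, the sample results and the iteration count are assumptions, and every team must have scored and conceded at least one goal.

```python
from collections import defaultdict

Match = tuple[str, str, int, int]  # (home_team, away_team, home_goals, away_goals)


def fit_maher(matches: list[Match], iterations: int = 500):
    """Estimate attack, defence and home advantage by coordinate-wise maximum likelihood."""
    teams = sorted({team for m in matches for team in m[:2]})
    scored: dict[str, int] = defaultdict(int)
    conceded: dict[str, int] = defaultdict(int)
    for home, away, hg, ag in matches:
        scored[home] += hg
        scored[away] += ag
        conceded[home] += ag
        conceded[away] += hg
    total_home_goals = sum(m[2] for m in matches)

    attack = {t: 1.0 for t in teams}
    defence = {t: 1.0 for t in teams}  # higher value = weaker defence
    home_adv = 1.0

    for _ in range(iterations):
        # Attack: goals scored / sum of the remaining factors of the team's expected goals.
        denom = defaultdict(float)
        for home, away, _, _ in matches:
            denom[home] += defence[away] * home_adv
            denom[away] += defence[home]
        attack = {t: scored[t] / denom[t] for t in teams}
        mean_attack = sum(attack.values()) / len(teams)
        attack = {t: a / mean_attack for t, a in attack.items()}  # fix the scale: mean attack = 1

        # Defence: goals conceded / sum of the remaining factors of the opponents' expected goals.
        denom = defaultdict(float)
        for home, away, _, _ in matches:
            denom[home] += attack[away]
            denom[away] += attack[home] * home_adv
        defence = {t: conceded[t] / denom[t] for t in teams}

        # Home advantage: observed home goals / expected home goals without the factor.
        home_adv = total_home_goals / sum(attack[h] * defence[a] for h, a, _, _ in matches)

    return attack, defence, home_adv


# Hypothetical double round robin between four teams.
matches = [
    ("A", "B", 2, 1), ("A", "C", 1, 1), ("A", "D", 3, 0),
    ("B", "A", 0, 2), ("B", "C", 2, 2), ("B", "D", 1, 0),
    ("C", "A", 1, 2), ("C", "B", 1, 0), ("C", "D", 2, 1),
    ("D", "A", 0, 1), ("D", "B", 1, 1), ("D", "C", 0, 0),
]
attack, defence, home_adv = fit_maher(matches)
print(attack, defence, home_adv)
```

The Dixon / Coles time dependence could be added to such a fit by weighting each match, for example with a factor that decays with the age of the match; that extension is left out here to keep the sketch short.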
In the first part, Prepare data: football-data.co.uk (part 1), I described what the Data Vault model for the football-data.co.uk data looks like. In this second part I will focus on loading data into the Data Vault model. In terms of the overall analytical architecture, this corresponds to the data integration process between the Stage Layer and the Raw Data Layer.
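The loading pattern itself can be sketched independently of the database. The following Python sketch uses in-memory dictionaries in place of the Raw Data Layer tables and follows common Data Vault 2.0 conventions: hash keys are MD5 values of the normalised business keys, hubs and links are insert-only and only receive keys they have not seen before, and a satellite gets a new row only when the hash of its descriptive attributes (the hash diff) changes. The entity names (`hub_team`, `link_match`, `sat_match_result`), the choice of business keys and the sample row are hypothetical; the stage column names follow the football-data.co.uk CSV files. In my Exasol-based setup these steps would more likely be expressed in SQL.

```python
import hashlib
from datetime import datetime, timezone

RECORD_SOURCE = "football-data.co.uk"


def hash_key(*business_keys: object) -> str:
    """MD5 over the trimmed, upper-cased business keys joined with a delimiter."""
    normalised = ";".join(str(k).strip().upper() for k in business_keys)
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()


def load_raw_vault(stage_rows, hub_team: dict, link_match: dict, sat_match_result: dict) -> None:
    """Load stage rows into hypothetical hub, link and satellite 'tables' (dicts)."""
    load_date = datetime.now(timezone.utc)
    for row in stage_rows:
        # Hub: one entry per team business key; existing keys are left untouched.
        team_keys = {}
        for column in ("HomeTeam", "AwayTeam"):
            hk = hash_key(row[column])
            team_keys[column] = hk
            hub_team.setdefault(hk, {"team": row[column], "load_date": load_date,
                                     "record_source": RECORD_SOURCE})

        # Link: one entry per match, keyed by division, date and both teams.
        match_hk = hash_key(row["Div"], row["Date"], row["HomeTeam"], row["AwayTeam"])
        link_match.setdefault(match_hk, {"home_team_hk": team_keys["HomeTeam"],
                                         "away_team_hk": team_keys["AwayTeam"],
                                         "load_date": load_date,
                                         "record_source": RECORD_SOURCE})

        # Satellite: append a new version only if the descriptive attributes changed.
        attributes = {"FTHG": row["FTHG"], "FTAG": row["FTAG"], "FTR": row["FTR"]}
        hash_diff = hash_key(*(attributes[name] for name in sorted(attributes)))
        versions = sat_match_result.setdefault(match_hk, [])
        if not versions or versions[-1]["hash_diff"] != hash_diff:
            versions.append({**attributes, "hash_diff": hash_diff,
                             "load_date": load_date, "record_source": RECORD_SOURCE})


hub_team, link_match, sat_match_result = {}, {}, {}
stage = [{"Div": "D1", "Date": "24/08/18", "HomeTeam": "Team A", "AwayTeam": "Team B",
          "FTHG": "3", "FTAG": "1", "FTR": "H"}]
load_raw_vault(stage, hub_team, link_match, sat_match_result)
load_raw_vault(stage, hub_team, link_match, sat_match_result)  # re-run: nothing new is inserted
```

Because every step only inserts what is missing or changed, the same stage data can be loaded repeatedly without creating duplicates, which is what makes this pattern easy to automate.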
In the post Gather data: football-data.co.uk I described how you can load CSV data into the Exasol database. Now that the data is available in the Stage Layer of the database, I must prepare it and persist it in the Raw Data Layer, so that I can easily use it to build predictive models.
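Before this step, the staged values are plain text. As a rough Python illustration of the preparation step (outside the database), the sketch below reads a few lines in the football-data.co.uk CSV layout, converts goals to integers, parses the date and rejects rows whose full-time result does not match the goals. Only a handful of the real columns are used, the sample rows are made up, and the accepted date formats (`dd/mm/yy` and `dd/mm/yyyy`) are an assumption; the helper names are hypothetical.

```python
import csv
import io
from datetime import date, datetime

# Hypothetical sample in the football-data.co.uk column layout.
SAMPLE_CSV = """Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR
D1,24/08/18,Team A,Team B,3,1,H
D1,25/08/2018,Team C,Team D,0,0,D
"""


def parse_date(value: str) -> date:
    """Accept dates with four-digit or two-digit years."""
    for fmt in ("%d/%m/%Y", "%d/%m/%y"):
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {value!r}")


def result_from_goals(home_goals: int, away_goals: int) -> str:
    """Full-time result code: H (home win), A (away win) or D (draw)."""
    if home_goals > away_goals:
        return "H"
    if home_goals < away_goals:
        return "A"
    return "D"


def prepare_rows(csv_text: str) -> list[dict]:
    """Type-convert and validate stage rows; raises ValueError on inconsistent results."""
    prepared = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        home_goals, away_goals = int(row["FTHG"]), int(row["FTAG"])
        if result_from_goals(home_goals, away_goals) != row["FTR"]:
            raise ValueError(f"inconsistent result in row {row}")
        prepared.append({
            "division": row["Div"],
            "match_date": parse_date(row["Date"]),
            "home_team": row["HomeTeam"].strip(),
            "away_team": row["AwayTeam"].strip(),
            "home_goals": home_goals,
            "away_goals": away_goals,
            "result": row["FTR"],
        })
    return prepared


for r in prepare_rows(SAMPLE_CSV):
    print(r)
```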
In part 1 of this post I want to explain what Data Vault modeling is and what the Data Vault model for the football-data.co.uk data structure looks like. In part 2 I will explain how to load data into this Data Vault model.
As described in How to beat the bookie: Value Betting, I want to use value betting to beat the bookie. To identify value, I have to calculate the probability of a specific sports event (e.g. a home win for Team A) as accurately as possible. Therefore, I have to develop, test, simulate and process different predictive models. As a DWH architect, I know that a good data architecture helps a lot in supporting such a development process. That’s why I came up with the concept of the TripleA DWH (the Advanced Agile Analytical Data Warehouse), a data architecture aimed at automating data science processes.
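The value criterion itself is simple arithmetic: with decimal odds `o` and an estimated probability `p`, a stake of 1 returns a profit of `o - 1` with probability `p` and loses the stake otherwise, so the expected profit is `p · o - 1`. A bet has value when this is positive, i.e. when `p` exceeds the bookie’s implied probability `1 / o`. A minimal Python sketch, assuming decimal odds and a model probability that is already well calibrated (the function names are hypothetical):

```python
def expected_profit(probability: float, decimal_odds: float, stake: float = 1.0) -> float:
    """Expected profit: win (odds - 1) * stake with probability p, lose the stake otherwise."""
    return stake * (probability * (decimal_odds - 1.0) - (1.0 - probability))


def implied_probability(decimal_odds: float) -> float:
    """Probability implied by decimal odds (ignores the bookie's margin)."""
    return 1.0 / decimal_odds


def is_value_bet(probability: float, decimal_odds: float) -> bool:
    """Value exists when the model probability exceeds the implied probability."""
    return probability * decimal_odds > 1.0


# Hypothetical example: the model gives a home win 50 %, the bookie offers 2.20.
print(expected_profit(0.5, 2.2))  # about 0.10 per unit staked
print(is_value_bet(0.5, 2.2))     # True
```

The whole edge depends on how accurate `p` is, which is why the quality of the predictive models matters so much for this approach.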