I decided to additionally share my sources, which I use to build BeatTheBookie. So everybody is able to re-use them and build his own analytical system. I added a new page to the blog, where you can find the link to the GitHub repository.
The first model I tested is based on the predictive models of Maher  and Dixon / Coles . Maher modelled the expected goals for a specific match as two independent Poisson distributions. After that, Dixon / Coles improved this model to balance some disadvantages.
In the previous post I described, how you can easily calculate the features of these models for any football match in the past. The first part of this post will show you, how to calculate the odds with the help of these features and why a simple Poisson distribution is not enough to beat the bookie. How I solved these problems will be the central element of the second part.
During my first investigations for predicting football scores I came across the predictive models of Maher  and Dixon / Coles . Maher modelled the number of goals a team scores during a match as two independent Poisson distributed variables, for the home team and the away team. He assumed that each team has an attacking strength and a defence strength. Dixon / Coles extended this model by adjusting some disadvantages of the Poisson distribution and by using a time dependent attack and defence strength. Both papers are the base of my first predictive model.
In this Post I want to describe, how the attack and defence strength are calculated and how you add this calculation to the existing Data Vault model. The predictive model itself will be explained in another post.
In the first part Prepare data: football-data.co.uk (part 1) I described how the Data Vault model for the data of football-data.co.uk looks like. In the second part I will now focus on loading data into the Data Vault model. With the overall analytical architecture in mind this equates the data integration process between the stage layer and the raw data layer.
In the post Gather data: football-data.co.uk I described, how you can load CSV data into the Exasol database. As the data is now available at the Stage Layer in the database, I must now prepare the data and persist it at the Raw Data Layer, so that I can easily use it for building predictive models.
With part 1 of this post I want to explain, what Data Vault modeling is and how the Data Vault model for the data structure of football-data.co.uk looks like. With part 2 I will explain, how you load data into the developed Data Vault model.
When I started this project, my biggest problem was to find a source for historic football statistics and historic football odds. Fortunately, I found Joseph Buchdahl’s website football-data.co.uk. This website is just great! He offers CSV files for 22 football leagues and about 19 seasons. He updates the data mostly two times a week. So I used this data as the starting point for my analytical system.
No matter what predictive model you want to build, you have to go through several steps. You find many different approaches to describe such a development process for statistical models or predictive models in the internet. I have chosen a relative simple one, which is based on papers for a SAS training.
As described in How to beat the bookie: Value Betting I want to use Value Betting to beat the bookie. To identify value, I have to be able to calculate the probability of a specific sports event (e.g. Home-Win for Team A) as accurately as possible. Therefor, I have to develop, test, simulate and process different predictive models. As a DWH architect I know, that a good data architecture helps a lot to support such a developing process. That’s why I formed the concept of the TripleA DWH – the Advanced Agile Analytical Data Warehouse – a data architecture aimed to automate data science processes.
While learning something about sports betting, it is essential to compile betting odds to probabilities and probabilities to betting odds.
There are many ways to beat a bookie. One of the well known methods is arbitrage betting, where you try to find price differences between different bookies. Some years ago this was a really good method, but today, as every single information about sports is available throw the internet, it is hard to find difference between bookies.
I will mainly focus on the so-called Value Betting. But what is Value Betting and how does it work?