During my first investigations into predicting football scores I came across the predictive models of Maher and of Dixon and Coles. Maher modelled the number of goals a team scores during a match as two independent Poisson-distributed variables, one for the home team and one for the away team. He assumed that each team has an attacking strength and a defensive strength. Dixon and Coles extended this model by addressing some shortcomings of the Poisson distribution and by using time-dependent attack and defence strengths. Both papers form the basis of my first predictive model.
In this post I want to describe how the attack and defence strengths are calculated and how this calculation is added to the existing Data Vault model. The predictive model itself will be explained in another post.
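As a rough illustration of Maher's idea (not the exact formulation from the paper), the probability of a given final score is the product of two independent Poisson probabilities whose means combine attack and defence strengths. The strength values and the home-advantage factor below are made up for illustration:

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k goals under a Poisson(lam) distribution."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def score_probability(home_goals, away_goals, home_attack, home_defence,
                      away_attack, away_defence, home_advantage=1.3):
    """Maher-style sketch: home and away goals are modelled as independent
    Poisson variables; each mean combines one side's attack strength with
    the other side's defence strength."""
    lambda_home = home_attack * away_defence * home_advantage
    lambda_away = away_attack * home_defence
    return poisson_pmf(home_goals, lambda_home) * poisson_pmf(away_goals, lambda_away)

# Probability of an exact 2:1 home win, with invented strengths
p = score_probability(2, 1, home_attack=1.8, home_defence=0.9,
                      away_attack=1.2, away_defence=1.1)
print(round(p, 4))
```

Summing this probability over all plausible scorelines yields the probabilities of a home win, draw or away win, which is what the predictive model ultimately needs.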
Continue reading “Define variables: attack & defence strength”
In the first part, Prepare data: football-data.co.uk (part 1), I described what the Data Vault model for the data of football-data.co.uk looks like. In this second part I will focus on loading data into the Data Vault model. With the overall analytical architecture in mind, this corresponds to the data integration process between the stage layer and the raw data layer.
Continue reading “Prepare data: football-data.co.uk (part 2)”
In the post Gather data: football-data.co.uk I described how you can load CSV data into the Exasol database. As the data is now available in the Stage Layer of the database, I must prepare it and persist it in the Raw Data Layer, so that I can easily use it for building predictive models.
In part 1 of this post I want to explain what Data Vault modeling is and what the Data Vault model for the data structure of football-data.co.uk looks like. In part 2 I will explain how to load data into the developed Data Vault model.
Continue reading “Prepare data: football-data.co.uk (part 1)”
When I started this project, my biggest problem was finding a source for historic football statistics and historic football odds. Fortunately, I found Joseph Buchdahl’s website football-data.co.uk. This website is just great! It offers CSV files for 22 football leagues and about 19 seasons, and the data is usually updated twice a week. So I used this data as the starting point for my analytical system.
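To give an idea of the file layout, here is a minimal parsing sketch. The column names (FTHG/FTAG for full-time home/away goals, FTR for the full-time result) follow the format documented on football-data.co.uk, while the rows themselves are invented:

```python
import csv
import io

# A tiny sample in the football-data.co.uk CSV layout; the teams and
# results below are made up for illustration.
sample = """Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR
E0,14/08/21,Team A,Team B,2,1,H
E0,14/08/21,Team C,Team D,0,0,D
"""

matches = list(csv.DictReader(io.StringIO(sample)))
for m in matches:
    print(m["HomeTeam"], m["FTHG"], "-", m["FTAG"], m["AwayTeam"])
```

The real files also carry closing odds from several bookmakers per match, which is exactly the combination of statistics and odds this project needs.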
Continue reading “Gather data: football-data.co.uk”
No matter what predictive model you want to build, you have to go through several steps. You can find many different approaches to describing such a development process for statistical or predictive models on the internet. I have chosen a relatively simple one, which is based on papers from a SAS training.
Continue reading “How To: Develop predictive models”
As described in How to beat the bookie: Value Betting, I want to use Value Betting to beat the bookie. To identify value, I have to be able to calculate the probability of a specific sports event (e.g. a home win for Team A) as accurately as possible. Therefore, I have to develop, test, simulate and operate different predictive models. As a DWH architect I know that a good data architecture greatly supports such a development process. That’s why I developed the concept of the TripleA DWH – the Advanced Agile Analytical Data Warehouse – a data architecture aimed at automating data science processes.
Continue reading “Analytical Architecture: TripleA DWH”
When learning about sports betting, it is essential to be able to convert betting odds into probabilities and probabilities into betting odds.
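The conversion itself is simple: the implied probability of a decimal odd is its reciprocal, and because a bookmaker's quoted probabilities sum to more than 1 (the overround, his margin), they must be rescaled to obtain a proper distribution. A minimal sketch (function names are mine, not from the post):

```python
def odds_to_probability(decimal_odds):
    """Implied probability of a decimal (European) betting odd."""
    return 1.0 / decimal_odds

def probability_to_odds(probability):
    """Fair decimal odds for a given probability."""
    return 1.0 / probability

def normalize(implied):
    """Remove the bookmaker's margin (overround) by rescaling the
    implied probabilities so that they sum to 1."""
    total = sum(implied)
    return [p / total for p in implied]

# Example: a 1X2 market quoted at 2.10 / 3.40 / 3.60.
# The raw implied probabilities sum to about 1.05 (a ~5% overround).
implied = [odds_to_probability(o) for o in (2.10, 3.40, 3.60)]
print([round(p, 3) for p in normalize(implied)])
```

Note that simple proportional rescaling is only one way to strip the margin; it assumes the bookmaker spreads his margin evenly across all outcomes.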
Continue reading “How to: convert betting odds”
There are many ways to beat a bookie. One of the best-known methods is arbitrage betting, where you try to find price differences between different bookies. Some years ago this was a really good method, but today, as every piece of information about sports is available on the internet, it is hard to find such differences between bookies.
I will mainly focus on the so-called Value Betting. But what is Value Betting and how does it work?
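The core idea can be stated in a few lines: a bet has value when your estimated probability times the bookmaker's decimal odds exceeds 1, i.e. when the expected profit per unit stake is positive. A minimal sketch (function names are mine, not from the post):

```python
def expected_value(probability, decimal_odds):
    """Expected profit per unit stake: win (odds - 1) with probability p,
    lose the stake with probability (1 - p)."""
    return probability * (decimal_odds - 1) - (1 - probability)

def is_value_bet(probability, decimal_odds):
    """A bet has value when the expected profit is positive,
    which is equivalent to probability * odds > 1."""
    return expected_value(probability, decimal_odds) > 0

# If we estimate a 55% chance of a home win but the bookie quotes 2.00
# (implying only 50%), the bet has positive expected value.
print(is_value_bet(0.55, 2.00))   # -> True
```

The hard part, of course, is not this arithmetic but estimating the true probability more accurately than the bookmaker does.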
Continue reading “How to beat the bookie: Value Betting”