Validate model: Poisson distribution (part 1)

The first model I tested is based on the predictive models of Maher [1] and Dixon / Coles [2]. Maher modelled the goals scored by each team in a match as two independent Poisson-distributed variables. Dixon / Coles later refined this model to compensate for some of its weaknesses.

In the previous post I described how you can easily calculate the features of these models for any past football match. The first part of this post shows how to calculate odds from these features and why a simple Poisson distribution is not enough to beat the bookie. The second part focuses on how I solved these problems.
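
As a rough illustration of the idea, the following sketch turns two expected-goal values into home/draw/away probabilities and fair odds, assuming the two scores are independent Poisson variables. The function names, the `max_goals` cut-off and the example values are assumptions made for this sketch; they are not taken from the original posts.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k goals when lam goals are expected."""
    return lam ** k * exp(-lam) / factorial(k)


def match_probabilities(home_xg, away_xg, max_goals=10):
    """Return (home win, draw, away win) probabilities.

    Assumes independent Poisson scores; scorelines above max_goals
    are ignored and the result is renormalised to sum to 1.
    """
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    total = home + draw + away
    return home / total, draw / total, away / total


def fair_odds(probabilities):
    """Convert probabilities to decimal odds without a bookmaker margin."""
    return tuple(1.0 / p for p in probabilities)


# Hypothetical expected goals for one match
probs = match_probabilities(1.6, 1.1)
odds = fair_odds(probs)
```

The odds produced this way contain no bookmaker margin, so they are only a baseline for comparison with real prices.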

Define variables: attack & defence strength

During my first investigations into predicting football scores I came across the predictive models of Maher [1] and Dixon / Coles [2]. Maher modelled the number of goals each team scores during a match as two independent Poisson-distributed variables, one for the home team and one for the away team. He assumed that each team has an attack strength and a defence strength. Dixon / Coles extended this model by correcting some weaknesses of the Poisson distribution and by using time-dependent attack and defence strengths. Both papers form the basis of my first predictive model.
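
To make the role of the two strengths concrete, here is a minimal sketch of how they could be combined into expected goals. It assumes the multiplicative form with a home advantage factor, as in the Dixon / Coles parameterisation; the function name, the parameter names and the numbers are hypothetical, and the calculation described in the full post may differ.

```python
def expected_goals(home_attack, home_defence, away_attack, away_defence,
                   home_advantage=1.0):
    """Expected goals for both teams under an assumed multiplicative model.

    A defence strength above 1 means the team concedes more than average;
    this convention is an assumption of the sketch.
    """
    home_xg = home_attack * away_defence * home_advantage
    away_xg = away_attack * home_defence
    return home_xg, away_xg


# Hypothetical strengths for a single match
home_xg, away_xg = expected_goals(
    home_attack=1.2, home_defence=0.9,
    away_attack=0.8, away_defence=1.1,
    home_advantage=1.3,
)
```

These two values would then serve as the means of the two Poisson distributions.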

In this post I want to describe how the attack and defence strengths are calculated and how you add this calculation to the existing Data Vault model. The predictive model itself will be explained in another post.

Prepare data: football-data.co.uk (part 1)

In the post Gather data: football-data.co.uk I described how you can load CSV data into the Exasol database. Now that the data is available in the Stage Layer of the database, I have to prepare it and persist it in the Raw Data Layer, so that I can easily use it for building predictive models.

In part 1 of this post I explain what Data Vault modeling is and what the Data Vault model for the data structure of football-data.co.uk looks like. In part 2 I will explain how to load data into the resulting Data Vault model.

Gather data: football-data.co.uk

When I started this project, my biggest problem was finding a source for historic football statistics and historic football odds. Fortunately, I found Joseph Buchdahl’s website football-data.co.uk. This website is just great! He offers CSV files for 22 football leagues and about 19 seasons, and he usually updates the data twice a week. So I used this data as the starting point for my analytical system.

Analytical Architecture: TripleA DWH

As described in How to beat the bookie: Value Betting, I want to use Value Betting to beat the bookie. To identify value, I have to be able to calculate the probability of a specific sports event (e.g. Home-Win for Team A) as accurately as possible. Therefore, I have to develop, test, simulate and process different predictive models. As a DWH architect I know that a good data architecture helps a lot to support such a development process. That’s why I formed the concept of the TripleA DWH (the Advanced Agile Analytical Data Warehouse), a data architecture aimed at automating data science processes.

How to beat the bookie: Value Betting

There are many ways to beat a bookie. One well-known method is arbitrage betting, where you try to exploit price differences between bookies. Some years ago this was a really good method, but today, as every piece of information about sports is available through the internet, it is hard to find price differences between bookies.

I will mainly focus on so-called Value Betting. But what is Value Betting and how does it work?
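
As a short preview of the general idea, a bet is commonly said to have value when your estimated probability multiplied by the bookie's decimal odds is greater than 1. The sketch below shows that check; the function names, the `margin` parameter and the example numbers are assumptions for illustration and not necessarily the exact definition used in the full post.

```python
def expected_value(probability, decimal_odds):
    """Expected profit per unit staked, given your own probability estimate."""
    return probability * decimal_odds - 1.0


def is_value_bet(probability, decimal_odds, margin=0.0):
    """True if the expected profit per unit staked exceeds the margin."""
    return expected_value(probability, decimal_odds) > margin


# Hypothetical example: you estimate a 50 % chance, the bookie offers 2.20
value = expected_value(0.50, 2.20)
worth_betting = is_value_bet(0.50, 2.20)
```

A positive expected value only pays off in the long run if the probability estimate is accurate, which is why the quality of the predictive model matters so much.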
