Poisson vs Reality

The Poisson distribution is widely used to predict the result of a football matches. Multiple articles can be found in the internet and I also already provided a comparison of different Vanilla Poisson models. But the Poisson distribution as some limitations. The Poisson distribution assumes the number of goals a team scores are independent. But everybody watching football knows, that a team being one goals behind is way more motivated to score a goal in comparison to being already 4 goals behind. So let’s have a look how a simple Poisson distribution compares to the actual scored goals.

Continue reading “Poisson vs Reality”

Why is it so hard to beat the Bookie?

Image you see following picture for two different profit lines. Both betting simulations are based on the same stacking method: Each identified value bet is set with 1 unit flat stack. Which of both simulation would you prefer? I think the answer is easy.

Of course everybody would prefer the green proft line. But both profit lines are based on the same predictive model. All predictions and bet selections are based on the EMA10 Vanilla Poisson xG model, which I already used for multiple blogs.
The difference between both lines: The yellow line represents the betting profit, when betting against the provided odds of a bookie. The green line represent the betting profit, when betting against a bookie without the bookie margin. This bookmaker margin eat up the whole advantage of the model.

Continue reading “Why is it so hard to beat the Bookie?”

Scoring functions vs. betting profit – Measuring the performance of a football betting model

“What’s the best model?” – That’s a very important questions, when creating, training and testing new predictive models for football. Various machine learning algorithms and packages offer by default a set of scoring functions like accuracy, log-loss, brier score or ROC-AUC, which measure the accuracy of a probabilistic prediction. But I already recognized in older posts, that the best model based on a scoring function, was not always the best model, when it’s about using the prediction results for betting. So let’s have a look and compare the rank of some scoring functions in comparison to the betting profit of some models.

Continue reading “Scoring functions vs. betting profit – Measuring the performance of a football betting model”

Using xG & advanced stats to predict football matches

With the BeatTheBookieDataService in place it’s also time to provide some new models. This post will take a look at possible models using the team statistics provided for each match by understat.com. Therefor I will compare 3 of the most used machine learning algorithms. Beside this, it’s also time to test again some basics for predictiv modeling for football: “To differ between home/away performance or not to differ”? For my Poisson models I always differed between home and away performance. But is this also needed, when using ML algorithms?

Continue reading “Using xG & advanced stats to predict football matches”

The predictive power of xG

“Goals are the only statistic, which decide a match” – sentences like this appeared not only once, while reading discussion about the latest xG statistics of single matches on Twitter. Even if the statistic xG is more and more used by sport journalists and during broadcasts, the meaning and importance of the statistic is not yet widely understood. This might be caused by the usage of xG for single matches or single shots. The final result of a match and the corresponding xG values might differ a lot. But over the long-term xG is a statistic, which tells us way more about a football team than goals and shots alone. To prove this, this post will take a look at the predictive power of xG in comparison to goals. The more information a statistic contains the more it should help us to predict the result of future matches.

Continue reading “The predictive power of xG”

Automate your betting models with AWS

How does my typical betting weekend looks like, when I start ckecking, whether there are some interesting matches? I start my laptop, open the browser, start my Python program, start the database and after some minutes, I am able to start my data prcoessing, which collects all the data and calculates the predictions. That’s already great, but wouldn’t it be even better to have all predictions always already up-to-date? This blog will show you how to setup and run a small automated data pipeline in AWS, which extracts all stats from Understat.com.

Continue reading “Automate your betting models with AWS”

Why every data scientist should learn SQL

It’s been quite a long time since my last post for my blog. But that has been because of a specific reason: I participated at the 2nd DFB Hackathon, which consumed a huge amount of my freetime, which I normally spent creating some content for my blog. The Hackathon was again a great experience as all this deep data science stuff is still a challenge for me. But there’s again on big question on my side: Why are data scientist often just using Python (or R) and don’t know, how and when to use SQL.

Continue reading “Why every data scientist should learn SQL”

Migrating Exasol Community Edition

In one of my older posts I described the data architecture, I am using for all my examples. As the database I use the Exasol Community Edition. From time to time it is necessary to update your software to the current version because of new features. This post will describe, how to migrate a Exasol community edition to anther one. These steps can also be used, to migrate nearly every database to an Exasol.

Continue reading “Migrating Exasol Community Edition”