# Poisson vs Reality

The Poisson distribution is widely used to predict the results of football matches. Multiple articles can be found on the internet, and I have already provided a comparison of different Vanilla Poisson models. But the Poisson distribution has some limitations. It assumes that the goals a team scores are independent of each other. But everybody who watches football knows that a team that is one goal behind is far more motivated to score than a team that is already 4 goals behind. So let’s have a look at how a simple Poisson distribution compares to the goals actually scored.

For this comparison I used all matches of the seasons 2017 – 2022 in the Big 5 European leagues. The goal distributions are based on the goals actually scored in these matches. The Poisson distribution values are based on the EMA10 xG Vanilla Poisson model and the average predicted probabilities for the different numbers of goals. For example, the probability of 3 goals during a match is the sum of the probabilities of all single results with 3 goals (3:0, 2:1, 1:2, 0:3).
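The summation over single results can be sketched in a few lines of Python. This is only a minimal illustration, assuming independent Poisson goal counts for the home and the away team. The expected goal values in `matches` are made up for the example and are not taken from the EMA10 xG model, and the function names are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals for an expected goal rate lam."""
    return lam ** k * exp(-lam) / factorial(k)


def total_goals_prob(total: int, lam_home: float, lam_away: float) -> float:
    """Sum the probabilities of every single result with `total` goals."""
    return sum(
        poisson_pmf(h, lam_home) * poisson_pmf(total - h, lam_away)
        for h in range(total + 1)
    )


# Hypothetical (home, away) expected goals for three matches
matches = [(1.6, 1.1), (2.1, 0.8), (1.2, 1.4)]

# Probability of 3 goals in the first match: P(3:0) + P(2:1) + P(1:2) + P(0:3)
p3 = total_goals_prob(3, *matches[0])

# Average predicted probability of 3 goals over all matches
avg_p3 = sum(total_goals_prob(3, h, a) for h, a in matches) / len(matches)

print(f"P(3 goals), first match: {p3:.3f}")
print(f"Average P(3 goals):      {avg_p3:.3f}")
```

Averaging the per-match probabilities, instead of plugging one average goal rate into the formula, keeps the differences between matches in the predicted distribution.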

## Poisson distribution vs. Overall goal distribution

We start by taking a look at the overall distributions and then go into more and more detail to determine the disadvantages of a Poisson model. The overall distributions generally look very similar. For zero goals, the predicted probability is a bit higher than the actual share of matches that ended without goals: about 7% of all matches ended goalless, while the model predicted 8%. An even slightly higher overestimation happened for matches with 1 goal. On average, in 17 of 100 matches the fans saw just a single goal, for either the home or the away team. Based on the predicted Poisson distribution, it should happen in 19 of 100 matches on average. That’s a typical problem when using the Poisson distribution to predict football matches.

## Poisson distribution vs. Home/Away goal distribution

Differentiating between the home and away goal distributions indicates the source of this problem. Generally, the Poisson model overestimates the probability of a single team scoring no goal. For the home team it’s an overestimation of 3%. For the away team it’s 2.6%. We can also spot an overestimation of 2.2% for the home team scoring one goal.
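A comparison like this can be reproduced with a small helper that puts the actual goal frequencies next to the average predicted Poisson probabilities. The following is a rough sketch with made-up numbers: `home_goals` and `home_xg` are hypothetical sample data, not the data set used in this article, and `compare_distributions` is a name introduced only for this example.

```python
from collections import Counter
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals for an expected goal rate lam."""
    return lam ** k * exp(-lam) / factorial(k)


def compare_distributions(actual_goals, expected_goals, max_goals=5):
    """Return rows of (goals, actual share, predicted share, difference).

    A positive difference means the model overestimates that goal count.
    """
    counts = Counter(actual_goals)
    rows = []
    for k in range(max_goals + 1):
        actual = counts[k] / len(actual_goals)
        predicted = sum(poisson_pmf(k, lam) for lam in expected_goals) / len(expected_goals)
        rows.append((k, actual, predicted, predicted - actual))
    return rows


# Hypothetical home goals and predicted home expected goals for ten matches
home_goals = [0, 1, 2, 1, 3, 0, 2, 1, 1, 4]
home_xg = [1.2, 1.5, 1.8, 0.9, 2.1, 1.0, 1.6, 1.4, 1.3, 2.4]

rows = compare_distributions(home_goals, home_xg)
for k, actual, predicted, diff in rows:
    print(f"{k} goals: actual {actual:.1%}, predicted {predicted:.1%}, diff {diff:+.1%}")
```

The same function works for the away team by passing the away goals and the away expected goals.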

## Poisson distribution vs. Division/Season distribution

Going one step further, we should also take a look at single divisions. Each division has its own style of football. The German Bundesliga is known to have the highest average number of goals scored. Here we see a constant overestimation of low-scoring games (0:0, 1:0, 0:1). The biggest difference can be spotted in the season 20/21. The model assumed that 18% of the matches would end with 1 goal. In reality it was just 12% of all matches. For lower-scoring leagues like LaLiga, the picture looks different. In the season 21/22 there was even an underestimation of 3% for matches without any goal. The higher the average number of goals scored, the more a Vanilla Poisson model tends to overestimate the number of low-scoring matches. For lower average numbers of goals, this changes to an underestimation.

## Poisson distribution vs. result distribution

We can confirm this behaviour by taking a look at the distribution of single results. As this covers all divisions and seasons, the picture again shows a general overestimation of low-scoring matches. Matches with a final score of 0:0, 1:0 or 0:1 are overestimated. As soon as 2 or more goals are scored, this changes to a slight underestimation.

## Conclusions

The Poisson distribution is easy to use and already performs well in predicting football matches. But it also has its flaws. When the average number of goals is lower, it tends to underestimate low-scoring results. When the average number of goals is higher, we can identify an overestimation of these results. That’s something you have to be aware of. It’s also a chance to increase the accuracy of your predictions by inflating or deflating [1] the probability of these cases.
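One simple way to inflate or deflate single results is to multiply their probabilities by a factor and renormalise the whole score matrix afterwards. The sketch below only illustrates this idea and is not necessarily the method described in [1]. The factor of 0.9 and the expected goal values are assumptions chosen for the example; real factors would have to be fitted to data, for example per division.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals for an expected goal rate lam."""
    return lam ** k * exp(-lam) / factorial(k)


def score_matrix(lam_home: float, lam_away: float, max_goals: int = 10) -> dict:
    """Probability of each result (home, away) under independent Poisson goals."""
    return {
        (h, a): poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
        for h in range(max_goals + 1)
        for a in range(max_goals + 1)
    }


def adjust_scores(probs: dict, factors: dict) -> dict:
    """Scale selected results by a factor, then renormalise so the sum is 1."""
    scaled = {score: p * factors.get(score, 1.0) for score, p in probs.items()}
    total = sum(scaled.values())
    return {score: p / total for score, p in scaled.items()}


# Hypothetical deflation factors for the low-scoring results (values below 1
# deflate, values above 1 inflate); these numbers are not fitted to any data.
factors = {(0, 0): 0.9, (1, 0): 0.9, (0, 1): 0.9}

probs = score_matrix(1.6, 1.1)  # hypothetical expected goals for one match
adjusted = adjust_scores(probs, factors)

for score in [(0, 0), (1, 0), (0, 1), (2, 1)]:
    print(f"{score[0]}:{score[1]}  before {probs[score]:.3f}  after {adjusted[score]:.3f}")
```

Because of the renormalisation, deflating the low-scoring results automatically shifts a bit of probability to all other results.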

If you would like to replicate the results and take a closer look, all data used for the different analyses is available via the BeatTheBookie data service.