Implement Model: Poisson distribution

In the last post the prototype of the Poisson prediction model has proven, that the optimised model is suitable to beat the bookie – at least for the German Bundesliga. The next step in the predictive model development process consists of implementing the model for forecasting the current fixtures. Regarding this model this part is very easy, as you need not to implement a trained model, just the prediction logic.

Current fixtures

At first you need the information, what are the current fixtures, which you want to predict. The imported data from football-data.co.uk just contains the historic data. So you need another data source for the current fixtures. There are 2 possibilities:

Automatic data source:

The best option of course would be an internet data source, which provides the current fixtures and which can be imported into the database. Football-data.co.uk provides also a CSV file with the current fixtures. Unfortunately the file gets only update at late friday afternoon. This is a little bit late, if you do not always have time to select your picks at this time.

Manual data source:

So I had chosen the second option to manually maintain a CSV file with the current fixtures. This file gets imported during the data processing and is integrated into the whole data vault model. The CSV file should of course contain the same team names, which are provided by the historical data. Therefor I create a report, which lists all current teams for each division:

GitHub – Teams per division report

After that I use Excel file with a drop down selection to create the CSV file for each matchday.

GitHub – Excel report current matches

The stage interface table and the import process look like this:

GitHub – Current matches interface table structure

GutHub – Interface current matches import process

As the same team names are used for the manual interface, the current matches can be easily implemented as an additional link table into the existing Raw Data Vault model:

GitHub – Current matches link table structure

GitHub – Current matches link table loading process

The table structure and the loading process for that link table look the same as described for the historic data. Again, this is one of the clear advantages using a Data Vault to model the data.

This is of course not really a good solution. But currently it works for me, as I only have to maintain the current fixtures for the German Bundesliga. But creating a new interface and extract the data from an internet source has a high priority in my backlog.

Feature implementation

After the current match data is integrated into the Raw Data Vault model, the calculation of the different features has to be implemented for the new link table. This happens also at the analytical layer, where the historic features are implemented. At this point, you can again use the advantages of a Data Vault model. As both, the current and historic matches, are link tables, the definition of the historic features can easily be adapted for the current matches:

GitHub – Attack defence strength for current matches

If you compare the historic with the current attack / defence strength, you can see mainly two differences: At line 27 the source for the match combinations was replaced by the link table for the current matches. At line 78 and line 150 the current match link table gets used to determine the date list for which the home and away features should be calculated. The same applies to the eventbased probs.

GitHub – Eventbased probs for current matches

The data, which is needed to calculate the features for the current matches, is much less than for the historic matches. That’s why, I implemented the prediction features as views. This has the advantage, that you have to load the data for prediction only into the Raw Data Vault. After that, everything gets virtualized.

Forecast implementation

It is not necessary to write much about the implementation of the ZIP probabilities. You just need to adapt the view, which delivers the features for the prediction –  that’s all.

GitHub – ZIP probs for current matches

Prediction model report

The last step of implementing the prediction model consists in providing a report, which summarises all the information needed for value betting based on the Poisson prediction model. As mentioned I do not have an interface, which provides all current matches and the corresponding betting odds. That’s why the report contains only following information:

  • Match information (date, home team, away team)
  • Betting market
  • Expected goals
  • ZIP probability / odds
  • Eventbased probability / odds (Historic & direct)
  • Fair probability / odds (ZIP probability + historic probabilty as correction factor)
  • Empty fields for market probability / odds and value

The SQL statement for the report looks like this:

GitHub – poisson value betting report

The report consists of single select statements for each relevant betting market, which get united. The market odds need to be added manually to each bet. Therefor I created an Excel file, where I can add the market odds and which calculates the value of each bet.

GitHub – Poisson value betting excel template

The template consists of 2 pages. At the first one, you add the data from the report. At line 2 you can find the formulas to calculate the value of a bet and the betting advice. So you just need to copy all data to the report and fill in the markets odds. The second page is just to track your placed bets.

Imported for the value detection for a single bet are just the fair odds and the market odds. All other information is just added, to get a feeling for the match. So you got two options. The whole simulation of the poisson prediction model was done by plain value detection. No expert knowledge was used to select specific bets. This is represented by the betting advice in column S. But of course there could be e.g. a game with just a small value for the away team and the direct probabilities indicate, that the away team never wins when playing against the home side. Then you could of course exclude the bet by using your expert knowledge, even if the betting advice tells you to place a bet.

 

At this point, we have reached the final point of the development process for the Poisson prediction model. I’m currently using this prediction model for about 3 months and will also continue using it, until I have found a better prediction model for the German Bundesliga. I hope you enjoyed this series. If you have any suggestions for improvement, please leave a comment. Soon I will start another series, where I explain another prediction model. But on the side I will also publish some other posts regarding sports prediction.

 

 

If you have further questions, feel free to leave a comment or contact me @Mo_Nbg.

One Reply to “Implement Model: Poisson distribution”

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s