Inside the BeatTheBookie App – Predicting football matches with an Ensemble model

The ensemble model is not only the core feature of the BeatTheBookie app, it is also the main reason I started developing the app. Why is that, and how does this type of model predict the outcomes of football matches? Let’s dig in.

An ensemble model is a machine learning technique in which multiple individual models are combined to make predictions or decisions. The idea is that the combined judgment of several models can often outperform any single model, yielding more accurate and more robust predictions.

In my case, I combined a selection of my predictive models, each of which captures a different facet of a football team’s performance: short-term performance, long-term performance, and performance based on market values. I also wanted to add contextual information about the match being predicted, starting with league encoding, which is meant to help the model pick up the differences between leagues. However, a wealth of information cannot easily be fed into a model directly. Fortunately, all of that information is reflected in the market odds, which should therefore serve as an input for the ensemble model.
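To make the inputs more concrete, here is a minimal sketch of how one feature row for such an ensemble could be assembled. The model keys, the one-hot encoding of the league, and the use of raw implied probabilities (1 / odds) are assumptions for illustration, not the exact feature set used in the app.

```python
# Leagues covered in this post; used here for a one-hot encoding (an assumption).
LEAGUES = [
    "2. Bundesliga", "Bundesliga", "Championship", "Eredivisie", "La Liga",
    "La Liga 2", "Liga Portugal", "Ligue 1", "Premier League", "Serie A",
]


def one_hot_league(league, leagues=LEAGUES):
    """Return a 0/1 vector with a single 1 at the position of the league."""
    if league not in leagues:
        raise ValueError(f"unknown league: {league}")
    return [1.0 if league == name else 0.0 for name in leagues]


def build_features(base_predictions, league, odds):
    """Concatenate base-model probabilities, league encoding and market odds.

    base_predictions: dict of model name -> (p_home, p_draw, p_away)
    odds: decimal odds as (home, draw, away)
    """
    features = []
    # Sort by model name so the column order is stable across matches.
    for name in sorted(base_predictions):
        features.extend(base_predictions[name])
    features.extend(one_hot_league(league))
    # Raw implied probabilities; they still contain the bookmaker margin.
    features.extend(1.0 / price for price in odds)
    return features


# Hypothetical base-model outputs for a single match.
base_predictions = {
    "zip_poisson": (0.48, 0.27, 0.25),
    "xg_poisson": (0.52, 0.25, 0.23),
    "market_value": (0.50, 0.26, 0.24),
}
features = build_features(base_predictions, "Bundesliga", (2.0, 3.5, 4.0))
```

The resulting row has 3 x 3 model probabilities, 10 league indicators and 3 odds-based values. A second-level model would then be trained on rows like this one.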

However, one challenge arises when market odds are used as a model input. Using historical odds to train and test such a model is not difficult, because the approach stays consistent. The real difficulty appears at prediction time: people may take their odds from different bookmakers, and platforms like Mollybet offer a wide range of odds all at once. This led to the idea of an app that simplifies the process. The app lets me enter the best available odds, which are then used as a model feature. It also immediately highlights the markets that offer value based on the resulting prediction.
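The value check itself is simple arithmetic: a bet has value when the predicted probability multiplied by the decimal odds is greater than 1. The sketch below illustrates the idea under stated assumptions. The bookmaker names, the probabilities, and the function names are hypothetical and are not taken from the app.

```python
def best_odds(quotes):
    """Pick the highest decimal odds per outcome across bookmakers."""
    best = {}
    for book_odds in quotes.values():
        for outcome, price in book_odds.items():
            best[outcome] = max(price, best.get(outcome, 0.0))
    return best


def value_markets(probs, odds, min_edge=0.0):
    """Return the expected profit per unit staked for outcomes above min_edge."""
    edges = {outcome: probs[outcome] * odds[outcome] - 1.0 for outcome in probs}
    return {outcome: edge for outcome, edge in edges.items() if edge > min_edge}


# Hypothetical quotes from two bookmakers and a hypothetical model prediction.
quotes = {
    "book_a": {"home": 1.85, "draw": 3.60, "away": 4.40},
    "book_b": {"home": 1.90, "draw": 3.50, "away": 4.50},
}
probs = {"home": 0.50, "draw": 0.30, "away": 0.20}

odds = best_odds(quotes)
value = value_markets(probs, odds)
```

In this example only the draw is flagged, because 0.30 x 3.60 is the only product above 1.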

Ensemble model variations

During the development of the Ensemble model for the BeatTheBookie app, I tested various model combinations and algorithms. I also tried a new approach to model training. In earlier iterations, I always trained the model on all available historical data. This time, I also experimented with models trained only on data from the last two years. The rationale is to let the model prioritize recent trends and discard less relevant older data, in the expectation that capturing more current dynamics improves predictive accuracy.

Model criteria	Variations
Source model combinations	Zero-Inflated-Poisson, ML Poisson, ML Market value [v1-v3]
	Zero-Inflated-Poisson, Vanilla Poisson XG10, ML Market value [v1-v3]
Training data set	complete history
	2 years of history
Model parameter	different values for sample-leaf size, as this influenced the model accuracy the most
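The two training data sets differ only in the time window that is used. A minimal sketch of that selection could look as follows, assuming each match is a dict with a `date` field. The field name, the cutoff date, and the example matches are hypothetical.

```python
from datetime import date


def years_before(day, years):
    """Return the same calendar day a number of years earlier."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        # Feb 29 does not exist in the target year; fall back to Feb 28.
        return day.replace(year=day.year - years, day=28)


def training_window(matches, cutoff, years=None):
    """Select matches played before cutoff.

    years=None keeps the complete history, years=2 keeps the last two years.
    """
    if years is None:
        return [m for m in matches if m["date"] < cutoff]
    start = years_before(cutoff, years)
    return [m for m in matches if start <= m["date"] < cutoff]


# Hypothetical matches, only the date matters for this example.
matches = [
    {"id": 1, "date": date(2019, 3, 9)},
    {"id": 2, "date": date(2021, 8, 14)},
    {"id": 3, "date": date(2022, 9, 3)},
]
cutoff = date(2023, 7, 1)

full_history = training_window(matches, cutoff)
last_two_years = training_window(matches, cutoff, years=2)
```

Both sets would then be passed to the same training routine, so that any difference in accuracy can be attributed to the window.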

Training data

As a result, a total of 24 distinct ensemble models were generated. These models underwent training and testing using data spanning the years 2018 to 2023 for the following football leagues: 2. Bundesliga, Bundesliga, Championship, Eredivisie, La Liga, La Liga 2, Liga Portugal, Ligue 1, Premier League, and Serie A.

Ensemble model performance

For each betting simulation, I used a flat-stake betting strategy of 1 unit per bet. Every bet that indicated value was placed, and the bets were settled against fair Bet365 odds, i.e. odds with the bookmaker margin removed. In general, the ensemble models performed better, confirming the trend I had observed during my initial tests with the models. My established Poisson models ended up toward the lower end of the performance spectrum, while the short-term xG Poisson model performed solidly and consistently, with an average profit of 4.89%. The ML Poisson model and the Ensemble models that incorporate it have a narrower scope: they are applicable only to the Big 5 leagues in Europe.
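As a rough illustration of this setup, the following sketch removes the margin from a set of 1X2 odds and computes the flat-stake profit per bet. Proportional normalization of the implied probabilities is only one of several ways to remove the margin and is an assumption here, as are the example odds and bet results.

```python
def fair_odds(odds):
    """Remove the bookmaker margin by proportional normalization.

    The implied probabilities 1/odds sum to more than 1 (the overround).
    Dividing each by that sum is the same as multiplying the odds by it.
    """
    overround = sum(1.0 / price for price in odds)
    return [price * overround for price in odds]


def flat_stake_profit(bets, stake=1.0):
    """Return total profit and profit per unit staked for (odds, won) bets."""
    total = 0.0
    for price, won in bets:
        total += stake * (price - 1.0) if won else -stake
    staked = stake * len(bets)
    return total, (total / staked if staked else 0.0)


# Hypothetical bookmaker odds for home, draw and away.
fair = fair_odds([2.0, 3.5, 4.0])

# Hypothetical settled value bets as (fair odds taken, bet won?).
bets = [(2.1, True), (3.6, False), (4.1, False), (2.0, True)]
total, roi = flat_stake_profit(bets)
```

After normalization, the implied probabilities of the fair odds sum to exactly 1, so a profit in the simulation cannot come from a margin in the bettor's favour.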

Narrowing the focus to the Big 5 leagues changes the picture. Here, the Ensemble models that exclude the ML Poisson model perform best.

Average betting profit (Big 5, 2018-2023)

The Ensemble models not only perform differently from the individual models, they also show a distinct betting behavior throughout the backtesting process. Predicting draws has historically been a weakness of Poisson models for football predictions. This appears to be different for the Ensemble models, which signal value for draw bets much more frequently.

Model selection

When selecting a model for the BeatTheBookie app, I prioritized two key aspects: overall profit and profit variance. A notable problem with my Poisson models is how much their profits vary from year to year. A lower variance means a more consistent profit trend over time. Looking at the season-specific profits of the Poisson models reveals a difference of approximately 10% between years.

Avg betting profit per season ML Poisson model

The Ensemble model that stands out as the most appealing option on these two criteria shows a notably lower variance.

Avg betting profit per season Ensemble model
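The two selection criteria can be expressed as a small calculation: the mean profit per season and how much that profit moves between seasons. The sketch below uses made-up numbers for two hypothetical models, not my backtesting results. The rule of filtering by spread first and then taking the highest mean is an assumption for illustration.

```python
from statistics import mean, pstdev


def summarize(season_profits):
    """Summarize per-season profits by mean, standard deviation and spread."""
    values = list(season_profits.values())
    return {
        "mean": mean(values),
        "stdev": pstdev(values),
        "spread": max(values) - min(values),
    }


def pick_model(candidates, max_spread):
    """Pick the highest mean profit among models within the allowed spread."""
    stats = {name: summarize(profits) for name, profits in candidates.items()}
    eligible = [name for name, s in stats.items() if s["spread"] <= max_spread]
    if not eligible:
        return None
    return max(eligible, key=lambda name: stats[name]["mean"])


# Made-up season profits (as fractions of stake) for two hypothetical models.
candidates = {
    "model_a": {"2019/20": 0.02, "2020/21": 0.12, "2021/22": 0.05},
    "model_b": {"2019/20": 0.05, "2020/21": 0.07, "2021/22": 0.06},
}
summary_a = summarize(candidates["model_a"])
choice = pick_model(candidates, max_spread=0.05)
```

Here `model_a` has the slightly higher mean, but its profit swings by 10 percentage points between seasons, so the steadier `model_b` is selected.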

Conclusions

The Ensemble models exhibit significant potential not only as profitable betting models but also for future enhancements. All of my foundational models serve as valuable input features for these models. Consequently, I can already outline my forthcoming steps, which involve exploring additional model combinations, further elevating profits, and continuing efforts to reduce variance even more effectively.

If you have further questions, feel free to leave a comment or contact me @Mo_Nbg.
