Welcome season 2024/25 – taking a look at the new Ensemble model

As the new season kicks off, I’m thrilled to unveil the latest updates to my prediction models, designed to give you an even sharper edge in football betting. Whether you’re a seasoned pro or a passionate hobbyist, my goal is to help you make smarter, data-driven decisions. This season, I’ve made significant enhancements to the Ensemble model, introducing more sophisticated features and expanding market coverage. Let’s dive into the key improvements and see how they can boost your betting strategy for the upcoming matches.

Technical Improvements

What’s new on the technical side? The most important changes come first. Version 1 of my Ensemble model relied on relatively few features: it combined the predictions of three base models, a league encoding, and the market odds. In Version 2, I’ve significantly expanded this setup. Alongside the league encoding, I have introduced a temporal feature by adding matchday information. The number of base models has increased to 14, so the Ensemble model now captures both short-term and long-term information. Altogether, the 1×2 market Ensemble model now uses 47 input features.

Another major update was a reassessment of which algorithms deliver the best performance. Version 1 was based on a classic random forest approach. For Version 2, I tested various options: Random Forest, XGBoost, Support Vector Machine, and Logistic Regression. Surprisingly, the simple logistic regression delivered the best performance.
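To make the stacking idea more concrete, here is a minimal sketch of a logistic regression fitted on top of base-model predictions. Everything in it is an assumption for illustration: the data is synthetic, the helper noisy_probs is hypothetical, and the column layout (14 base models with three probabilities each, three market probabilities, one league column, one matchday column) is just one layout that happens to add up to 47. The post does not state the actual composition of the features, and this is not the production code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n_matches, n_base_models = 2000, 14

# Synthetic 1x2 outcomes: 0 = home, 1 = draw, 2 = away.
y = rng.choice(3, size=n_matches, p=[0.45, 0.27, 0.28])


def noisy_probs(outcomes, noise):
    """Hypothetical stand-in for a base model: noisy probabilities that lean towards the true outcome."""
    logits = rng.normal(0.0, noise, size=(len(outcomes), 3))
    logits[np.arange(len(outcomes)), outcomes] += 0.5
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


# Assumed layout: 14 x 3 base-model probabilities = 42 columns.
base_preds = np.hstack([noisy_probs(y, 1.0) for _ in range(n_base_models)])
# Assumed: 3 columns of normalised market probabilities derived from the odds.
market = noisy_probs(y, 0.5)
# Assumed: league and matchday as one numeric column each.
league = rng.integers(0, 10, size=(n_matches, 1)).astype(float)
matchday = rng.integers(1, 39, size=(n_matches, 1)).astype(float)

X = np.hstack([base_preds, market, league, matchday])  # 42 + 3 + 1 + 1 = 47

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The meta-model: scale the inputs, then fit a multinomial logistic regression.
ensemble = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
ensemble.fit(X_train, y_train)

probs = ensemble.predict_proba(X_test)  # one row per match: P(home), P(draw), P(away)
print(X.shape, probs[:3].round(3))
```

In a real setup the base-model predictions would have to be generated out-of-sample (for example with a time-based split), otherwise the meta-model learns from leaked information.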

New Markets and Leagues

As mentioned in a previous blog post, I’ve added new leagues including MLS, the Argentine Primera División, the Belgian First Division A, and the French and Italian second divisions. For all these leagues, I now support not just the Home/Draw/Away market, but also the Over/Under 2.5 goals market. Unfortunately, due to a lack of historical odds data, I couldn’t expand into additional Over/Under markets for all new divisions.

Performance Overview

Now, let’s take a look at performance. Starting with the 1×2 market, the inclusion of new leagues has increased the total number of bets in the backtest from 21,000 to 32,000. While the overall yield is slightly lower at 4.7%, the sheer volume of bets means total profit has still increased. With average odds of 4.7, this corresponds to a p-value of 0.00001: if the model had no real edge, a yield at least this high would show up by chance in only about 0.001% of cases. Truly an impressive figure.
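For readers who want to check such a figure themselves, here is a minimal sketch of one common approximation. It is an assumption on my part that this mirrors the calculation behind the number above. The hypothetical function yield_p_value treats every bet as a flat 1-unit stake at the average odds and assumes that, without an edge, those odds are fair. The profit per bet then has mean 0 and variance (odds - 1), and the yield over n bets is approximately normally distributed.

```python
import math


def yield_p_value(n_bets: int, yield_: float, avg_odds: float) -> float:
    """One-sided p-value of a flat-stake yield under a 'no edge' null hypothesis.

    Assumptions: 1-unit stakes, every bet placed at `avg_odds`, and fair odds
    under the null, so the per-bet profit has mean 0 and std sqrt(avg_odds - 1).
    """
    std_per_bet = math.sqrt(avg_odds - 1.0)
    z = yield_ * math.sqrt(n_bets) / std_per_bet
    # Upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Figures quoted for the 1x2 backtest: 32,000 bets, 4.7% yield, average odds 4.7.
p_1x2 = yield_p_value(n_bets=32_000, yield_=0.047, avg_odds=4.7)
print(f"p-value: {p_1x2:.1e}")
```

With the 1×2 figures this lands in the same order of magnitude as the quoted 0.00001. Because real bets are placed at varying odds, it is only a rough check and not an exact test.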

In a previous blog, I speculated whether I had created a “Draw Machine.” The new version seems to confirm this theory, as approximately two-thirds of the profits are generated by draw bets.

The new Over/Under 2.5 market, however, hasn’t performed as well. Over a set of 27,000 bets, the model shows a yield of only 1.17%. This suggests that the two-way market is more efficient than the three-way market. The p-value here is 0.03, which is below the conventional 0.05 threshold but nowhere near the level of the 1×2 result, so the evidence for a real edge is much weaker.

Why I Trust the New Model

Beyond the positive numbers, why do I have confidence in this model? The Over/Under 2.5 model shows a yield that might not sound great, especially considering the bookmakers’ margin, which is ideally around 1%-1.2%. What can I do to still place profitable bets? I select only the bets for which the model indicates a higher value. Theoretically, this should increase my average profit per bet.

You can observe this by looking at the yield distribution. Most bets have a value slightly below zero, showing a typical bell curve distribution. By selecting only bets with a higher value (e.g., using thresholds of 0.02 and 0.05), you reduce the number of bets but see an increase in yield. This is exactly the behavior I expect from a well-functioning model, and it’s what the model demonstrates.
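The threshold idea can be tried out with a small simulation. This is a sketch on synthetic data, not the backtest itself. It assumes the common definition value = model probability × odds − 1, which the post does not spell out, and the function names simulate_bet and summarize as well as all numeric ranges are made up for illustration.

```python
import random

random.seed(7)


def simulate_bet():
    """Return (value, profit) for one synthetic flat-stake bet."""
    p_true = random.uniform(0.2, 0.7)
    # Offered odds scatter around the fair price, mostly slightly below it.
    odds = (1.0 / p_true) * random.uniform(0.90, 1.05)
    # The model estimates the true probability with some noise.
    p_model = min(max(p_true + random.gauss(0.0, 0.03), 0.01), 0.99)
    value = p_model * odds - 1.0  # assumed definition of "value"
    won = random.random() < p_true
    profit = odds - 1.0 if won else -1.0
    return value, profit


bets = [simulate_bet() for _ in range(50_000)]


def summarize(all_bets, threshold):
    """Number of bets and yield when only bets with value > threshold are placed."""
    profits = [profit for value, profit in all_bets if value > threshold]
    n = len(profits)
    return n, (sum(profits) / n if n else float("nan"))


for threshold in (0.0, 0.02, 0.05):
    n, yld = summarize(bets, threshold)
    print(f"value > {threshold:.2f}: {n:6d} bets, yield {yld:+.2%}")
```

Raising the threshold always shrinks the number of bets. Whether the yield rises with it depends on how well the model's value estimate tracks the true edge, which is exactly what the check described above is meant to reveal.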

Conclusion

I’m excited and hopeful as we head into the new season. To all professional and recreational bettors, I wish you the best of luck this season. Feel free to check out my BeatTheBookie app and the new Ensemble model—it’s free to use for 30 days. If you’re more into coding and want to get your hands dirty, take a look at my free data service. It offers a wealth of historical data for you to explore.

If you have further questions, feel free to leave a comment, contact me @Mo_Nbg or join the Discord Server.
