Validate Model: GS & PPG match rating (part 3)

The first and the second part of this series explained some basic methods to optimise the regression models for the GS & PPG match rating. You have now a set of 3 different regression models (linear, polynomial and polynomial without outliers) for each predictive variable. These models now have to not only compete against each other, but also of course against the Bookie odds and the Poisson prediction model.

Continue reading “Validate Model: GS & PPG match rating (part 3)”

Validate Model: GS & PPG match rating (part 2)

The first part of this series took a look at the GS match rating model. The post described, how you are able to identify a non-linear relationship between the predictor variables and the outcome variable. The same methods will now be applied to the PPG match rating model, so that we are able to compare the two different polynomial regression models. On top, I want to show, how you are able to figure out, whether outliers in your data have an influence on your regression model.

Continue reading “Validate Model: GS & PPG match rating (part 2)” web scraper for upcoming fixtures




Squawka changed the design of their website. The website no longer uses HTML tables to list the fixtures for a specific league and also changed the corresponding URLs. That’s why the described web scraper does not work anylonger. As soon I found another data source for the upcoming fixtures, I will create a new blog.



For the implementation of the Poisson prediction model I needed a data source for the current fixtures. As a temporary solution I used a manual CSV file, which I updated and imported regularly. During my researches for new data sources, I found the website This website provides statistics and analysis based on Opta data. With this post, I will describe, how you can extract the current fixtures from this website and use them during the normal data processing, which replaces the manual CSV interface.

Continue reading “ web scraper for upcoming fixtures”

Validate Model: GS & PPG match rating (part 1)

In the last post I described, how the features for the GS & PPG match rating models are calculated. Based on these features I will now describe, how you build and optimise a linear regression model with R. The first part will describe the optimisation of the linear regression model for the GS match rating model in detail. The second part will cover the PPG match rating model. The third and final part will compare the prediction performance of the different models.

Continue reading “Validate Model: GS & PPG match rating (part 1)”

Analytical Architecture: TripleA DWH

As described in How to beat the bookie: Value Betting I want to use Value Betting to beat the bookie. To identify value, I have to be able to calculate the probability of a specific sports event (e.g. Home-Win for Team A) as accurately as possible. Therefor, I have to develop, test, simulate and process different predictive models.  As a DWH architect I know, that a good data architecture helps a lot to support such a developing process. That’s why I formed the concept of the TripleA DWH – the Advanced Agile Analytical Data Warehouse – a data architecture aimed to automate data science processes.

Continue reading “Analytical Architecture: TripleA DWH”