Squawka.com web scraper for upcoming fixtures

Attention:

Squawka changed the design of their website. The site no longer uses HTML tables to list the fixtures for a specific league, and the corresponding URLs have changed as well. That’s why the web scraper described here no longer works. As soon as I find another data source for the upcoming fixtures, I will write a new post.

For the implementation of the Poisson prediction model I needed a data source for the current fixtures. As a temporary solution I used a manually maintained CSV file, which I updated and imported regularly. While researching new data sources, I found the website squawka.com, which provides statistics and analysis based on Opta data. In this post I describe how you can extract the current fixtures from this website and use them in the normal data processing, replacing the manual CSV interface.

Validate Model: GS & PPG match rating (part 1)

In the last post I described how the features for the GS & PPG match rating models are calculated. Based on these features, I will now describe how to build and optimise a linear regression model with R. The first part covers the optimisation of the linear regression model for the GS match rating model in detail. The second part covers the PPG match rating model. The third and final part compares the prediction performance of the different models.

Define Variables: GS & PPG match rating

In the last post I introduced the predictive models that will be explained in this series. Following the development process for predictive models, the next step is to supply the raw data for them. Fortunately, football-data.co.uk already offers all the data needed for these models. So this post explains how to implement the features for the GS and PPG match rating models based on the existing Raw Data Vault model.

Define Objective: GS & PPG match rating prediction

This post is the start of a new series, in which I explain how to implement another predictive model in the TripleA DWH architecture. When I started developing predictive models with R, I was a little overwhelmed by the different plots R provides for analysing and optimising a predictive model. That’s why I wanted to learn and understand the whole optimisation process in R using a simple predictive model. Football-data.co.uk provides an explanation of a small rating system, which uses a linear regression to predict the probability of a home win, draw or away win. I chose this model because linear regression is a frequently used and easy-to-understand predictive method. With a linear regression you can investigate the relationship between the variable to be predicted and one or more features.
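To make the idea of a linear regression with a single feature a bit more concrete, here is a minimal Python sketch of an ordinary least squares fit. It is only an illustration and not the model from the series, which is built in R (where `lm()` does this job). The function name and the toy numbers are hypothetical and were made up so that the result is easy to check by hand.

```python
def fit_simple_linear_regression(x, y):
    """Fit y = intercept + slope * x with ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x and cross-deviations of x and y
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up toy data: a match rating (feature) and an observed
# home-win share (target). These are NOT real football statistics.
ratings = [-2, -1, 0, 1, 2]
home_win_share = [0.3, 0.4, 0.5, 0.6, 0.7]

intercept, slope = fit_simple_linear_regression(ratings, home_win_share)
print(f"home-win share = {intercept:.2f} + {slope:.2f} * rating")
```

The slope tells you how strongly the predicted value changes with the feature, which is the relationship the regression is meant to investigate.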

How To: Kicktipp strategy simulation

I don’t know how many of you know Kicktipp. Kicktipp is a very popular betting game in Germany, where anybody can start their own small betting community and invite people to it. It is especially popular among friends and in companies during big tournaments like the World Cup. My company also runs a yearly betting game for the German Bundesliga. The rules are very simple: you have to tip every match, and you get 4 points for the correct result, 3 points for the correct goal difference and 2 points for the correct trend. In this post I will show you different betting strategies for Kicktipp and test whether they help you win your personal Kicktipp betting game.
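The scoring rules above can be sketched as a small Python function. This is a minimal sketch, not the code used for the simulation in the post. The function name is hypothetical, and I assume here that a draw tip with a different score than the actual draw counts as a correct goal difference; your Kicktipp community may be configured differently.

```python
def kicktipp_points(tip, result):
    """Score one tip. tip and result are (home_goals, away_goals) tuples."""
    if tip == result:
        return 4  # correct result
    tip_diff = tip[0] - tip[1]
    result_diff = result[0] - result[1]
    if tip_diff == result_diff:
        # Assumption: this also covers draws with a different score
        return 3  # correct goal difference
    # Same sign of the goal difference means the trend was right
    tip_trend = (tip_diff > 0) - (tip_diff < 0)
    result_trend = (result_diff > 0) - (result_diff < 0)
    if tip_trend == result_trend:
        return 2  # correct trend
    return 0


print(kicktipp_points((2, 1), (2, 1)))  # correct result
print(kicktipp_points((2, 1), (3, 2)))  # correct goal difference
print(kicktipp_points((2, 1), (3, 0)))  # correct trend
print(kicktipp_points((2, 1), (0, 1)))  # wrong trend
```

A strategy simulation then only has to sum these points over all matches of a season for each tipping strategy.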

Define variables: Brier score for market odds

While browsing the internet and looking for new inspiration to build my own predictive model, I came upon a very interesting possible feature: the Brier score.

The Brier score is a way to measure the accuracy of a predictive model. It is often used to measure the accuracy of weather forecasts. At first I thought I could use it as a kind of calibration feature, so that a predictive model recognises when it was too inaccurate in the past. But using it as a feature to detect teams that the bookies predict well, or that tend to cause unexpected results, seems to be a more promising approach. Therefore, in this post I want to explain how to calculate the Brier score for a specific team based on its most recent betting odds.
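As a rough illustration, the Brier score for a single match could be calculated like this in Python. This is a sketch under assumptions: the function names are hypothetical, the odds are converted to probabilities by normalising the inverse decimal odds (one common way to remove the bookmaker margin), and the multi-category form of the Brier score is used, which sums the squared errors over home win, draw and away win and therefore ranges from 0 (perfect) to 2. The calculation in the full post may differ in detail, for example by averaging over the last matches of a team.

```python
def implied_probabilities(odds):
    """Convert decimal odds (home, draw, away) into probabilities.

    Assumption: the bookmaker margin is removed by normalising
    the inverse odds so that they sum to 1.
    """
    inverse = [1.0 / o for o in odds]
    total = sum(inverse)
    return [value / total for value in inverse]


def brier_score(probabilities, outcome_index):
    """Multi-category Brier score for one match.

    outcome_index is 0 for a home win, 1 for a draw, 2 for an away win.
    """
    return sum(
        (p - (1.0 if i == outcome_index else 0.0)) ** 2
        for i, p in enumerate(probabilities)
    )


# Illustrative odds only, chosen so the numbers are easy to follow
probs = implied_probabilities([2.0, 4.0, 4.0])
print(probs)
print(brier_score(probs, 0))  # favourite wins: low score
print(brier_score(probs, 2))  # away win: higher score
```

The lower the score, the better the odds matched what actually happened.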

Implement Model: Poisson distribution

In the last post the prototype of the Poisson prediction model showed that the optimised model is able to beat the bookie, at least for the German Bundesliga. The next step in the predictive model development process is to implement the model for forecasting the current fixtures. For this model this step is very easy, as you do not need to deploy a trained model, just the prediction logic.
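To give an idea of what such prediction logic can look like, here is a minimal Python sketch. It is not the implementation from the post: the function names are hypothetical, the expected goals are assumed to be given, home and away goals are assumed to be independent, and the score grid is cut off at `max_goals`, so the three probabilities sum to slightly less than 1.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k goals for an expected goal count lam."""
    return lam ** k * exp(-lam) / factorial(k)


def match_probabilities(home_exp, away_exp, max_goals=10):
    """Return (home win, draw, away win) probabilities.

    Assumption: home and away goals are independent Poisson variables.
    """
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_exp) * poisson_pmf(a, away_exp)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    return home, draw, away


# Illustrative expected goals, not taken from real fixtures
home, draw, away = match_probabilities(1.5, 1.1)
print(f"home {home:.3f}, draw {draw:.3f}, away {away:.3f}")
```

Comparing these probabilities with the probabilities implied by the bookmaker odds is then enough to spot possible value bets.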
