A data journey – market values (part 2)

In the last post I described, how I collected the market value data as the first step of my journey. The second step is – in my opinion – one of the most important ones. Get to know your data! Of course many predictive methods can be used as a black box. But that’s something I would not suggest. At least you should understand how your values are distributed. And it’s even better, when you build some kind of domain knowledge. To know your data offers you the possibility to shorten the training process of you predictive models. And visualizations always help to better understand your data. Continue reading “A data journey – market values (part 2)”

Define Variables: GS & PPG match rating

In the last post I described the predictive models, which will be explained in this series. Following the development process for predictive models, the next steps should handle the raw data supply for the predictive models. Fortunately football-data.co.uk already offers all data, which is needed for these models. So this post will explain, how you implement the features for the GS and PPG match rating models based on the existing Raw Data Vault model.

Continue reading “Define Variables: GS & PPG match rating”

Define variables: Brier score for market odds

While browsing the internet and looking for some new inspiration to build an own predictive model, I came upon a very interesting possible feature: the Brier score.

The Brier score is a possibility to measure the accuracy of a predictive model. It gets often used to measure the accuracy for weather forecasts. First I thought, I could use it as a kind of calibration feature for a predictive model. So that a predictive model recognizes, when it was too inaccurate in the past. But using it as a feature to detect teams, which can be predicted well by the bookies or which could cause unexpected results, seems to be a more promising approach. Therefor I want to explain in this post, how to calculate the Brier score based on the last betting odds for a specific team.

Continue reading “Define variables: Brier score for market odds”

Define variables: attack & defence strength

During my first investigations for predicting football scores I came across the predictive modelsĀ of Maher [1] and Dixon / Coles [2]. Maher modelled the number of goals a team scores during a match as two independent Poisson distributed variables, for the home team and the away team. He assumed that each team has an attacking strength and a defence strength. Dixon / Coles extended this model by adjusting some disadvantages of the Poisson distribution and by using a time dependent attack and defence strength. Both papers are the base of my first predictive model.

In this Post I want to describe, how the attack and defence strength are calculated and how you add this calculation to the existing Data Vault model. The predictive model itself will be explained in another post.

Continue reading “Define variables: attack & defence strength”