After I realized that my available data is definitely not enough to beat the bookie, I decided to start a new data journey and take a look at some more advanced statistics. And what could be better suited than Expected Goals (xG)? This statistic is used more and more to explain the luck or bad luck factor you feel when watching a football match. In the first part of this journey I will explain what xG is and what it tells you about a football match.
What are Expected Goals (xG)?
“He has to score such chances!” You often hear statements like this during a football match. But how high is the real probability of scoring from a specific chance? Do such 100% chances really exist? That’s the idea behind Expected Goals: quantify the quality of a shot.
Sports data providers like Opta own data on hundreds of thousands of shots. For each shot you know attributes like shot location, body part and defensive pressure. Based on similar shots, machine learning models are able to determine the percentage of cases in which a specific shot would have led to a goal.
Here is a video published by Opta, which illustrates xG with some nice examples:
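To make the idea of "similar shots" a bit more tangible, here is a minimal sketch in Python. It is not how Opta or any other provider actually computes xG. It simply groups a handful of made-up shots by their attributes and uses the share of goals in each group as the xG value. The zone names, the attributes and all numbers are assumptions for illustration only.

```python
from collections import defaultdict

# Made-up historical shots: (zone, body part, under pressure, scored).
# Real providers use far more shots and far more attributes.
historical_shots = [
    ("six_yard_box", "foot", False, True),
    ("six_yard_box", "foot", False, True),
    ("six_yard_box", "foot", False, False),
    ("six_yard_box", "foot", False, False),
    ("penalty_area", "head", True, True),
    ("penalty_area", "head", True, False),
    ("penalty_area", "head", True, False),
    ("penalty_area", "head", True, False),
]


def build_xg_table(shots):
    """Return the share of goals for every group of identical shot attributes."""
    counts = defaultdict(lambda: [0, 0])  # attributes -> [goals, shots]
    for zone, body_part, pressure, scored in shots:
        key = (zone, body_part, pressure)
        counts[key][0] += int(scored)
        counts[key][1] += 1
    return {key: goals / total for key, (goals, total) in counts.items()}


xg_table = build_xg_table(historical_shots)
for attributes, xg in xg_table.items():
    print(attributes, round(xg, 2))
```

A real model would replace the simple lookup table with a trained classifier, but the output has the same meaning: a value between 0 and 1 for each shot.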
Limitations of xG
In current discussions and in the media, xG is really hyped. As good as xG may sound as an advanced football statistic, you also have to know and respect its particularities and limitations.
xG is a model
When talking about xG you have to understand that xG is a model and not a defined KPI or statistic. It is not possible to compare xG values from different websites or statistics providers, as each site or provider may use its own model and different shot attributes.
The following pictures show different xG values for the match Liverpool against Tottenham from 3 different statistics providers:
What’s the exact difference between these models? I don’t know. But it’s important to know that there is a difference. So when you use xG for predictions and change the data source for your xG stats, you have to expect to rework your model based on the new data.
There are also some technical limitations when tracking the specific information during the match. As one data provider told me, it’s still hard to get the height of the ball correct. So the xG value for headers may be imprecise. E.g. scoring chances where a cross was too high to get a good heading position can have a higher xG value than expected.
Interpretation of xG
Nevertheless, xG is a great statistic, which offers you many possibilities to get more insights about a match beyond just goals and shots. xG gives you more information about the performance of teams as well as the performance of single players. After extracting xG data from a website, I will create some visualizations for the different interpretation possibilities.
First of all, xG is a great indicator for the performance of single players. As explained, hundreds of thousands of shots are used to determine an average expected goals value for different shot locations. As you can imagine, the players who took these attempts are not all the best players in the world. So you are able to identify the best players by taking a look at whether they overperform their expected goals value or not. If the number of goals scored is higher than the xG value, a player converts chances better than an average football player.
Teams can also overperform their expected goals value, similar to single players. But comparing the number of goals to the final xG value of a match is just one part. You can also compare the number of shots to the overall xG value. This comparison gives you some insight into how good a team is at creating chances. A team with a higher average xG value per shot is able to create better scoring opportunities during a match.
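The comparison described above is just a subtraction of xG from goals. Here is a small sketch in Python; the player names and numbers are made up and only serve as an illustration.

```python
# Made-up season totals per player: (goals scored, accumulated xG).
players = {
    "Striker A": (20, 15.5),
    "Striker B": (8, 11.0),
}

# Positive difference: the player scored more often than the model expected.
# Negative difference: the player scored less often than the model expected.
xg_difference = {name: goals - xg for name, (goals, xg) in players.items()}

for name, diff in xg_difference.items():
    label = "overperforms" if diff > 0 else "underperforms"
    print(f"{name}: {diff:+.1f} goals vs. xG ({label})")
```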
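The xG per shot comparison can be sketched like this in Python. The shot values are invented for illustration. Both teams end up with the same total xG, but they get there with very different shot quality.

```python
def xg_per_shot(shot_xgs):
    """Average xG value of a list of shots (0.0 if there were no shots)."""
    if not shot_xgs:
        return 0.0
    return sum(shot_xgs) / len(shot_xgs)


# Invented xG values for every shot of two teams in one match.
team_a_shots = [0.5, 0.25, 0.25]   # few shots, but good chances
team_b_shots = [0.125] * 8         # many shots, mostly poor chances

for name, shots in [("Team A", team_a_shots), ("Team B", team_b_shots)]:
    print(f"{name}: {len(shots)} shots, total xG {sum(shots):.2f}, "
          f"xG per shot {xg_per_shot(shots):.3f}")
```

Looking only at the total xG, both teams look equal. The xG per shot shows that Team A created the clearer chances.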
Descriptive visualizations are always fun to create. But for a blog called “Beat the bookie” the predictive power of the data is the most interesting part. Using or adding xG to a predictive model adds several pieces of implicit information:
- a team's ability to create dangerous scoring opportunities
- a team's ability to convert opportunities
- the average result of a match, if it were played the same way hundreds of times
- over / under performance phases
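The point about the average result of a match can be illustrated with a small simulation. The following Python sketch replays a match many times and treats every shot as an independent coin flip with the shot's xG value as the probability of a goal. The independence of shots is a simplifying assumption, and the shot values are made up.

```python
import random


def simulate_match(home_shot_xgs, away_shot_xgs, n_simulations=10_000, seed=42):
    """Replay a match n times and return the share of each outcome.

    Assumption: every shot is independent and scores with probability = its xG.
    """
    rng = random.Random(seed)  # fixed seed to keep the result reproducible
    outcomes = {"home_win": 0, "draw": 0, "away_win": 0}
    for _ in range(n_simulations):
        home_goals = sum(rng.random() < xg for xg in home_shot_xgs)
        away_goals = sum(rng.random() < xg for xg in away_shot_xgs)
        if home_goals > away_goals:
            outcomes["home_win"] += 1
        elif home_goals < away_goals:
            outcomes["away_win"] += 1
        else:
            outcomes["draw"] += 1
    return {name: count / n_simulations for name, count in outcomes.items()}


# Made-up xG values for all shots of one match.
probabilities = simulate_match([0.4, 0.3, 0.1], [0.05, 0.05, 0.6])
print(probabilities)
```

The resulting shares can be read as rough probabilities for home win, draw and away win, given the chances both teams created in this one match.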
That’s the part I will focus on at the end of this data journey. I will test xG predictive models and try to determine how much better these models perform in comparison to simple shot & goal predictive models.
If you have further questions, feel free to leave a comment or contact me @Mo_Nbg