Scraping FBRef xG data with Python

In the realm of sports analytics, FBRef has recently unveiled an essential update: the availability of Expected Goals (xG) data for previously uncovered divisions. This pragmatic addition opens doors to nuanced insights, particularly valuable in the world of sports betting. In this blog post, I pragmatically explore these developments. Using Python and adept web scraping techniques, I dissect this fresh data.

Continue reading “Scraping FBRef xG data with Python”

xG data journey – the raise of M. Gladbach

After getting all this expected goals data, it’s of course most obvious to take a look at the insights such data can produce and in which way xG can be interpreted. I have decided to take a look at the current development of Borussia Moenchengladbach in the Bundesliga . Even if RB Leipzig took over now the first place, the development of Gladbach in comparison to the last seasons is impressive. And now I just want to know: Does xG data reveals the secret of Marco Rose?

Continue reading “xG data journey – the raise of M. Gladbach”

xG data journey – scrapping dynamic webpages

In the first part of this data journey, I took a look  at the general definition of expected goals (xG) and the usage of this metric. In the next step in the process of testing the predictive power of xG, I need to get some data. This part will focus on getting the team expected goals statistics. In one of the following parts, I will also take a look on getting the player expected goals statistics as this of course offers even deeper insights.

Continue reading “xG data journey – scrapping dynamic webpages”

A data journey – market values (part1)

When a rich club in Germany goes through a bad performance phase or loses an important match, we like to use the phrase “Geld schießt eben keine Tore”. What means more or less, that big money doesn’t ensure goals. But the overall acceptance is of course,  that richer clubs are expected to win more often as they got the money to buy the best players. This inspired me to start a data journey about market values in the big 5 European leagues: What do the market values tell about the development in the different leagues? How do teams perform in relation to the money they spent? Does the market value of a team has a predictive significance?

Continue reading “A data journey – market values (part1)”

Exasol Python UDF web scraper for Bundesliga match day fixtures

The hardest part of sports analytics is getting data! Not for nothing there are companies, which earn their money just with sports data. But if you are not able or do not want to pay such amounts of money, you got just one possibility: scraping the data from the Web. In an older post, I described a R web scraper. As this one was no longer working, I needed a new one. What brings us to this post. This time I will describe, how to create a web scrapper for static HTML sites with Python and how you are able to implement such a web scrapper as a User Defined Function (UDF) in Exasol.

Continue reading “Exasol Python UDF web scraper for Bundesliga match day fixtures”

Squawka.com web scraper for upcoming fixtures

 

 

Attention:

Squawka changed the design of their website. The website no longer uses HTML tables to list the fixtures for a specific league and also changed the corresponding URLs. That’s why the described web scraper does not work anylonger. As soon I found another data source for the upcoming fixtures, I will create a new blog.

 

 

For the implementation of the Poisson prediction model I needed a data source for the current fixtures. As a temporary solution I used a manual CSV file, which I updated and imported regularly. During my researches for new data sources, I found the website squawka.com. This website provides statistics and analysis based on Opta data. With this post, I will describe, how you can extract the current fixtures from this website and use them during the normal data processing, which replaces the manual CSV interface.

Continue reading “Squawka.com web scraper for upcoming fixtures”

Gather data: football-data.co.uk

When I started this project, my biggest problem was to find a source for historic football statistics and historic football odds. Fortunately, I found Joseph Buchdahl’s website football-data.co.uk. This website is just great! He offers CSV files for 22 football leagues and about 19 seasons. He updates the data mostly two times a week. So I used this data as the starting point for my analytical system.

Continue reading “Gather data: football-data.co.uk”