A data journey – market values (part 1)

When a rich club in Germany goes through a bad run or loses an important match, we like to use the phrase “Geld schießt eben keine Tore”, which roughly means that money alone doesn’t score goals. Still, the general expectation is that richer clubs win more often, because they have the money to buy the best players. This inspired me to start a data journey about market values in the big 5 European leagues: What do the market values tell us about the development of the different leagues? How do teams perform in relation to the money they spend? Does the market value of a team have any predictive power?

Exasol Python UDF web scraper for Bundesliga match day fixtures

The hardest part of sports analytics is getting data! It is no accident that there are companies that earn their money with nothing but sports data. But if you are not able or not willing to pay for such data, you have just one option: scraping the data from the web. In an older post, I described an R web scraper. As that one no longer works, I needed a new one, which brings us to this post. This time I will describe how to create a web scraper for static HTML sites with Python and how to implement such a web scraper as a User Defined Function (UDF) in Exasol.
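To give a rough idea of what scraping a static HTML site can look like, here is a minimal sketch that uses only Python's standard library. The HTML snippet, the team names and the helper names (`TableRowParser`, `parse_fixtures`) are made up for illustration; this is not the scraper from the full post, and the wrapping into an Exasol UDF is not shown.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a <td> cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            # Header rows contain only <th> cells and therefore stay empty.
            if self._row:
                self.rows.append(self._row)
            self._row = None


# Hypothetical static page content; a real page would first be downloaded.
SAMPLE_HTML = """
<table>
  <tr><th>Home</th><th>Away</th></tr>
  <tr><td>Team A</td><td>Team B</td></tr>
  <tr><td>Team C</td><td> Team D </td></tr>
</table>
"""


def parse_fixtures(html):
    """Return one (home, away) tuple per data row of the table."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return [tuple(row) for row in parser.rows]


fixtures = parse_fixtures(SAMPLE_HTML)
print(fixtures)
```

For a real website, the HTML string would come from an HTTP request (for example via `urllib.request.urlopen`), and the tag handling would have to be adapted to the actual page structure.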

Squawka.com web scraper for upcoming fixtures

Attention:

Squawka changed the design of their website. The website no longer uses HTML tables to list the fixtures for a specific league, and the corresponding URLs have changed as well. That’s why the web scraper described here does not work any longer. As soon as I find another data source for the upcoming fixtures, I will write a new post.

For the implementation of the Poisson prediction model I needed a data source for the current fixtures. As a temporary solution I used a manually maintained CSV file, which I updated and imported regularly. While researching new data sources, I found the website squawka.com. This website provides statistics and analysis based on Opta data. In this post, I will describe how you can extract the current fixtures from this website and use them in the normal data processing, which replaces the manual CSV interface.

Gather data: football-data.co.uk

When I started this project, my biggest problem was finding a source for historic football statistics and historic football odds. Fortunately, I found Joseph Buchdahl’s website football-data.co.uk. This website is just great! He offers CSV files for 22 football leagues and about 19 seasons, and he usually updates the data twice a week. So I used this data as the starting point for my analytical system.
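As a small illustration of how such a CSV file could be read, the following sketch parses a few result rows with Python's `csv` module. It assumes the column names `HomeTeam`, `AwayTeam`, `FTHG` and `FTAG` (full-time home and away goals); the sample rows and team names are invented, and the helper names `load_matches` and `average_goals` are hypothetical.

```python
import csv
import io

# Invented sample rows; real files contain many more columns (e.g. odds).
SAMPLE_CSV = """Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR
D1,01/01/2000,Team A,Team B,2,1,H
D1,02/01/2000,Team C,Team A,0,0,D
D1,03/01/2000,Team B,Team C,1,3,A
"""


def load_matches(text):
    """Parse CSV text into a list of dicts with integer goal counts."""
    reader = csv.DictReader(io.StringIO(text))
    matches = []
    for row in reader:
        matches.append({
            "home": row["HomeTeam"],
            "away": row["AwayTeam"],
            "home_goals": int(row["FTHG"]),  # assumed: full-time home goals
            "away_goals": int(row["FTAG"]),  # assumed: full-time away goals
        })
    return matches


def average_goals(matches):
    """Return the average home and away goals per match."""
    count = len(matches)
    home_avg = sum(m["home_goals"] for m in matches) / count
    away_avg = sum(m["away_goals"] for m in matches) / count
    return home_avg, away_avg


matches = load_matches(SAMPLE_CSV)
home_avg, away_avg = average_goals(matches)
print(len(matches), home_avg, away_avg)
```

In a real setup the text would be read from a downloaded file instead of a string, and the parsed rows would be loaded into the database rather than kept in memory.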
