Why every data scientist should learn SQL

It’s been quite a long time since my last blog post, and for a specific reason: I participated in the 2nd DFB Hackathon, which consumed a huge amount of the free time I normally spend creating content for this blog. The Hackathon was once again a great experience, as all this deep data science stuff is still a challenge for me. But it also left me with one big question: why do data scientists often use only Python (or R) and not know how and when to use SQL?

The 2nd DFB Hackathon

To provide some context first, let me share some information about the DFB Hackathon. The 1st one was a really small event that took place over just 2 days. The 2nd one was built on three key ideas: more time, more data, more re-usable results. We were provided with 2 seasons of Bundesliga positional data and different challenges to be solved over a period of 6 months, later extended because of all the COVID problems. My team had the task of creating a throw-in model to measure the risk and reward of throw-ins in football. For everybody already aware of how such models are created: we used an approach similar to expected goals models. Based on a huge amount of features describing the specific match situation, a machine learning algorithm determines the expected risk or reward of a specific throw-in execution.

Throw-in process defined during the 2nd Hackathon

But explaining the detailed implementation is not my goal for this post. I want to suggest something from my view as a data engineer: every professional data scientist should know SQL!

The advantages of SQL

Already during the 1st Hackathon I noticed that the data scientists exchanged Python scripts and packages that were supposed to make their lives easier during the Hackathon. And I asked myself: why are they trying to achieve everything with Python? A single football match contains about 3.6 million positional data points. Why are they not using SQL? A data scientist I am currently working with couldn’t really answer this question either, although he admitted it might be a good idea to take a look at SQL.

When I started working in data analytics more than 15 years ago, I learned SQL as my go-to tool for getting insights from data. And that’s for some simple reasons:

It’s a query language

Huge amounts of data are normally maintained in databases. And SQL, in comparison to Python, is a query language directly aimed at extracting data from relational databases. It’s not a general-purpose programming language, but it makes working with a database easy.
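A minimal sketch of what that looks like (the table and column names are made up for illustration): extracting all positions of one player from a tracking table is a single statement.

    -- hypothetical tracking table: one row per player and frame
    SELECT frame_id, x_coord, y_coord
    FROM positions
    WHERE match_id = 12345
      AND player_id = 7;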

Small grammar

As a language, SQL is simpler than Python. It has a smaller grammar and fewer different concepts, so it’s easy to get started. But in my opinion it’s still not easier to master: for that you also need an understanding of data, data modeling and how data is organized inside a relational database. As a 2nd advantage, a smaller grammar makes SQL easier to understand, and SQL code is also easier to maintain.

Speed

SQL code is executed inside the database. And modern databases offer indexes and partitions for your data to provide faster access. So you will get your results much faster when working with SQL compared to Python scripts. Manipulating data with Python requires multiple steps: the data is unloaded from the database or a file, the manipulation is done inside a script, and afterwards the result needs to be written back into the database or another file.
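Just as a rough sketch in generic SQL (the table is again hypothetical, and some engines manage their indexes automatically), this is the kind of physical tuning a database gives you:

    -- create an index so lookups by match_id don't have to scan the whole table
    CREATE INDEX idx_positions_match ON positions (match_id);

    -- many engines additionally support partitioning large tables (e.g. by season),
    -- so queries only read the partitions they actually need; the syntax is vendor-specific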

Filtering and aggregation

One area where SQL definitely excels is filtering and aggregating data. 4 lines of code are already enough for a basic aggregation including a filter. Anybody might argue about this, but such basic operations are most of the time already enough to fulfill the business requirements at hand.
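For example, a small sketch on a hypothetical tracking table, counting the data points per player of one match – and it really is just 4 lines:

    SELECT player_id, COUNT(*) AS data_points
    FROM positions
    WHERE match_id = 12345
    GROUP BY player_id;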

SQL vs. Pandas

With the introduction of Pandas, many SQL-like operations also became easily possible on data frames in Python. This small article provides a great comparison for some simple operations:

TowardsDataScience: SQL vs Python

So you might think: OK, then I can stick with Python. But there are some more advanced topics you should think about, where Pandas data frame operations don’t seem to be the best solution.

Filtering with SQL vs. Filtering a Pandas data frame
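As a rough illustration (the events table and the filter are made up), the same filter expressed in SQL and, in the comments, the roughly equivalent Pandas expression:

    -- SQL: all throw-ins in the second half
    SELECT *
    FROM events
    WHERE event_type = 'throw_in'
      AND minute > 45;

    -- roughly equivalent Pandas expression:
    -- df[(df["event_type"] == "throw_in") & (df["minute"] > 45)]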

Data movement

The data you are working with might not only be consumed by you, but also by multiple other colleagues. So the data is part of a broader workflow and may be consumed e.g. by a BI tool. And that’s where SQL is needed, as it is the most common and standardized way for visualization tools to access data.

Analytical &amp; window functions

One of my personal killer criteria for SQL is analytical and window functions. When working with time-series data, these functions make your life a lot easier. You want the start and end position of a player during a defined time frame? FIRST_VALUE and LAST_VALUE give you exactly that. You need to determine the ball status in the previous or next frame? LEAD and LAG are the answer. Window aggregations like a moving average are also fairly simple: you just need to add an OVER clause to your basic aggregation, and everything still fits within one line of code. Using Python you would need to loop over the different partitions, store intermediate values and handle the results.

Analytical function syntax for Exasol
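A small sketch of these functions on a hypothetical tracking table (the column names are illustrative; the syntax is standard SQL and works in most modern engines):

    SELECT
        player_id,
        frame_id,
        -- start and end position of the player within the selected time frame
        FIRST_VALUE(x_coord) OVER (PARTITION BY player_id ORDER BY frame_id) AS start_x,
        LAST_VALUE(x_coord) OVER (PARTITION BY player_id ORDER BY frame_id
                                  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS end_x,
        -- ball status in the previous and next frame
        LAG(ball_status)  OVER (PARTITION BY player_id ORDER BY frame_id) AS prev_ball_status,
        LEAD(ball_status) OVER (PARTITION BY player_id ORDER BY frame_id) AS next_ball_status,
        -- moving average of the speed over the last 25 frames
        AVG(speed) OVER (PARTITION BY player_id ORDER BY frame_id
                         ROWS BETWEEN 24 PRECEDING AND CURRENT ROW) AS moving_avg_speed
    FROM positions
    WHERE match_id = 12345;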

Disadvantages of SQL

But of course there are two sides to every coin: SQL also has limitations and is not usable for every use case.

Missing SQL functions

When using SQL extensively to transform data, you will quickly notice some functional limitations. The SQL standard and the different vendor implementations often lack specialized functions. For general tasks, e.g. text transformation, you will find a huge selection of functions, but beyond that it gets thin. Fortunately that’s no reason to throw SQL overboard: each database vendor provides the possibility to extend SQL with user defined functions. And on top you often also have the possibility to use different programming languages (e.g. Python, JavaScript, R) and take advantage of their functionality.

SQL function for distance calculation between 2 points (JavaScript for Google BigQuery)
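A small sketch of what such a function could look like, assuming a simple Euclidean distance on pitch coordinates (the tables in the usage example are made up):

    -- BigQuery UDF implemented in JavaScript: Euclidean distance between two points
    CREATE TEMP FUNCTION distance(x1 FLOAT64, y1 FLOAT64, x2 FLOAT64, y2 FLOAT64)
    RETURNS FLOAT64
    LANGUAGE js AS """
      return Math.sqrt(Math.pow(x2 - x1, 2) + Math.pow(y2 - y1, 2));
    """;

    -- usage on a hypothetical ball tracking table: distance covered between two frames
    SELECT distance(b1.x_coord, b1.y_coord, b2.x_coord, b2.y_coord) AS dist
    FROM ball_positions b1
    JOIN ball_positions b2 ON b2.frame_id = b1.frame_id + 1;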

It’s a query language

What was the first advantage of SQL is also the last disadvantage in this list: SQL is a query language. So it is limited to working with data. But when thinking about one of the main tasks of a data scientist – creating models – that’s just not possible with SQL alone. In such cases you have to rely on the capabilities of a general-purpose programming language like Python.

Conclusion

An underestimated part of the daily work of a data scientist is handling data: loading, unloading, transforming, wrangling. I know it will not take long until I meet the next data scientist doing all of this in a Python script. And it will again be time for me to give a small piece of advice from my perspective as a data engineer: take a look at SQL!

xG data journey – scraping dynamic webpages

In the first part of this data journey, I took a look at the general definition of expected goals (xG) and the usage of this metric. The next step in the process of testing the predictive power of xG is getting some data. This part will focus on getting the team expected goals statistics. In one of the following parts, I will also take a look at getting the player expected goals statistics, as this of course offers even deeper insights.

Continue reading “xG data journey – scraping dynamic webpages”

A data journey – market values (part1)

When a rich club in Germany goes through a bad performance phase or loses an important match, we like to use the phrase “Geld schießt eben keine Tore”, which means more or less that big money doesn’t guarantee goals. But the general assumption is of course that richer clubs are expected to win more often, as they have the money to buy the best players. This inspired me to start a data journey about market values in the big 5 European leagues: What do the market values tell us about the development of the different leagues? How do teams perform in relation to the money they spend? Does the market value of a team have predictive significance?

Continue reading “A data journey – market values (part1)”

Exasol Python UDF web scraper for Bundesliga match day fixtures

The hardest part of sports analytics is getting data! Not for nothing are there companies that earn their money just with sports data. But if you are not able or not willing to pay such amounts of money, you have just one option: scraping the data from the web. In an older post, I described an R web scraper. As that one was no longer working, I needed a new one, which brings us to this post. This time I will describe how to create a web scraper for static HTML sites with Python and how to implement such a web scraper as a User Defined Function (UDF) in Exasol.

Continue reading “Exasol Python UDF web scraper for Bundesliga match day fixtures”

How To: Run TensorFlow in Exasol Community Edition

There is one big reason why I have chosen Exasol as the database for my football analytics and predictions: Exasol is capable of executing Python and R code inside the database. You are able to bring your statistical calculations and predictive models to your data. The User Defined Functions (UDFs) feature provides the possibility to implement any logic you would normally code in Python or R. This is a really efficient way to extend plain SQL with predictive functionality like the execution of TensorFlow models.

In this blog post I will explain how to extend the Exasol community edition with all the Python 3 packages needed to execute TensorFlow models. Additionally, with the latest update I also added the packages and descriptions needed for all my web scraping scripts.

Continue reading “How To: Run TensorFlow in Exasol Community Edition”

Team strength MLP (part 3)

Part one defined the basic architecture of the Team Strength MLP (multilayer perceptron). The training process and its monitoring via Tensorboard were explained in part two. Now it is time to take a look at the prediction of football matches. This primarily consists of the following steps:

  • Load the prediction data set
  • Re-build neural network architecture and load pre-trained weights
  • Execute prediction

The Bundesliga season 2017/18 will be the test case for this example. The seasons 2008 – 2016 were used to train the model.

Continue reading “Team strength MLP (part 3)”

Team strength MLP (part 1)

It is time to build and test my first predictive model with TensorFlow! As I am currently totally inexperienced in creating and optimizing neural networks, I will start with a very simple one, which just uses the predictive variables of the Poisson model. By doing this, I will be able to compare the resulting network with the Poisson model. I am excited to see whether TensorFlow is able to outperform this statistical model with such a low number of predictive variables. In this series I will provide some basic information on how to build a simple multilayer perceptron (MLP) with TensorFlow, supervise the training process with Tensorboard and use the trained neural network to predict the outcomes of matches.

Continue reading “Team strength MLP (part 1)”

Connecting to Exasol via Python

As mentioned in the last post, I am now going to use TensorFlow to build my first own predictive model. But before that, there are several small steps that need to be taken. First I want to explain how you are able to read and write data in Exasol via a Python script. This is needed to read the different predictive variables and to write the results of a prediction back into the database when developing models.

Continue reading “Connecting to Exasol via Python”

BeatTheBookie goes Tensorflow

After gaining a lot of experience over a complete season, it is time to set myself some new goals. Until now I have just used or tested predictive models that were invented or described by other people. Now I want to try something new: I would like to create my first own predictive model, which should of course perform better than the current Poisson model. This is where TensorFlow comes into play.

Continue reading “BeatTheBookie goes Tensorflow”