Exasol Python UDF web scraper for Bundesliga match day fixtures

The hardest part of sports analytics is getting data! Not for nothing are there companies that earn their money with sports data alone. But if you cannot or do not want to pay such amounts of money, you have just one option: scraping the data from the web. In an older post, I described an R web scraper. As that one no longer works, I needed a new one, which brings us to this post. This time I will describe how to create a web scraper for static HTML sites with Python and how you can implement such a web scraper as a User Defined Function (UDF) in Exasol.
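
As a rough illustration of the idea, here is a minimal sketch that uses only the Python standard library. It is not the scraper from the full post: the table layout in SAMPLE_HTML, the names FixtureParser and parse_fixtures, and the ctx.url input column are all assumptions made for this example. The run(ctx) function follows the general shape of an Exasol Python UDF that emits rows, but a real script would still need a matching CREATE SCRIPT statement and network access from the database.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Hypothetical static HTML; a real fixture page will look different.
SAMPLE_HTML = """
<table>
  <tr><th>Home</th><th>Away</th></tr>
  <tr><td>Team A</td><td>Team B</td></tr>
  <tr><td>Team C</td><td>Team D</td></tr>
</table>
"""


class FixtureParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell texts
        self._row = None        # cells of the row currently being read
        self._in_cell = False   # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            # Rows without <td> cells (e.g. the header row) are skipped.
            if self._row:
                self.rows.append(self._row)
            self._row = None


def parse_fixtures(html):
    """Return (home, away) tuples for every row with at least two cells."""
    parser = FixtureParser()
    parser.feed(html)
    parser.close()
    return [(row[0], row[1]) for row in parser.rows if len(row) >= 2]


def run(ctx):
    """Sketch of a UDF entry point; ctx.url is an assumed input column."""
    with urlopen(ctx.url) as response:
        html = response.read().decode("utf-8")
    for home, away in parse_fixtures(html):
        ctx.emit(home, away)  # one output row per fixture


fixtures = parse_fixtures(SAMPLE_HTML)
```

Keeping the parsing in its own function means it can be tried locally against saved HTML before it is wrapped in a UDF.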

Continue reading “Exasol Python UDF web scraper for Bundesliga match day fixtures”

How To: Run TensorFlow in Exasol Community Edition

Attention:

As Exasol changed the way a custom Python3 Docker container is built, these instructions no longer work. I have to update the individual steps.

There is one big reason why I chose Exasol as the database for my football analytics and predictions: Exasol is capable of executing Python and R code inside the database. You are able to bring your statistical calculations and predictive models to your data. The User Defined Functions (UDFs) feature makes it possible to implement any logic that you would normally code in Python or R. This is a really efficient way to extend plain SQL with predictive functionality such as the execution of TensorFlow models.

In this blog post I will explain how to extend the Exasol Community Edition with all the Python3 packages needed to execute TensorFlow models.

Continue reading “How To: Run TensorFlow in Exasol Community Edition”

Connecting to Exasol via Python

As mentioned in the last post, I am now going to use TensorFlow to build my first own predictive model. But before that, several small steps need to be taken. First, I want to explain how you are able to read and write data in Exasol via a Python script. This is needed to read the different predictive variables and to write the results of a prediction back into the database when developing models.

Continue reading “Connecting to Exasol via Python”

How To: Install TensorFlow for Windows

I recently started to test machine learning algorithms to predict the results of football matches. I especially tried to use neural networks. But I soon realized that the possibilities of R regarding neural networks are somewhat limited. So I want to take a look at TensorFlow. TensorFlow is a machine learning library provided by Google, which has already been used for many different use cases and has proven its suitability.

As the installation process for TensorFlow was not self-explanatory, I thought it would be a good idea to provide a small installation guide. I want to explain how I installed TensorFlow and the Python IDE PyCharm.

Continue reading “How To: Install TensorFlow for Windows”

Prepare data: football-data.co.uk (part 1)

In the post Gather data: football-data.co.uk I described how you can load CSV data into the Exasol database. As the data is now available in the Stage Layer of the database, I must now prepare the data and persist it in the Raw Data Layer, so that I can easily use it for building predictive models.

In part 1 of this post I want to explain what Data Vault modeling is and what the Data Vault model for the data structure of football-data.co.uk looks like. In part 2 I will explain how you load data into the developed Data Vault model.

Continue reading “Prepare data: football-data.co.uk (part 1)”

Analytical Architecture: TripleA DWH

As described in How to beat the bookie: Value Betting, I want to use Value Betting to beat the bookie. To identify value, I have to be able to calculate the probability of a specific sports event (e.g. Home-Win for Team A) as accurately as possible. Therefore, I want to develop, test, simulate and process different predictive models. As a DWH architect I know that a good system architecture helps a lot to support such a development process. That’s why I formed the concept of the TripleA DWH, the Advanced Agile Analytical Data Warehouse.
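
To make clear why the probability estimate matters so much, here is a small sketch of the usual value check. It assumes decimal odds and the common definition of value as a positive expected profit per unit staked; the function names and the example numbers are made up for illustration and are not taken from the linked post.

```python
def expected_value(probability, decimal_odds):
    """Expected profit per unit staked: probability * odds - 1."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1")
    return probability * decimal_odds - 1.0


def is_value_bet(probability, decimal_odds):
    """A bet has value when its expected profit is positive."""
    return expected_value(probability, decimal_odds) > 0.0


# Illustrative numbers only: the model estimates a 50 % chance of a
# Home-Win for Team A, and the bookie offers decimal odds of 2.20.
home_win_value = expected_value(0.50, 2.20)   # roughly +0.10 per unit staked
```

With the same odds, an estimate of 45 % would give 0.45 * 2.20 - 1 = -0.01, so a small error in the predicted probability can turn an apparent value bet into a losing one.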

Continue reading “Analytical Architecture: TripleA DWH”