Prepare data: (part 1)

In the post Gather data, I described how you can load CSV data into the Exasol database. Now that the data is available in the Stage Layer of the database, I have to prepare it and persist it in the Raw Data Layer, so that I can easily use it for building predictive models.

In part 1 of this post, I want to explain what Data Vault modeling is and what the Data Vault model for this data structure looks like. In part 2, I will explain how to load data into the developed Data Vault model.

Gather data:

When I started this project, my biggest problem was finding a source for historical football statistics and historical football odds. Fortunately, I found Joseph Buchdahl’s website. This website is just great! He offers CSV files for 22 football leagues and about 19 seasons, and he usually updates the data twice a week. So I used this data as the starting point for my analytical system.

Analytical Architecture: TripleA DWH

As described in How to beat the bookie: Value Betting, I want to use Value Betting to beat the bookie. To identify value, I have to be able to calculate the probability of a specific sports event (e.g. Home-Win for Team A) as accurately as possible. Therefore, I want to develop, test, simulate and process different predictive models. As a DWH architect, I know that a good system architecture helps a lot to support such a development process. That’s why I formed the concept of the TripleA DWH: the Advanced Agile Analytical Data Warehouse.
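To make the idea of "identifying value" a bit more concrete, here is a minimal sketch in Python. It assumes decimal odds (e.g. 2.20) and a probability that comes from some predictive model. The function names `expected_value` and `is_value_bet` are hypothetical and only serve as illustration; they are not part of the TripleA DWH itself.

```python
def expected_value(probability: float, decimal_odds: float) -> float:
    """Expected profit per unit staked, assuming decimal odds.

    A win returns `decimal_odds` per unit staked, a loss returns nothing,
    so the expected profit is probability * decimal_odds - 1.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1")
    return probability * decimal_odds - 1.0


def is_value_bet(probability: float, decimal_odds: float) -> bool:
    """A bet offers value if its expected profit is positive.

    This is equivalent to the estimated probability being higher than
    the probability implied by the odds (1 / decimal_odds).
    """
    return expected_value(probability, decimal_odds) > 0.0


# Hypothetical example: the model estimates a 50% chance of a Home-Win.
print(is_value_bet(0.50, 2.20))  # bookie's odds imply about 45%
print(is_value_bet(0.50, 1.80))  # bookie's odds imply about 56%
```

The sketch also shows why the accuracy of the calculated probability matters so much: a small error in the estimate can turn an apparent value bet into a losing one.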

