When I started this project, my biggest problem was to find a source for historic football statistics and historic football odds. Fortunately, I found Joseph Buchdahl’s website football-data.co.uk. This website is just great! He offers CSV files for 22 football leagues and about 19 seasons. He updates the data mostly two times a week. So I used this data as the starting point for my analytical system.
In this post I want to explain, how the data structure of these files look like and how you can import the data into an Exasol database. I want to achieve, that the data is available at the Stage Layer of my architecture (Analytical Architecture: AAA DWH) and I can proceed with the step “Prepare data” of the development process for predictive models (How To: Develop predictive models).
Data structure
The CSV files contain following information:
- Division
- Match date
- Home team / Away team
- Halftime result / fulltime result
- Statistics (shots, shots on target, corners, etc. )
- Betting odds
The whole structure is explained in a notes file at the website.
Importing CSV to Exasol
The CSV files, provided at the website, do not always contain each column, mentioned in the notes document. E.g the referee information is only available in the CSV files for Premier League matches.
To solve this problem, I imported each CSV file in a separate table with the corresponding data structure. After that, I created a stage table with a union of all available columns. So I could handle the differences in data structure.
The following code examples describe, how I imported the CSV file for German Bundesliga 2016/16.
Create stage table
At first the table, where the data should be stored must be created.
create table stage_layer.GER_D1_2016_17
(
Division varchar(20),
Match_Date varchar(20),
HomeTeam varchar(20),
AwayTeam varchar(20),
full_time_home_goals varchar(20),
full_time_away_goals varchar(20),
full_time_result varchar(20),
half_time_home_goals varchar(20),
half_time_away_goals varchar(20),
half_time_result varchar(20),
home_shots varchar(20),
away_shots varchar(20),
home_shots_target varchar(20),
away_shots_target varchar(20),
home_fouls varchar(20),
away_fouls varchar(20),
home_corners varchar(20),
away_corners varchar(20),
home_yellow varchar(20),
away_yellow varchar(20),
home_red varchar(20),
away_red varchar(20),
bet365_home_odds varchar(20),
bet365_draw_odds varchar(20),
bet365_away_odds varchar(20),
bet_win_home_odds varchar(20),
bet_win_draw_odds varchar(20),
bet_win_away_odds varchar(20),
interwetten_home_odds varchar(20),
interwetten_draw_odds varchar(20),
interwetten_away_odds varchar(20),
ladbrokes_home_odds varchar(20),
ladbrokes_draw_odds varchar(20),
ladbrokes_away_odds varchar(20),
pinacle_home_odds varchar(20),
pinacle_draw_odds varchar(20),
pinalce_away_odds varchar(20),
william_hill_home_odds varchar(20),
william_hill_draw_odds varchar(20),
william_hill_away_odds varchar(20),
vc_bet_home_odds varchar(20),
vc_bet_draw_odds varchar(20),
vc_bet_away_odds varchar(20),
betbrain_num_1X2 varchar(20),
betbrain_max_home_odds varchar(20),
betbrain_avg_home_odds varchar(20),
betbrain_max_draw_odds varchar(20),
betbrein_avg_draw_odds varchar(20),
betbrain_max_away_odds varchar(20),
betbrain_avg_away_odds varchar(20),
betbrain_num_ou varchar(20),
betbrain_max_o25 varchar(20),
betbrain_avg_o25 varchar(20),
betbrain_max_u25 varchar(20),
betbrain_avg_u25 varchar(20),
betbrain_num_asian_h varchar(20),
betbrain_size_asian_h varchar(20),
betbrain_max_asian_h_home varchar(20),
betbrain_avg_asian_h_home varchar(20),
betbrain_max_asian_h_away varchar(20),
betbrain_avg_asian_h_away varchar(20),
empty_1 varchar(20),
empty_2 varchar(20),
empty_3 varchar(20)
);
(Note: the conversion to the correct data types should be done later)
Import data
There are 2 possibilities to import the data into the database.
You can directly load the data from the website and import it into a table:
import into stage_layer.GER_D1_2016_17
from csv
at ‘www.football-data.co.uk/mmz4281/1617’
file ‘D1.csv’
column separator = ‘,’
row separator = ‘CRLF’
skip=1;
Or you download the CSV, save it in a local directory and import the CSV file:
import into stage_layer.GER_D1_2016_17
from local csv file [local_path]\GER_D1_2016_17.csv’
column separator = ‘,’
row separator = ‘CRLF’
skip=1;
In the next post, I will explain, how Data Vault modeling works and how you create the Data Vault model based on the data from football-data.co.uk.
If you have further questions, feel free to leave a comment or contact me @Mo_Nbg.
4 Replies to “Gather data: football-data.co.uk”