In one of my older posts I described the data architecture, I am using for all my examples. As the database I use the Exasol Community Edition. From time to time it is necessary to update your software to the current version because of new features. This post will describe, how to migrate a Exasol community edition to anther one. These steps can also be used, to migrate nearly every database to an Exasol.
Exasol 6.2 Features
In general I was really happy with Exasol 6.0. But the new version 6.2 offers some interesting new features:
- Join-Pushdown for Virtual Schemas
- Python 3 script language container
- Hide inaccassible schemas
- Full analytical window support
But the most useful one is definitely the full window support for analytical functions. This will help, to significantly simplify and shrink the SQL statements for the feature calculations.
To calculate the features for a classic Dixon & Coles Poission model you have to create sliding time windows for the last e.g. 30 games. Window functions in SQL allow to access data before and after the current record. In Exasol 6.0 the window functions were only able to access 1 record before or after the current one. That’s why I had create the sliding windows on my own by using a self join and the ROW_NUMBER function to filter the window size.
Following statement creates the 30-day sliding average number of home goals scored and conceded for each team in the Bundesliga based on the scrapped Understat data.
with home_matches as ( select season, division, match_date, home_team, home_goals goals_scored, away_goals goals_conceded from SANDBOX.UNDERSTAT_MATCH_TEAM_STATS where division = 'D1' ) select season, division, home_team, match_date, avg(goals_scored) goals_scored_SMA30, avg(goals_conceded) goals_conceded_SMA30 from ( select his.season, his.division, his.home_team, his.match_date, win.match_date win_match_date, win.goals_scored, win.goals_conceded, --analytical function to get number of records in for each window row_number() over (partition by his.home_team, his.match_date order by win.match_date desc) win_size from --self join to create window home_matches his join home_matches win on his.division = win.division and his.home_team = win.home_team and his.match_date > win.match_date ) where win_size <= 30 group by 1,2,3,4 order by home_team, match_date desc;
With the full window support in Exasol 6.2 you are able to shorten the statement, by defining a analytical window starting 30 records before the current one. The new SQL is way easier to understand and has even a better performance as a self join is not needed.
select season, division, match_date, home_team, avg(home_goals) over (partition by home_team order by match_date rows between 30 preceding and current row exclude current row) goals_scored, avg(away_goals) over (partition by home_team order by match_date rows between 30 preceding and current row exclude current row) goals_conceded from SANDBOX.UNDERSTAT_MATCH_TEAM_STATS where division = 'D1' order by home_team, match_date desc ;
Migrating to Exasol 6.2
For migrating data from one database to an Exasol database, Exasol provides different migration scripts at their GitHub repository. There are scripts for multiple different vendors.
To migrate the data from an older Exasol version to the new version 6.2 you to following following steps.
- Download the new Exasol 6.2 community edition and import the VM image into your used VM tool.
- Start both virtual machines in parallel, the old version and the new one. Both virtual machine should use a NAT network configuration, so that both a connected to the same virtual network.
- Connect to the new Exasol 6.2 virtual machine with the SQL tool of your choice and adapt the provided migration script. You have to enter the IP adress, SYS user & password for your old database. The created connection object is used to read the metadata of the migration source and generate scripts for creating schemas, tables and importing the data.
The generated script just migrates all schemas, tables and data. Other object types need to be migrated manually. The script does not do this Job, as e.g Views may contain dependencies, which is hard to script. So this need to be done manually. Based on my data architecture just two other object type need to be migrated: scripts and views.
For migrating the view objects, I created following script, which reads the metadata from the source object definitions.
select script_schema, replace(script_text,'CREATE','CREATE OR REPLACE') || ';' from (import from exa at SECOND_EXASOL_DB statement 'select * from "SYS"."EXA_ALL_SCRIPTS" ') ; select view_schema, view_text || ';' from (import from exa at SECOND_EXASOL_DB statement 'select * from "SYS"."EXA_ALL_VIEWS" ') ;
You just again have to execute the statement and the created DDL scripts for all needed view objects.
The script objects are migrated in a similar way. The meta information is just stored in another system table.
select script_schema, replace(script_text,'CREATE','CREATE OR REPLACE') || ';' from (import from exa at SECOND_EXASOL_DB statement 'select * from "SYS"."EXA_ALL_SCRIPTS" ') ;
Checking the migration
Before droping, deleting or archiving your old database, you definitely should do a check, whether the new and old database now contain the same objects. Therefor I created following script, which compares the object catalog of both database, whether all objects of the old database also exist in the new database.
--check migrated objects with --data dicstionary of old database old_db as ( select * from (import from exa at SECOND_EXASOL_DB statement 'select object_type, object_name, root_name schema_name from "SYS"."EXA_ALL_OBJECTS" where object_type not in (''SCHEMA'') ') ), --data dictionary of the new database new_db as ( select object_type, object_name, root_name schema_name from "SYS"."EXA_ALL_OBJECTS" where object_type not in ('SCHEMA') ) --compare both dictionaries select old_db.schema_name, old_db.object_type, old_db.object_name, case when new_db.object_name is null then false else true end is_migrated from old_db left join new_db on old_db.schema_name = new_db.schema_name and old_db.object_type = new_db.object_type and old_db.object_name = new_db.object_name ;
When everything went fine, you should not be able to determine a object with the status IS_MIGRATED = False.
With the full analytical window support Exasol 6.2 offers a feature, which simplifies the calculation for all features for different predictive models. My future posts for feature calculations will therefor now only contain SQL statements using such window definitions.
If you have further questions, feel free to leave a comment or contact me @Mo_Nbg