Migrating Exasol Community Edition

In one of my older posts I described the data architecture, I am using for all my examples. As the database I use the Exasol Community Edition. From time to time it is necessary to update your software to the current version because of new features. This post will describe, how to migrate a Exasol community edition to anther one. These steps can also be used, to migrate nearly every database to an Exasol.

Exasol 6.2 Features

In general I was really happy with Exasol 6.0. But the new version 6.2 offers some interesting new features:

  • Join-Pushdown for Virtual Schemas
  • Python 3 script language container
  • Hide inaccassible schemas
  • Full analytical window support

But the most useful one is definitely the full window support for analytical functions. This will help, to significantly simplify and shrink the SQL statements for the feature calculations.

Window functions

To calculate the features for a classic Dixon & Coles Poission model you have to create sliding time windows for the last e.g. 30 games. Window functions in SQL allow to access data before and after the current record. In Exasol 6.0 the window functions were only able to access 1 record before or after the current one. That’s why I had create the sliding windows on my own by using a self join and the ROW_NUMBER function to filter the window size.

Following statement creates the 30-day sliding average number of home goals scored and conceded for each team in the Bundesliga based on the scrapped Understat data.

with 
home_matches as
(
select
  season,
  division,
  match_date,
  home_team,
  home_goals goals_scored,
  away_goals goals_conceded
from
  SANDBOX.UNDERSTAT_MATCH_TEAM_STATS
where
  division = 'D1'
)
select
  season,
  division,
  home_team,
  match_date,
  avg(goals_scored) goals_scored_SMA30,
  avg(goals_conceded) goals_conceded_SMA30
from
  (
  select
    his.season,
    his.division,
    his.home_team,
    his.match_date,
    win.match_date win_match_date,
    win.goals_scored,
    win.goals_conceded,
    --analytical function to get number of records in for each window
    row_number() over (partition by his.home_team, his.match_date order by win.match_date desc) win_size
  from
    --self join to create window
    home_matches his
    join home_matches win
      on his.division = win.division and
        his.home_team = win.home_team and
        his.match_date > win.match_date
  )
where
  win_size <= 30
group by
  1,2,3,4
order by
  home_team,
  match_date desc;

With the full window support in Exasol 6.2 you are able to shorten the statement, by defining a analytical window starting 30 records before the current one. The new SQL is way easier to understand and has even a better performance as a self join is not needed.

select
  season,
  division,
  match_date,
  home_team,
  avg(home_goals) over (partition by home_team order by match_date rows between 30 preceding and current row exclude current row) goals_scored,
  avg(away_goals) over (partition by home_team order by match_date rows between 30 preceding and current row exclude current row) goals_conceded
from
  SANDBOX.UNDERSTAT_MATCH_TEAM_STATS
where
  division = 'D1'
order by
  home_team,
  match_date desc
;

Migrating to Exasol 6.2

For migrating data from one database to an Exasol database, Exasol provides different migration scripts at their GitHub repository. There are scripts for multiple different vendors.
To migrate the data from an older Exasol version to the new version 6.2 you to following following steps.

  1. Download the new Exasol 6.2 community edition and import the VM image into your used VM tool.
  2. Start both virtual machines in parallel, the old version and the new one. Both virtual machine should use a NAT network configuration, so that both a connected to the same virtual network.
  3. Connect to the new Exasol 6.2 virtual machine with the SQL tool of your choice and adapt the provided migration script. You have to enter the IP adress, SYS user & password for your old database. The created connection object is used to read the metadata of the migration source and generate scripts for creating schemas, tables and importing the data.

The generated script just migrates all schemas, tables and data. Other object types need to be migrated manually. The script does not do this Job, as e.g Views may contain dependencies, which is hard to script. So this need to be done manually. Based on my data architecture just two other object type need to be migrated: scripts and views.

For migrating the view objects, I created following script, which reads the metadata from the source object definitions.

select
  script_schema,
  replace(script_text,'CREATE','CREATE OR REPLACE') || ';'
from
  (import from exa at SECOND_EXASOL_DB statement
  'select
    *
  from
    "SYS"."EXA_ALL_SCRIPTS"
   ')
;

select
  view_schema,
  view_text || ';'
from
  (import from exa at SECOND_EXASOL_DB statement
  'select
    *
  from
    "SYS"."EXA_ALL_VIEWS"
   ')
;

You just again have to execute the statement and the created DDL scripts for all needed view objects.

The script objects are migrated in a similar way. The meta information is just stored in another system table.

select
  script_schema,
  replace(script_text,'CREATE','CREATE OR REPLACE') || ';'
from
  (import from exa at SECOND_EXASOL_DB statement
  'select
    *
  from
    "SYS"."EXA_ALL_SCRIPTS"
   ')
;


Checking the migration

Before droping, deleting or archiving your old database, you definitely should do a check, whether the new and old database now contain the same objects. Therefor I created following script, which compares the object catalog of both database, whether all objects of the old database also exist in the new database.

--check migrated objects
with
--data dicstionary of old database
old_db as
(
select
  *
from
  (import from exa at SECOND_EXASOL_DB statement
  'select
    object_type,
    object_name,
    root_name schema_name
  from
    "SYS"."EXA_ALL_OBJECTS"
  where
    object_type not in (''SCHEMA'')
   ')
),
--data dictionary of the new database
new_db as
(
select
  object_type,
  object_name,
  root_name schema_name
from
  "SYS"."EXA_ALL_OBJECTS"
where
  object_type not in ('SCHEMA')
)
--compare both dictionaries
select
  old_db.schema_name,
  old_db.object_type,
  old_db.object_name,
  case when new_db.object_name is null then false else true end is_migrated
from
  old_db
  left join new_db on
    old_db.schema_name = new_db.schema_name and
    old_db.object_type = new_db.object_type and
    old_db.object_name = new_db.object_name
;

When everything went fine, you should not be able to determine a object with the status IS_MIGRATED = False.

Conclusion

With the full analytical window support Exasol 6.2 offers a feature, which simplifies the calculation for all features for different predictive models. My future posts for feature calculations will therefor now only contain SQL statements using such window definitions.

If you have further questions, feel free to leave a comment or contact me @Mo_Nbg

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: