As Exasol changed the way a custom Python3 Docker container is built, these instructions no longer work. I have to update the individual steps.
There is one big reason why I chose Exasol as the database for my football analytics and predictions: Exasol is capable of executing Python and R code inside the database. You are able to bring your statistical calculations and predictive models directly to your data. The User Defined Functions (UDFs) feature makes it possible to implement any logic you would normally code in Python or R. This is a really efficient way to extend plain SQL with predictive functionality such as the execution of TensorFlow models.
In this blog post I will explain how to extend the Exasol Community Edition with all Python3 packages needed to execute TensorFlow models.
The Exasol Community Edition runs as a VM image. Internally, Exasol uses Docker to execute the different programming languages. On Windows, however, these two technologies do not really work together. That's why some preparations are needed before you can start creating a TensorFlow container.
Create Ubuntu VM
As you are not able to run Docker and VMware in parallel on Windows, I decided to create a separate Ubuntu VM for all Docker tasks. Of course you can also use any other free Linux distribution.
The Ubuntu Server ISO is available under the following link:
While creating a new virtual machine you just have to select the downloaded ISO to use Ubuntu as an OS.
During the installation of Ubuntu you have to define the user and password, which you will use later to log in to the VM.
To simplify working with the different VMs, I also used the following tools, which I can only recommend to everybody.
PuTTY – PuTTY is a slim SSH client to connect to and work with the command line of the Ubuntu VM. That's just way more enjoyable than using the small VMware window.
WinSCP – WinSCP is an open-source SFTP file transfer client. It's the easiest way to exchange files with the Ubuntu VM without thinking about shared folders or curl commands.
To get both tools working, you also have to install an SSH server on your Ubuntu VM:
sudo apt-get install openssh-server
In the next step, you have to install Docker on the Ubuntu VM. It is needed to create and handle the Exasol language Docker containers. Fortunately, I found a good guide on how to install Docker on Ubuntu:
Exasol offers a GitHub repository with different solutions. There you can also find scripts that create language containers fully automatically.
# install git
sudo apt-get install git
# clone exasol repository
git clone https://github.com/exasol/script-languages
Create a Python 3 container
The Exasol GitHub repository "script-languages" contains a guide and all scripts to create Docker language containers for different flavors. A flavor is a specific combination of programming language and packages. If you need multiple programming languages, you do not have to create a container for each language. You just define your own flavor with all the needed information.
In my case I have chosen the flavor "python3-ds-EXASOL-6.0.0". This one is suitable for Exasol version 6.0 and already contains Python 3 with all TensorFlow packages. I just added the web scraper packages "beautifulsoup4" and "html5lib" by adding the line
RUN pip install beautifulsoup4 html5lib
to the flavor base file.
After that adaptation, the Docker container can be built and exported.
./build --flavor=python3-ds-EXASOL-6.0.0
./export --flavor=python3-ds-EXASOL-6.0.0
As a result you receive a standalone archive "python3-ds-EXASOL-6.0.0.tar.gz", which needs to be copied to the BucketFS, the internal file system of your Exasol database. The BucketFS Explorer is the easiest way to manage different buckets and copy files.
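If you prefer the command line over the BucketFS Explorer, the archive can also be uploaded through BucketFS's plain HTTP interface. The following Python sketch only assembles the corresponding curl command; the host, the port (2580 is a common default, but check your EXAoperation configuration) and the write password are placeholders you have to replace with your own values.

```python
def bucketfs_upload_command(host, port, bucket, write_password, archive):
    """Build the curl command that PUTs a file into a BucketFS bucket.

    BucketFS exposes each bucket over HTTP; writing requires the
    bucket's write user 'w' and the bucket's write password.
    """
    return (
        "curl -X PUT -T {archive} "
        "http://w:{pw}@{host}:{port}/{bucket}/{archive}"
    ).format(archive=archive, pw=write_password,
             host=host, port=port, bucket=bucket)

# Example call - all values are placeholders for your own setup:
cmd = bucketfs_upload_command(
    "192.168.56.101", 2580, "bucketname", "writepw",
    "python3-ds-EXASOL-6.0.0.tar.gz",
)
print(cmd)
```

The printed command can then be executed on any machine that can reach the database VM.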
Test TensorFlow implementation
TensorFlow is now available in your Exasol database. So it's time to test whether everything is running fine.
First you must register the new TensorFlow container as a new script language:
ALTER SESSION SET SCRIPT_LANGUAGES = 'PYTHON=builtin_python R=builtin_r JAVA=builtin_java PYTHON3=localzmq+protobuf:///bucketfsname/bucketname/python3-ds-EXASOL-6.0.0?lang=python#buckets/bucketfsname/bucketname/python3-ds-EXASOL-6.0.0/exaudf/exaudfclient';
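The long SCRIPT_LANGUAGES value is easier to get right if you assemble it from its parts: the BucketFS name, the bucket name, and the directory of the unpacked container. This small Python helper is my own illustration of the URL structure, not an Exasol API:

```python
def script_languages_value(bucketfs, bucket, flavor):
    """Compose the PYTHON3 entry of SCRIPT_LANGUAGES for a language container.

    The part before '#' tells Exasol where the container lives in BucketFS;
    the part after '#' points to the UDF client binary inside the container.
    """
    container = "/{0}/{1}/{2}".format(bucketfs, bucket, flavor)
    return (
        "PYTHON3=localzmq+protobuf://{0}?lang=python"
        "#buckets{0}/exaudf/exaudfclient"
    ).format(container)

value = script_languages_value("bucketfsname", "bucketname",
                               "python3-ds-EXASOL-6.0.0")
print(value)
```

Paste the printed value into the ALTER SESSION statement together with the builtin language entries.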
For testing TensorFlow I created a small UDF, which just reads a parameter and returns it by executing the typical "Hello World" example for TensorFlow.
create or replace PYTHON3 scalar script sandbox.test_tensorflow (p_test varchar(100))
returns varchar(100) as

import tensorflow as tf

def run(ctx):
    v_const = tf.constant(str(ctx.p_test))
    v_sess = tf.Session()
    v_return = v_sess.run(v_const)
    return v_return.decode()
;
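Exasol calls the script's run(ctx) function once per input row, with the parameters exposed as attributes of ctx. You can sanity-check the row logic locally before deploying. In this sketch (my own test harness, not an Exasol API) the TensorFlow round trip is replaced by a plain pass-through, since tf.Session belongs to the TensorFlow 1.x API used in the UDF:

```python
class FakeContext:
    """Stands in for the ctx object Exasol passes to run()."""
    def __init__(self, p_test):
        self.p_test = p_test

def run(ctx):
    # In the real UDF the value travels through a tf.Session round trip;
    # here we only emulate the string-in/string-out contract.
    v_const = str(ctx.p_test).encode()   # stands in for tf.constant(...)
    v_return = v_const                   # stands in for v_sess.run(...)
    return v_return.decode()

result = run(FakeContext("Hello bla TensorFlow!"))
print(result)
```

If the local harness echoes the input, the UDF body itself is sound and any remaining problems are in the container or language registration.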
The created UDF can easily be used inside every SELECT statement. That is what makes such UDFs so powerful. Instead of just a basic "Hello World" UDF, you are able to load a trained model and use it directly on your data.
select sandbox.test_tensorflow('Hello bla TensorFlow!') from dual;
The result seems to be correct:
Such UDFs can of course not only be used for predictive models. You are able to execute every available R and Python package. In a next post I will explain a BeautifulSoup web scraper UDF, which reads data from the German football news site Transfermarkt.de.
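To give a first idea of what such a scraper UDF does, here is a minimal sketch using only Python's built-in html.parser; the real UDF will use the BeautifulSoup and html5lib packages from the container, and the HTML snippet below is invented for illustration.

```python
from html.parser import HTMLParser

class MatchResultParser(HTMLParser):
    """Collect the text of all <td> cells, as a scraper UDF would per row."""
    def __init__(self):
        super().__init__()
        self._in_td = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td and data.strip():
            self.cells.append(data.strip())

# Invented snippet, loosely resembling one row of a result table:
html = "<tr><td>FC Bayern</td><td>3:1</td><td>BVB</td></tr>"
parser = MatchResultParser()
parser.feed(html)
print(parser.cells)
```

Inside a UDF, each extracted cell would then be emitted as a column value for the current row.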
If you have further questions, feel free to leave a comment or contact me @Mo_Nbg.