As mentioned in the last post, I am now going to use TensorFlow to build my first own predictive model. But before, there are several small steps, which need to be taken. At first I want to explain, how your able to read and write data via a Python script into Exasol. This is needed to read the different predictive variables and write back results of a prediction into the database when developing models.
Possibilities to connect to Exasol
There are several possibilities to read data from Exasol and write it back with the help of Python. Based on your requirements and circumstances you just have to select the correct package.
Exasol Python package
Exasol offers a Python package, which acts as a wrapper for the PyODBC package. You are able to read data from the database directly into a Pandas data frame and also write data from a data frame into the database. This package only works with Python 2. You can find this package at the Exasol GitHub
Exasol websocket API
For Python 3 Exasol offers a websocket API package. Unfortunately you can not use it directly with Pandas data frames and it is known to have a bad performance. It can also be found at the Exasol GitHub.
Badoo pyexasol package
The second faster option for Python 3 is the Pyexaol package of Badoo. It also offers the possibility to write into or out of a Pandas data frame, what is really flexible. And it is available for Python 3. Additionally the developer of this package offers a good manual with many examples, how to use it. So this is the option I did choose.
Install pyexasol package
Installing the package alone is not enough. There are several required and optional dependencies. Following core dependencies are listed at the manual:
- Exasol >= 6
- Python >= 3.6
- websocket client >= 0.47
- rsa
I also installed pandas as an optional dependency as I like to use pandas data frames. After you started your tensorflow anaconda environment, you just have to execute following command:
(tensorflow)C:> <code>pip install pyexasol[pandas,encrypt]</code>
If you want to install pyexasol with all optional dependencies, you have to execute following command:
(tensorflow)C:> <code>pip install pyexasol[pandas,encrypt]</code>
Execute SQL statements
I want to describe the basic functionalities of this package with a small example. Thereby I create an empty copy of the staging table for the current fixtures. After that the original data will be copied by using a pandas data frame.
At first you have to create a connection to your Exasol database:
Con = pyexasol.connect(dsn=[IP-adress]:[port], user=[user], password = [password], schema = [schema], compression=True)
Executing a single query can be done with the execute() function of the connection object. This function returns an ExaStatement object, which could also be used to iterate over a result set of a SQL statement.
Con.execute("Create table SANDBOX.CUR_FIXTURES as select * from STAGE.SQUAWKA_CUR_FIXTURES where 1=0")
Read data from Exasol
To be able to easily export data into a pandas data frame was my main requirement, when searching for a method to read data from an Exasol database. The pyexasol package offers the Export_to_pandas() function, which returns a pandas data frame with the result set of the passed SQL statement.
data_frame = Con.export_to_pandas('SELECT * FROM STAGE.SQUAWKA_CUR_FIXTURES') print(data_frame.head())
Write data to Exasol
Writing data back into Exasol is as easy as reading data. You just have to use the import_to_pandas() function and provide the data frame and the target table. You have to take care, that the target table is located in the schema, mentioned as a parameter during connecting to the Exasol.
Con.import_from_pandas(data_frame,'CUR_FIXTURES')
After all, you should close the existing connection.
Con.close
There are several different possibilities to access a database throw Python. But writing data directly into a pandas data frame is really in my interest, as the data frames are easy to handle and can be used as an input for a TensorFlow model.
The complete Python script for this example can again be found at GitHub:
If you have further questions, feel free to leave a comment or contact me @Mo_Nbg.
Additionally:
Big Kudos to the developer of the Pyexasol package. He not only provides a really simple way to access the Exasol database, he also really fast helped to fix an error regarding the execution on a Windows environment.
hi
thanks for all detail.
I check I have install all required dependencies but then too
I am getting error when I try to connect Exasol via pyexasol.
ExaConnectionFailedError:
(
message => Could not connect to Exasol: [WinError 10042] An unknown, invalid, or unsupported option or level was specified in a getsockopt or setsockopt call
dsn => 10.42.234.11..13:8563
user => i
schema =>
)
can you please help.
Thanks in advance
LikeLiked by 2 people
Did you already try to use just one IP of your cluster? I don’t know, whether the connect function can handle a IP range.
LikeLike
Thanks for reply.
my team mates are using IP range so yeah it works.
My issue is resolved for now.
I tired running script via Sublime text then it worked, its directly downloaded over my system. Whereas earlier I was trying to execute via Anaconda/Sypder -( I still have to check may be installation break something there)
but for now I am good
please do keep posting all articles.
Thanks 🙂
LikeLike