Query Any DB from One REST Endpoint

Dean Compher

28 September 2021

 

 

Have you ever wanted to query multiple databases without having to install any database drivers on your client?  Well, by deploying a single IBM REST API endpoint somewhere in your enterprise, you can connect to many different vendors’ databases through that endpoint and IBM Data Virtualization.  This gives you access to a number of different databases as well as S3 Object Storage and even CSV files.  IBM Data Virtualization now has a fully supported REST endpoint that lets you query all of its virtual tables, which means you have access to tables from many sources.  This is useful for a number of reasons.  You don’t have to implement a different endpoint for each database.  You can connect to many databases without installing any database drivers.  It is also a convenient way to get the results of an SQL query in JSON format.  There are other reasons too, but let’s talk about how to do this.

 

As you may already know, Cloud Pak for Data (CPD) is the platform that hosts Data Virtualization and many other services.  Starting in CPD 4.0.2, you can deploy this REST endpoint right in the Cloud Pak for Data platform containing Data Virtualization.  Alternatively, you can deploy the Db2 REST endpoint on any on-premises or cloud server that you like and make Data Virtualization one of the databases that can be queried.  Any workstation or server with an application environment that can make REST API calls lets you connect to Data Virtualization and, from there, to many sources.

 

I will now show some examples of querying the Data Virtualization database using cURL.  I have also created a Jupyter notebook that gives examples of doing the same queries using Python and added it to my GitHub repo.

 

The My virtualized data window in the Data Virtualization console shows the virtual tables that the Data Virtualization user can access.  Figure 1 shows some of the tables I can query.  Please note that I have two objects checked.  HOSPITAL_INFO_EDW is virtualized from a Db2 Warehouse on Cloud table called HOSPITAL_INFO, and HOSPITAL_HCAHPS_DATA_PG is virtualized from a PostgreSQL table called HOSPITAL_HCAHPS_DATA.  I will use them in the following examples.

 

Figure 1.  My virtualized data window

 


 

Since I already have it, I’m going to use the endpoint I installed for my previous Db2 REST Endpoint article.  In that article I provide an example of how to get the needed token by sending the endpoint your database or Data Virtualization user ID and password.  The token is used when issuing queries to the endpoint.  In this example I am using cURL to invoke the REST API to query the PostgreSQL table as shown.  My workstation doesn’t have the PostgreSQL driver installed, and I didn’t need it because Data Virtualization takes care of that for me.  This call returns the data in a JSON document using the database column names as the keys in the key:value pairs.
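If you need to recreate that token variable, here is a minimal sketch of the kind of call described in the earlier article.  Treat everything in it (the /v1/auth path, the dbParms field names, and the host, port, database name, user ID and password) as assumptions to check against that article and your own environment:

# Hedged sketch: the /v1/auth path, field names, host, port, database name, and
# credentials below are illustrative assumptions; see the Db2 REST Endpoint
# article for the exact request your endpoint expects.
authbody='{"dbParms": {"dbHost": "192.168.0.13", "dbName": "BIGSQL", "dbPort": 32051, "isSSLConnection": false, "username": "myuser", "password": "mypassword"}, "expiryTime": "300m"}'

# Request a token and extract it from the JSON response (this assumes the
# response carries it in a "token" field, as in the earlier article's examples).
token2=$(curl -s --header 'Content-Type: application/json' \
     -d "$authbody" \
     http://192.168.0.14:50050/v1/auth \
  | python3 -c "import sys, json; print(json.load(sys.stdin)['token'])")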

 

myquery="SELECT * FROM DV.HOSPITAL_HCAHPS_DATA_PG fetch first 2 rows only"

# $token2 created as shown in the Db2 Rest Endpoint article

curl --header "authorization: $token2" --header 'Content-Type: application/json' \

      -d '{"isQuery": true,"sqlStatement": "'"$myquery"'" ,"sync": true}' \

     http://192.168.0.14:50050/v1/services/execsql
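If you just want to eyeball the response from the shell, you can pipe the same call through Python’s standard-library JSON formatter to pretty-print it.  Nothing here is specific to the endpoint; it only reformats whatever comes back:

# Same query as above, pretty-printed for readability
curl -s --header "authorization: $token2" --header 'Content-Type: application/json' \
     -d '{"isQuery": true,"sqlStatement": "'"$myquery"'" ,"sync": true}' \
     http://192.168.0.14:50050/v1/services/execsql \
  | python3 -m json.tool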

 

Another useful thing about Data Virtualization is the fact that you can join tables from multiple databases in one query.  Here is an example of joining tables from three databases with one API call using cURL in a bash script:

 

myquery="select postgres.hcahps_measure_id \

              , db2edw.county_name as County \

              , sandbox.number_of_readmissions \

           from DV.HOSPITAL_HCAHPS_DATA_PG postgres \

              , DV.HOSPITAL_INFO_EDW db2edw \

              , DV.HOSPITAL_READMISSION_DB2S sandbox \

          where varchar(postgres.provider_id) = db2edw.provider_id \

            and db2edw.provider_id = sandbox.provider_number \

          fetch first 3 rows only"

 

curl --header "authorization: $token2" --header 'Content-Type: application/json' \

      -d '{"isQuery": true,"sqlStatement": "'"$myquery"'" ,"sync": true}' \

     http://192.168.0.14:50050/v1/services/execsql

 

Note that here I overrode the county column name with the “AS COUNTY” clause in the query.  The output is in JSON format as you can see here:

 

{"jobStatus": 4,
 "jobStatusDescription": "Job is complete",
 "resultSet": [{"COUNTY": "DISTRICT OF COLUMBIA",
   "HCAHPS_MEASURE_ID": "H_COMP_4_LINEAR_SCORE",
   "NUMBER_OF_READMISSIONS": 61},
  {"COUNTY": "DISTRICT OF COLUMBIA",
   "HCAHPS_MEASURE_ID": "H_COMP_4_LINEAR_SCORE",
   "NUMBER_OF_READMISSIONS": 44},
  {"COUNTY": "DISTRICT OF COLUMBIA",
   "HCAHPS_MEASURE_ID": "H_COMP_4_LINEAR_SCORE",
   "NUMBER_OF_READMISSIONS": 40}],
 "rowCount": 3}

 

As you can see, each row is output as an element of an array.  This is a very handy way to get table data, or actually any query result, from any database into JSON format with the column names used for the keys in each key:value pair.
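If you want to flatten those rows back out on the command line, a JSON tool such as jq can walk the resultSet array.  This is only a sketch: it assumes jq is installed and that the response from the join query above was saved to a file named response.json, which is a placeholder name I made up:

# Assumes jq is installed and the join query's output was saved to response.json
# (placeholder name); prints one CSV line per element of the resultSet array.
jq -r '.resultSet[] | [.COUNTY, .HCAHPS_MEASURE_ID, .NUMBER_OF_READMISSIONS] | @csv' response.json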

 

Using the API you can also call Data Virtualization procedures.  In this example, I’m running the DESCRIBE TABLE command inside the ADMIN_CMD stored procedure, which lets you see the columns and data types of any virtual table.

 

myquery="CALL sysproc.admin_cmd('describe table DV.HOSP_INFO_READMIT_VIEW ')"

 

curl --header "authorization: $token2" --header 'Content-Type: application/json' \

      -d '{"isQuery": true,"sqlStatement": "'"$myquery"'" ,"sync": true}' \

     http://192.168.0.14:50050/v1/services/execsql
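To pull just the column names and data types out of that result, you can again lean on jq.  Treat the key names here as assumptions: a Db2 DESCRIBE TABLE result set normally includes columns such as COLNAME and TYPENAME, but check the actual keys in your own response, and describe.json is just a placeholder file name:

# Hedged sketch: assumes jq is installed, the CALL's output was saved to
# describe.json (placeholder name), and the result set exposes COLNAME and
# TYPENAME keys as Db2's DESCRIBE TABLE output normally does.
jq -r '.resultSet[] | "\(.COLNAME)\t\(.TYPENAME)"' describe.json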

 

Feel free to copy and paste these examples, but be careful with the various single and double quotes.  Since I pasted them from an ssh window into a document that was converted to HTML, the quotes may have been changed along the way.  I recommend retyping them.

 

As I noted in the opening paragraph, you can even virtualize files on remote servers, including .csv and .xls (MS Excel) files.  This is done by running the Data Virtualization Remote Connector.  Once everything is configured, you can view files or worksheets as virtual tables in Data Virtualization just as you see tables from other databases.  Because Data Virtualization presents virtual tables the same way regardless of source, the REST API has access to those files too.  You can see all of the supported data sources by viewing the “Data Virtualization” column in the grid.

 

I use cURL in the examples in this article because it is a succinct way of showing the API calls.  However, you can call the API from any runtime that allows RESTful API calls, including Python or Java.  Again, you don’t need any database drivers on the workstation or server issuing the API calls.  I have posted these same examples in a notebook in my GitHub repo.

 

You can also run Data Virtualization administration commands through a REST API, but it is not the same one you use for querying data.  With the administration API you can do things like create or drop virtual tables, manage users, and perform other functions.

 

***

 

In this article I wanted to show you how to query tables from multiple databases without deploying any drivers on your clients.  I hope you found it useful.  You may have some other good use cases for doing this, so please post those ideas to the db2Dean Facebook Page.

 
