Query JSON in Db2

Dean Compher

25 February 2019

 

 

If your organization is using JSON more and more, you may want to start storing those documents in Db2 where you can query by particular fields in the documents or just get entire documents when needed.  For some time now, Db2 has allowed you store and query JSON documents in the database.  Since the JSON format is quite widely used to communicate between processes and organizations, it is frequently useful to store it.  In Db2, you insert the entire document into a column in a table and DB2 provides functions to query values, arrays or objects in the document.  There are also functions to generate JSON syntax from relational data.  In this article I discuss how to get started adding JSON data to a table and querying it, since I believe that is most useful. 

 

In this article I’ll provide some examples that will allow you to begin trying the JSON features using some of the more common functions.  This is only a sampling of what can be done, so I’ll provide links at the end so that you can see all of the available functions and read more about them. 

 

I’ll start by discussing a JSON document that comes from the solar power inverter on my house.  I query it with a REST API command (using cURL) and get back a JSON document.  All data is in a JSON object labeled “energy” (the key of a key/value pair) and the value has a number of other keys including “timeUnit”, “unit” and “values”.  The key, “values”, is an array of JSON objects that have the keys “date” and “value”.  The “{}” symbols denote a JSON object and “[]” denotes an array.

 

Example 1.  JSON Document from Solar Inverter

 

{"energy":{"timeUnit":"HOUR",

            "unit":"Wh",

            "measuredBy":"INVERTER",

            "values":[

                      {"date":"2019-02-04 08:00:00","value":211.0},

    {"date":"2019-02-04 09:00:00","value":352.0},

                      {"date":"2019-02-04 10:00:00","value":477.0},

                      {"date":"2019-02-04 11:00:00","value":557.0},

                      {"date":"2019-02-04 12:00:00","value":1038.0},

                      {"date":"2019-02-04 13:00:00","value":1311.0},

                      {"date":"2019-02-04 14:00:00","value":831.0}

                     ]}}

 

JSON functions have been available since Db2 10.1 fix pack 1.  However, the International Standards Organization recently released standards for JSON functions, and as of Db2 11.1.4.4 we have incorporated those standards into the database.  In all of my examples I will use the new ISO functions. 

 

If I have this table:

 

Example 2.  Table for JSON docuents

 

CREATE TABLE json.energy_json

          (SEQ INT NOT NULL GENERATED ALWAYS AS IDENTITY,

           READING_TS TIMESTAMP NOT NULL WITH DEFAULT,

           JSON_FIELD varbinary(2000))

 

The document in Example 1 can be inserted into the Example 2 table with this command:

 

insert into json.energy_json (json_field)

values

(JSON_TO_BSON(RTRIM('{"energy":{"timeUnit":"HOUR","unit":"Wh","measuredBy":"INVERTER","values":[{"date":"2019-02-04 08:00:00","value":211.0},{"date":"2019-02-04 09:00:00","value":352.0},{"date":"2019-02-04 10:00:00","value":477.0},{"date":"2019-02-04 11:00:00","value":557.0},{"date":"2019-02-04 12:00:00","value":1038.0},{"date":"2019-02-04 13:00:00","value":1311.0},{"date":"2019-02-04 14:00:00","value":831.0}]}}')))

 

There are a few things I would like to point out here. 

1.     I created the column for my JSON documents with the VARBINARY data type and use the BSON_TO_JSON function when inserting the document. 

2.     Columns can instead be created with a variety of character and binary data types including VARCHAR, BLOB, CLOB, VARBINARY and others.  See the “Overview” link in Table 1 below. 

3.     You will need to use BLOB and CLOB data types if your document exceeds the page size of the tablespace.

4.     Documents can be stored in the native JSON strings as a character data type or in BSON which is a binary representation of JSON.  BSON has some major performance benefits when selecting values within the document or operating on the document with the built-in functions.  It can also save some space if you don’t use adaptive compression in the database.  However, there is some overhead when converting documents to BSON and when converting back to JSON.  So if all you are doing is dropping the documents into the table and retrieving them as-is then storing them as a VARCHAR or CLOB may be a better choice. 

5.     I use the BSON_TO_JSON function to convert the document to a binary format to take advantage of the BSON format benefits. 

6.     In the insert example, I string the document all together because that’s typically how JSON documents arrive.  It is OK to use the indented format in the insert if you like. 

7.     I created the table in the schema, JSON, but you can use any schema name your like or skip the schema and create the table in the default schema. 

 

Frequently, your JSON document will be in a file so here is an example of using the INGEST command to insert the same document from a file called dailyenergy2019-02-04.json:

 

INGEST FROM FILE dailyenergy2019-02-04.json FORMAT POSITIONAL

   ($field1 POSITION(1:2000) char(2000))

   RESTART OFF

   INSERT INTO json.energy_json (json_field)

         values (JSON_TO_BSON(CAST($field1 AS VARCHAR(2000))))

 

Please note that the CAST function is required where shown in this example or you will get an error. 

 

Whether you INSERT or INGEST the data, here is a basic way to query the JSON field, showing part of the output:

 

select seq, BSON_TO_JSON(json_field) as my_doc from json.energy_json

 

SEQ   MY_DOC

---   ---------------------------------------------------------------

423   { "energy" : { "timeUnit" : "HOUR", "unit" : "Wh", "measuredBy" …

 

Notice that I used the BSON_TO_JSON function to convert the data back to a readable format.  If I had not done that, I would just have seen a string of weird characters.  Getting the entire string may or may not be all that useful, but it is probably more interesting to just select the fields that interest you.  To get one value, you can use the JSON_VALUE function.  Here is an example of retrieving values for the “unit” and “measuredBy” fields in the document in Example 1.

 

SELECT

   JSON_VALUE(json_field,

             '$.energy.unit' RETURNING CHAR(20)) AS unit,

        JSON_VALUE(json_field,

                   '$.energy.measuredBy' RETURNING CHAR(20)) AS device

FROM JSON.energy_json

 

 

UNIT   DEVICE

----   -------

Wh    INVERTER

 

In this select statemen I called the JSON_VALUE function twice to get two different values from the document.  Notice that the second value I provide the function in each case started with “$.energy.”  This is how you traverse the various levels of the hierarchy to indicate the specific fields you want.  Refer back to example 1 and notice that “unit” and “measuredBy” are both inside the “energy” label.  It is also very important to use the correct case.  For example, “measuredBy” and “measuredby” are not the same. 

 

My JSON example also contains an array.  Here, I show how to select the forth value of the two array elements, “date” and “value” that I call forth_date and forth_value respectively in the output.  In JSON the first index into an array is 0, so to get the forth element of an array, you specify 3 as shown here. 

 

SELECT

   JSON_VALUE(json_field,

       '$.energy.unit' RETURNING CHAR(20)) AS unit,       

   JSON_VALUE(json_field,

       '$.energy.measuredBy' RETURNING CHAR(20)) AS device,      

   JSON_VALUE(json_field,

       '$.energy.values[3].date' RETURNING TIMESTAMP) as forth_date,       

   JSON_VALUE(json_field,

       '$.energy.values[3].value' RETURNING Decimal(5,1)) as forth_value

FROM JSON.energy_json

 

UNIT   DEVICE     FORTH_DATE           FORTH_VALUE

----   --------   -------------------  -----------

Wh     INVERTER   2019-02-04 11:00:00  557.0

 

 

If you prefer to get your values from a table function then the JSON_TABLE function is for you.  This example provides similar output as the query above, except that I also include the fifth elements of the array:

 

SELECT  U.unit, U.device, U.forth_date,

        U.forth_value, U.fifth_date, U.fifth_value

  FROM json.energy_json AS E,

    JSON_TABLE

      (E.json_field, 'strict $'

       COLUMNS(unit        CHAR(2) PATH '$.energy.unit',

             device      CHAR(8) PATH '$.energy.measuredBy',

             forth_date  TIMESTAMP PATH '$.energy.values[3].date',

             forth_value DECIMAL(5,1) PATH '$.energy.values[3].value',

             fifth_date  TIMESTAMP PATH '$.energy.values[4].date',

             fifth_value DECIMAL(5,1) PATH '$.energy.values[4].value'

               )

       ERROR ON ERROR) AS U

 

UNIT DEVICE   FORTH_DATE          FORTH_VALUE    FIFTH_DATE       FIFTH_VALUE

---- -------- ------------------- -----------    ------------------- --------

Wh   INVERTER 2019-02-04 11:00:00 557.0          2019-02-04 12:00:00 1038.0

 

The “’strict $’” string must be the second parameter passed to the function.  That’s currently the only value allowed. 

 

If you don’t want to construct the input to these functions every time you want to query the JSON data, you can create views on the data in the documents.  To do that you just use the functions when creating the views and then you just run normal select statements to see the data.  For example, you could create a view based on the JSON_TABLE example above like this:

 

CREATE VIEW JSON.ENERGY_VIEW AS

SELECT  U.unit, U.device, U.forth_date,

        U.forth_value, U.fifth_date, U.fifth_value

  FROM json.energy_json AS E,

    JSON_TABLE

      (E.json_field, 'strict $'

       COLUMNS(unit        CHAR(2) PATH '$.energy.unit',

             device      CHAR(8) PATH '$.energy.measuredBy',

             forth_date  TIMESTAMP PATH '$.energy.values[3].date',

             forth_value DECIMAL(5,1) PATH '$.energy.values[3].value',

             fifth_date  TIMESTAMP PATH '$.energy.values[4].date',

             fifth_value DECIMAL(5,1) PATH '$.energy.values[4].value'

               )

       ERROR ON ERROR) AS U

 

You can then run the following query and get the results shown:

 

select * from JSON.ENERGY_VIEW

 

UNIT DEVICE   FORTH_DATE          FORTH_VALUE    FIFTH_DATE          FIFTH_VALUE

---- -------- ------------------- -----------    ------------------- --------

Wh   INVERTER 2019-02-04 11:00:00 557.0          2019-02-04 12:00:00 1038.0

 

 

As I noted earlier if you are going to do much with functions like JSON_TABLE and JSON_VALUE, whether directly or within views, then you will want to store your documents in BSON format since it performs better.  If you typically query your JSON data by certain fields, then adding indexes can make your queries even faster.  Here is an example of adding an index on the date of the first reading of the day:

 

CREATE INDEX JSON.IX_ENERGY_DATE ON json.ENERGY_JSON

   (JSON_VALUE(json_field, '$.energy.values[3].date' RETURNING DATE));

 

This index will make queries like either of the following run faster if you have a significant number of rows in the table:

 

select * from JSON.ENERGY_VIEW where DATE(forth_date) = '2019-02-04'

 

SELECT

   JSON_VALUE(json_field,

       '$.energy.unit' RETURNING CHAR(20)) AS unit,       

   JSON_VALUE(json_field,

       '$.energy.values[4].value' RETURNING Decimal(5,1)) as forth_value

FROM JSON.energy_json

WHERE JSON_VALUE(json_field,

       '$.energy.values[3].date' RETURNING DATE) = '2019-02-04'

 

 

Table 1.  Links for more information

 

Overview

Overview of storing JSON documents in Db2, data types you can use, additional examples and other very useful information. 

JSON_TO_BSON

Convert JSON strings into a BSON document format.

BSON_TO_JSON

Convert BSON formatted document into JSON strings.

JSON_VALUE

Extract an SQL scalar value from a JSON object.

JSON_TABLE

Creates relational output from a JSON object.

JSON_QUERY

Extract a JSON object from a JSON object.  A JSON object is a key value pair, and the value can be another JSON object or array.

JSON_EXISTS

Determines whether or not a value exists in a document.

JSON_ARRAY

Creates JSON array from input key value pairs.

JSON_OBJECT

Creates JSON object from input key value pairs.

 

In addition to using SQL to select and manipulate JSON data, there are other interfaces that are likely to be more familiar to developers who regularly use NOSQL interfaces.  These include the Java API and a MongoDB listener.  The latter allows those familiar with the MongoDB client to use it to connect to DB2 and store and retrieve JSON documents.  There is also a JSON command line shell you can use to administer objects used by the other interfaces.  See the Client access to JSON documents page for more information.  JSON documents inserted through these interfaces can be queried and updated by the new ISO functions described in this article.

 

George Baklarz and Paul Bird are writing a book on using JSON in Db2.  I’ve read a draft of it and it looks great.  I’ll update this article with a link to it when it becomes available. 

 

There are a number of reasons you may want to use the Db2 JSON features.  One of the first that comes to mind is the ability for developers to be productive and use only JSON interfaces they are used to, while at the same time providing SQL access to the same data.  One possible use for the relational interface is to do business intelligence reporting using tools that work best using SQL queries to access data.  Another is to do ELT processing where you load the raw JSON data into staging tables, and use the functions described here to copy the data into Db2 BLU column organized tables for blazingly fast BI queries. 

 

Another use case is where your organization gets data from the outside in JSON documents and you want to keep them for a while for auditing.  When there is a problem downstream with the use of that data it is helpful to be able review the particular documents to see if the problem was with what was sent or with the way you processed the data you received.  So, if you just drop all of the documents into a Db2 table as they arrive you can easily manage them and quickly find the ones of interest with simple queries as shown in this article.  You can then test with those particular documents and send back copies of the problem documents if you find that they do not conform to agreed-upon formats.  Further, if you timestamp the rows as I show in my examples here, you can delete old rows with simple delete queries when enough time has passed. 

 

***

 

I’m sure that there are quite a few other ways that using the JSON features of Db2 would be a benefit.  If you discover any creative uses of JSON in Db2 please post them on my Facebook Page or my db2Dean and Friends Community along with any other comments you may have.  

  

HOME | Search