External Tables

Dean Compher

27 November 2019

 

 

Db2 11.5 introduces a great new feature called external tables.  The feature allows you to make data in a file appear to be a table in your database, letting you query it like any other table.  It also allows you to copy data into a file if you insert into the external table.  These files can appear as any other table in your database or can be referenced like a table in a query without actually issuing a create table command against the file.  Querying the data in files has all sorts of uses.  Examples include a fast way to examine data in a file using SQL, easily loading all or parts of a file into a regular table, and exporting columns of tables to a file.  Db2 only checks the validity of data at the time when you query the data and not when you actually crate the table.  Rows with bad data are ignored, making it easy to access just the good rows.  The files can be local on your database server or in certain types of cloud storage like S3 storage.  This is great if your data is already in a supported cloud storage because you don’t have to download the file to start using the data.  You just reference the cloud storage object in the CREATE TABLE command or query. 

 

The files under your external tables can be local on the Db2 server, on a different server, or in the cloud.  In this article I’ll just describe the local and cloud options.  Cloud files can be in S3, SWIFT and AZURE storage types.  The IBM Cloud supports both S3 and SWIFT.  

 

For now I’ll discuss the local files, but will talk about cloud later.  Db2 will only use local files in directories specified in the EXTBL_LOCATION database configuration parameter.  You can just place your files in that directory or any directory below it, or you can specify a list of directories and use any of them.  Again, you only need to specify a top-level directory.  Any subdirectories under the directories you specify can be created and used any time.  Here is the command I used to specify two directories:

 

update db cfg using EXTBL_LOCATION '/db2fs/external;/database/config/db2inst1'

 

Once you’ve opened access to the directories you want, you can now start creating external tables on files in those directories.  Those files can be delimited or have fixed length fields.  In my examples I will use the gosales.order_header table that is used in many Db2 documents and is available for download.  In my case I extracted several rows from the table in a delimited format and then used that file for my purposes.

 

Example 1:  Simple external table

 

CREATE EXTERNAL TABLE gosales.ext_order_header LIKE gosales.order_header

       USING (dataobject '/db2fs/external/oh2.del' maxerrors 1000

              DELIMITER ',')

 

Notice the following about this example:

·      Used the EXTERNAL keyword in the CREATE table command

·      Used the LIKE key word to define the columns and data types of my external table using the values of an existing table.  I could have specified the columns and data types if I didn’t already have a table formatted like my file.

·      Specified the full path to my file called “oh2.del” that I’m using for my table. 

·      Specified 1000 for the maximum number of errors that can be encountered before a query operation quits.  The default is quitting on one error.  Make this number really big if you want to ignore all bad rows.

·      My DELIMITER is a comma.  This is a comma delimited file.  You can specify any character you like.

·      There are many other parameters you can specify to control the behavior of the external table

·      Note that not all data types are currently supported.  The link in the preceding bullet enumerates the ones allowed.  Don’t let this stop you though. For example, the XML data type is not supported, but you could have an external table with a column containing XML, but the column would need to be defined as a CLOB, VARCHAR, etc. 

·      You can control access to the table using grants and revokes, roles, etc., just as you can with any other table.

 

The create external table command will work even if none of the records in the file meets the data types or lengths of the columns you defined in the create table command.  The validity of the data is checked each time you query the data.  Two files are written each time the data is queried.  A .log file shows the parameters used to create the external table along with statistics about the query, including the number of rows read, number of bad records, and the issue with each bad record.  A .bad file is also written with a copy of the invalid records.  This is pretty nice, since you can then easily see which rows are bad, edit them and then replace them in the original file. 

 

The files are created in one of the directories specified in the DB CFG EXTBL_LOCATION parameter. Here is what the file names looked like after I queried the ext_order_header external table created in Example 1 above.

 

SAMPLE.GOSALES.EXT_ORDER_HEADER.oh2.del.00000056.0000003.bad

SAMPLE.GOSALES.EXT_ORDER_HEADER.oh2.del.00000056.0000003.log

 

As I noted above, these files get created every time you query the table, so be careful about cleaning them up, especially if you have a lot of bad records.

 

The file I used in Example 1 had 100 records, of which 11 had bad data.  If there are bad rows in the file, but fewer than the maximum bad rows you specify when creating the table then applications will only see the good rows when you query the data.  To illustrate I ran the following query to count the rows in the table.  The number of rows returned was 89. 

 

select count(*) from gosales.ext_order_header

 

However, I also got a warning code “SQL5108W” and the warning message indicated that there are 11 bad records in the file.  The summary statistics in the log file looked like this after running the above query.

 

Statistics

  number of records read:      100

  number of bytes read:        10896

  number of records skipped    0

  number of bad records        11

  -------------------------------------------------

  number of records loaded:    89

 

The log file also described the issue for each bad record.  I then recreated that table specifying a value of 7 for maxerrors to see what would happen when I exceeded the number of bad rows.   The create table worked fine, but when I ran the query to count rows, I got the following error code:  “SQL20570N” with no data returned. 

 

Using external tables allows you to query data in a file quickly since you don’t have to wait to load the file to start querying it.  Just create an external table pointing the file and you can start querying right away.  However, for extremely large files the queries may be slow because the files are not optimized as they would be if you actually loaded them into regular Db2 tables.  You should note that you can pretty easily load the data into a regular table from an external table.  Maybe you want this for performance or maybe you just like this method better than setting up load, import or ingest jobs.  In any case, here is one easy way to put the data into another table, possibly a column organized table for analytics queries.

 

Example 2.  LOAD from CURSOR into a regular table. 

 

declare c1 cursor for select * from gosales.ext_order_header

                         order by order_date

;

load from c1 of cursor messages /db2fs/external/loadcsr.msg

     replace resetdictionary into gosales.order_header

;

 

One nice thing about this is that in the cursor you can do lots of manipulation, call Db2 functions, exclude columns etc., to make the query look like the table being loaded.  Also if you are using the same data types in both tables, then the load should be clean, since bad records will be excluded by the query and not forwarded to the load utility.  In this example I just sorted the rows by order_date, but you can use very involved SQL if you like.  Further, instead of using load you could use insert with select and other commands if you prefer.  If you need to run the load from cursor from a client, you will use the “admin_cmd” procedure, but the file still needs to be on the database server or in a supported cloud store. 

 

Another thing I like about external table is the easy access to JSON data.  I like the external table interface for accessing JSON better than the insert/load/ingest utilities, and the data is available immediately after creating the external table instead of having to wait until a load is completed.  Here is an example of creating an external table and querying it as relational data.

 

Example 3.  External Table for JSON and Relational Query

 

CREATE EXTERNAL TABLE json.energy_json_ext

           (JSON_FIELD varchar(2000))

       USING (dataobject '/db2fs/external/july.json'

       maxerrors 1000 DELIMITER '|')

;

 

Notice the following about this example:

·      JSON documents in a file are not delimited, so you need to specify a character as the delimiter that is NOT in the file.  In this case the pipe “|” symbol is not in my file.  This forces the entire record to go into one column.  Each record (document) goes into a different row, though.

·      Notice that I created the table with just one column, because we are forcing the feature to consider the entire record as a column. 

·      You have to use one of the character data types for column such as VARCHAR or CLOB.  You can NOT use binary types like VARBINARY because the data is character.

 

Now you can use SQL to do just about anything with the JSON documents, including a load from cursor as shown above or query the data in the documents with SQL as I show in my Query JSON in Db2 article.  This is a great way to quickly query JSON data using any application like a report writer that uses SQL, although for some tools you may need to build a view on the JSON table as shown in my article. 

 

As I mentioned earlier, the files for the external tables can be on certain types of cloud storage.  This is handy for a number of reasons.  Many DBA’s don’t have access to place files on database servers, so this is a great workaround, but your database server does need to have access to the internet unless you use local cloud storage.  Another reason is convenience when your files are already on the cloud.  In that case you can just point your external tables at those files and have access to the data.   The CREATE EXTERNAL TABLE page in the knowledge center has lots of good information on how to create your tables using these various types of cloud storage.  I’ll show an example that is very similar to my Example 1 above, except that my file is on cloud storage instead of local. 

 

Example 4.  External Table on Cloud Storage

 

CREATE EXTERNAL TABLE gosales.ext_order_header_cloud LIKE gosales.order_header

   USING (dataobject 'oh2.del'

          s3('s3.us-south.cloud-object-storage.appdomain.cloud',

             'ca6086c5ac504f9eb6c8d45874b080z0',

             'a77c827e9f3767a8984a1a8e80db85a55c76db71a2936508',

             'cc-dean-tutorial')

          maxerrors 1000 DELIMITER ',')

;

 

When I ran a query against the table in Example 4, the .log and .bad files were created on the same object storage.  This is really good if you don’t have access the database server.  However, it does mean that your database should have write access to the bucket/directory in the cloud storage. 

 

So far, we have created external tables that exist in the catalog until they are dropped.  This is frequently desirable, but in the case where you are only going to use the file once, then you can just reference the file as an external table in a query so that you don’t have to bother with the create and drop statements.  This is called a transient external table.  One reason for doing this is where the file is strictly there for a load or single query. 

 

Example 5.  Selecting from a Transient External Table

 

select order_nbr, retailer_name from EXTERNAL '/db2fs/external/oh2.del'

  LIKE gosales.order_header

  USING (maxerrors 1000 DELIMITER ',')

  FETCH FIRST 5 ROWS ONLY

;

select order_nbr, retailer_name from EXTERNAL '/db2fs/external/oh2.del'

     (ORDER_NBR INT, RETAILER_NAME VARCHAR(150),

      RETAILER_NAME_MB VARCHAR(150), RETAILER_SITE_CODE INT,

      RETAILER_CONTACT_CODE INT, SALES_STAFF_CODE INT,

      SALES_BRANCH_CODE INT, ORDER_DATE TIMESTAMP,

      ORDER_CLOSE_DATE TIMESTAMP, ORDER_METHOD_CODE INT)

    USING (maxerrors 1000 DELIMITER ',')

    FETCH FIRST 5 ROWS ONLY

;

 

Notice the following about this example:

·      Both queries do the same thing but being able to use the LIKE parameter saves a lot of typing if you have a corresponding table available. 

·      I query the data in the file in a select without first creating the table. 

·      In the second query, I had to define all fields in the file, even though I only selected the first two columns.  If you don’t define all columns, the query will fail, and all records will be copied to the “bad” file.  The column definitions syntax can be used with creating a named table too. 

·      A similar syntax to either of these queries can be used with the load from cursor feature and other things that involve a query. 

·      I used local storage again in this example, but cloud storage could have been used.

 

So far everything I’ve shown you relates to using data in a file you provide.  External tables also allow you to copy data from Db2 in just the format you want it.  Here’s an example:

 

Example 6.  Exporting data from Db2 into a file

 

CREATE EXTERNAL TABLE '/db2fs/external/employee_join.csv'

        USING (INCLUDEHEADER ON  DELIMITER ',')

   AS SELECT b.PROJNO, a.empno, a.firstnme, a.lastname, a.workdept

            FROM db2inst1.employee a, db2inst1.empprojact b

           WHERE a.empno = b.empno

             AND workdept like 'E%'

;

 

Notice the following about this example:

·      Data from the query will be placed in a file called employee_join.csv and the file will be in the path shown.  This path must be in one of the directories specified in the EXTBL_LOCATION parameter.

·      Column names are placed in the first record of the file since I specified INCLUDEHEADER ON.  The default is off.  There are many other formatting options so check the ones available on the CREATE EXTERNAL TABLE page. 

·      This feature makes it easy for you get the output file in just about any format you like since you define it with a select.  You can put the columns in any order, join tables, use Db2 functions, use predicates to limit rows output, etc.

·      In this case I used a transient external table, but I could have created a permanent table instead and then copied the rows desired.

·      If I didn’t have access to the Db2 server, I could have used a cloud storage system instead of the local file system. 

 

I think that the external table functionality opens up a remarkable number of possibilities.  For example, you could put the free Db2 Community Edition on a server with lots of delimited or JSON files, and with some create statements, you have JDBC/ODBC access to those files at no cost.  Alternatively, you can make lots of data available by building external tables on cloud storage and inserting the data into them that you want. 

 

***

 

I’m sure that there are many other things that you can do so please add anythoughts you have on the subject to my Facebook Page or my db2Dean and Friends Community and share your thoughts about them.

 

 

HOME | Search