Db2 Utilities and Cloud Storage

Dean Compher

31 March 2022

 

 

Db2 can now use various types of cloud storage to store and retrieve files directly.  Uses include files to be loaded into Db2 and files unloaded from Db2, whether as backup images or through external tables.  Db2 can back up a database to cloud storage and archive logs there too, and RESTORE can access those files.  Supported cloud storage systems include Amazon S3 and IBM Cloud Object Storage.  These services are inexpensive and can be configured to keep copies in multiple regions, so they are great for disaster recovery.  They are also useful when you don’t have room on your database server for these sorts of files.  In this article I will give you some tips and show some examples of how to interact with cloud storage.  Please join me in thanking Matt Emmerton for providing me with lots of help writing this article.

 

You need to be on Db2 v11.5.7.0 or higher to use all of these features.  Different Db2 facilities began supporting cloud storage in different mod packs, so some things were available before 11.5.7.  This feature is only available for databases running on Red Hat or SuSE Linux on x86_64 systems, and it requires the libcurl and libxml2 packages.  If you would like to see this feature included in Db2 on your operating system and hardware, then add an idea in the Aha! system or vote for it if it is already there.
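If you are not sure what level your instance is on, the db2level command reports the exact version, mod pack and fix pack:

            db2level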

 

Here are the Db2 facilities that can use Cloud storage as of 11.5.7:

·      LOAD and LOAD CLIENT

·      INGEST

·      EXTERNAL TABLE – load and unload files

·      BACKUP DATABASE

·      RESTORE DATABASE including LOAD COPY IMAGES

·      LOG ARCHIVING (LOGARCHMETH1/2)

 

Use cases for using cloud storage include:

·      Getting data from external sources – Instead of configuring some way for an external partner to put files containing data you want in your database onto your system, one of you can subscribe to a cloud storage service, give the other access, and then you just load the files from the service at some interval.

·      Local preference for using cloud storage – Many development organizations are already storing files on cloud storage. 

·      Disaster Recovery – Keeps your backups and archive log files off site, and if you purchase a multi-region service then your files are kept in different parts of the country.

·      Lack of database server space – Stage LOAD files and store backups and archive logs on this inexpensive storage.

 

Now I will discuss some of the specific ways you can use cloud storage with Db2.  I will use IBM Cloud Object Storage since I already have a subscription to it.  It is important to note that S3 is an object storage interface originally developed by Amazon that has become a de facto standard, and a number of vendors, including IBM, support it.

 

Before you can use any of these utilities you must have access to a supported cloud service and you must generate the credential file appropriately. 

 

External Tables

 

The easiest way to get started with cloud storage in Db2 is using external tables.  If you aren’t already familiar with them, external tables allow you to point to a file and either read it or write to it directly without loading any data into your database.  Essentially, you create a table, but instead of placing it in a table space you reference a file in the CREATE EXTERNAL TABLE statement.  If the file is there you can SELECT from the table as you would any other table, and if you INSERT into the table a file will be created or overwritten with the results of the insert.  External table files can exist on the database server or on cloud storage.  Please see my article on using External Tables with Cloud Files for more information.
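To give you a feel for the syntax, here is a rough sketch of an external table that reads a delimited file from a cloud storage bucket.  The table name, column definitions and access keys are made up for illustration, and the exact option names can vary by Db2 level, so check the CREATE EXTERNAL TABLE documentation (or my earlier article) before using it:

CREATE EXTERNAL TABLE ext_emp (empno INTEGER, lastname VARCHAR(30))
    USING (DATAOBJECT 'employee.csv'
           S3('s3.us-east.cloud-object-storage.appdomain.cloud',
              '<access_key_id>', '<secret_access_key>', 's3utilities')
           DELIMITER ',');

SELECT * FROM ext_emp;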

 

Prep for Other Utilities

 

The rest of the utilities that I describe here need some things configured before you can connect them to cloud storage.  Your instance needs a keystore, and you need one or more storage access aliases.  The keystore provides a place to keep the credentials for your cloud storage encrypted, but available to Db2. 

 

If you already have a keystore then you can use that.  Newer versions of Db2 will have the GSKit keystore software already deployed.  You can verify that by looking in your <instance_home>/sqllib directory; if you see a directory there called gskit, then you have it already.  If you don’t have any keystore software available, then you can get the free IBM GSKit.  Download and install instructions are available in the Knowledge Center. 

 

If you haven’t already configured a keystore, say for using Db2 native encryption, you will need to create it and then configure Db2 to use it as described in the Db2 Knowledge Center.
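As a rough sketch of what that involves, the following commands create a PKCS#12 keystore with the GSKit command line tool and point the instance at it.  The path and password here are just placeholders for values of your choosing; see the Knowledge Center for the full procedure and options:

gsk8capicmd_64 -keydb -create -db /home/db2inst1/keystore/db2keys.p12 -pw "Str0ngPassw0rd" -type pkcs12 -stash

db2 update dbm cfg using keystore_type pkcs12 keystore_location /home/db2inst1/keystore/db2keys.p12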

 

Storage Alias

 

Once you have a keystore you will need to catalog at least one storage access alias with the connection information and credentials for your cloud storage.  To use this storage in your Db2 utilities, you just reference the alias.  At a minimum the alias has the address of the cloud service along with the credentials needed to connect to it.  You can also include additional information such as the container, which says which bucket (directory) in your cloud storage you want to use, and a string that is automatically prefixed onto your file names.  You are likely to have multiple aliases for different purposes.  Storage aliases are created at the instance level, so any database in the instance can use them.  Figure 1 shows an example of the command for adding a storage access alias in Db2.  This is an example of what you might create for backing up your database to IBM Cloud Object Storage.  I have color coded the userid and password for easy cross reference to the credentials file in Figure 2, which I generated in the cloud storage interface. 

 

Figure 1. Storage Access Alias Example

 

CATALOG STORAGE ACCESS ALIAS bkupalias

        VENDOR S3

        SERVER s3.us-east.cloud-object-storage.appdomain.cloud

        USER f0274220ee4c4846b715569972580510

        PASSWORD 4d8d32f0c707f86d239ab3bbb6c0c166dfde4d141e2bc92a

        CONTAINER s3utilities

        OBJECT dmcprefix;

 

Figure 2. Top of Credentials file from IBM COS

 

{

  "apikey": "Yw3WHOcxCyrr9ZQ6wa-K9dSPkuvCWk3M38lJQkL2kCQQ",

  "cos_hmac_keys": {

    "access_key_id": "f0274220ee4c4846b715569972580510",

    "secret_access_key": "4d8d32f0c707f86d239ab3bbb6c0c166dfde4d141e2bc92a"

  },

  "endpoints": "https://control.cloud-object-storage.cloud.ibm.com/v2/endpoints",

 

Some other things to note about Figure 1:  I am calling the alias “bkupalias”, and since I’m using the S3 type of storage in IBM Cloud Object Storage, I chose that value for the VENDOR parameter.  SERVER is the address of the service.  From my IBM Cloud Object Storage web page, I chose the PUBLIC endpoint for this value because I am not running Db2 in the IBM cloud (private) nor am I connecting through a direct endpoint.  In my cloud storage service I created a “bucket” (directory) called “s3utilities”, and since that is where I wanted to put my backup files, I noted it in the CONTAINER parameter.  Finally, I wish to prepend the string “dmcprefix” onto my files, so I populated the OBJECT parameter with that value.

 

To show the aliases currently cataloged in the instance, use this command:

 

            db2 list storage access
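
If you need to change an alias, for example because the credentials were rotated, you can drop it with the UNCATALOG STORAGE ACCESS ALIAS command and catalog it again with the new values.  For example:

            db2 uncatalog storage access alias bkupalias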

 

A note about the credentials file:  I generated the credentials file using browser access to my cloud service.  To do this in IBM Cloud Object Storage, I chose the Service credentials tab and clicked the button to create a new credential for the bucket.  In the dialog box presented, I entered the name I wanted, expanded the Advanced options, and turned on Include HMAC Credential.  If you do not generate the credential with HMAC, then you will not get the cos_hmac_keys necessary for Db2 to communicate with your object storage. 

 

Backup to Cloud

 

Once the alias is created it is easy to back up to the cloud.  Figure 3 shows an example of backing up a database using the alias created in Figure 1.  Note that you reference the alias with the new DB2REMOTE parameter.  See all syntax options for DB2REMOTE in the BACKUP DATABASE Knowledge Center page. 

 

Figure 3 Backup to cloud storage

 

backup db dsmdb to DB2REMOTE://bkupalias//

 

You should note that if your instance can’t reach the cloud storage for some reason, such as a bad credential in the alias configuration or a transient network problem, your backup file may be placed in a directory in your instance home, and that can fill that file system.  To avoid that, you can use the DB2_OBJECT_STORAGE_LOCAL_STAGING_PATH registry variable to land it somewhere else, as described in the Remote storage requirements page.  When testing, it is a good idea to check the <instance_directory>/sqllib/tmp/RemoteStorage.00000 directory from time to time, where 00000 is the database partition number.  You might want to clean up files that get accidentally placed there. 
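For example, to point the staging area at a file system with plenty of room, you might set the registry variable like this and restart the instance.  The path here is just a placeholder for a directory of your choosing:

            db2set DB2_OBJECT_STORAGE_LOCAL_STAGING_PATH=/db2fs/remote_staging
            db2stop
            db2start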

 

Archive Logs to Cloud

 

Archive logs can be placed directly on cloud storage too, by changing the value of the LOGARCHMETH1 (and/or LOGARCHMETH2 if desired) database configuration parameter.  If you put your backups there, then you will probably want to keep your archived logs there too so all of your recovery objects are in one place if you need them.  Figure 4 shows an example of creating a storage alias for logs and then updating the LOGARCHMETH1 parameter to enable cloud storage.  Don’t forget that if you were not archiving logs before, then you will need to back up your database after changing this parameter. 

 

Figure 4.  Archive Logs to Cloud Storage Example

 

CATALOG STORAGE ACCESS ALIAS logs

        VENDOR S3

        SERVER s3.us-east.cloud-object-storage.appdomain.cloud

        USER f0274220ee4c4846b715569972580510

        PASSWORD 4d8d32f0c707f86d239ab3bbb6c0c166dfde4d141e2bc92a

        CONTAINER s3utilities

        OBJECT log;

 

db2 update db cfg for dsmdb using logarchmeth1 DB2REMOTE://logs//
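
Once the parameter is set (and the backup taken if the database was not already recoverable), you can verify that archiving to the cloud works by forcing a log archive and then watching for the new object in your bucket.  This is just a quick sanity check using the standard ARCHIVE LOG command:

db2 archive log for database dsmdb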

 

IBM Cloud Object Storage gives you browser access to see what you have there and to maintain your files manually.  Figure 5 shows a screen shot taken after adding a couple of CSV files and running the commands above to back up my database and archive my logs to the s3utilities “bucket” in my cloud storage.  You may want to put different types of files in different buckets, but for my examples I just used one bucket.  You would need at least one credentials file for each bucket, and that means using different credentials when creating the storage aliases. 

 

Figure 5.  s3utilities bucket screen shot with backup and archive log

 


 

Load from Cloud

 

Next, I’ll show how to put data directly into Db2 from cloud storage without having to copy it to my system first.  In the following examples I’ll pretend that I need to get a file from a third party, and we agree that having them copy their file to my cloud storage using a REST API is the easiest thing for all involved. 

 

So, the first thing that I do is write a little script (Figure 6) that has my credentials so that the third party can easily add their data to my service.  In this case I might subscribe to a new instance of the service to be used only for this purpose so that they can’t possibly access any of my other files.  Figure 6 has nothing to do with Db2.  cURL is just a utility for making REST calls, and in this case it uploads a file; you could also upload the file using Java, Python or many other languages.  The script first uses cURL to get an authentication token in JSON format from the cloud service, then it extracts just the token string, and finally it uses cURL to add the employee.csv file to the s3utilities bucket of the cloud storage service.  Feel free to borrow the code here, but make sure to type over the single and double quotes, as pasting from this document may change the quote types. 

 

Figure 6.  Shell Script uploading the employee.csv file to Cloud Storage

 

tokenall=`curl -X "POST" "https://iam.cloud.ibm.com/oidc/token" \

     -H 'Accept: application/json' \

     -H 'Content-Type: application/x-www-form-urlencoded' \

     --data-urlencode "apikey=Yw3WHOcxCyrr9ZQ6wa-K9dSPkuvCWk3M38lJQkL2kCQQ" \

     --data-urlencode "response_type=cloud_iam" \

     --data-urlencode "grant_type=urn:ibm:params:oauth:grant-type:apikey" \

     `

token=`echo $tokenall | awk 'BEGIN {FS = "\""} ; {print $4}'`

 

curl -X "PUT" "https://s3.us-east.cloud-object-storage.appdomain.cloud/s3utilities/employee.csv" \

       -H "x-amz-acl: public-read" \

       -H "Authorization: Bearer $token" \

       -H "Content-Type: text/csv" \

       -T employee.csv

 

Once the file is uploaded to the cloud service, the Db2 LOAD utility can access the file directly and load it into the database.  Figure 7 shows an example of creating a storage alias and then using that alias in the LOAD utility.  See the LOAD command in the Knowledge Center for more information on using the DB2REMOTE parameter in the FROM clause.

 

Figure 7.  Loading from Cloud Storage

 

CATALOG STORAGE ACCESS ALIAS loadit

        VENDOR S3

        SERVER s3.us-east.cloud-object-storage.appdomain.cloud

        USER f0274220ee4c4846b715569972580510

        PASSWORD 4d8d32f0c707f86d239ab3bbb6c0c166dfde4d141e2bc92a

        CONTAINER s3utilities;

 

LOAD FROM DB2REMOTE://loadit//employee.csv OF DEL REPLACE INTO stage_emp;

 

The examples above show how to use S3 storage with the BACKUP and LOAD utilities as well as for archiving log files.  There are many variations on the syntax, so you should review the links I provided to see all that can be done, such as overriding the CONTAINER and OBJECT values.  I did not show the RESTORE DATABASE or INGEST utilities, but they use similar syntax.  Whereas BACKUP streams the backup image to the cloud without landing the file locally, RESTORE must first land the file on your database server before it can do the restore. 
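For example, a restore from the backup taken earlier would look something like the following.  The timestamp is just a placeholder for the backup image you want; see the RESTORE DATABASE Knowledge Center page for the full DB2REMOTE syntax:

restore db dsmdb from DB2REMOTE://bkupalias// taken at 20220331120000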

 

***

 

I think that being able to access cloud storage directly from Db2 opens up a lot of possibilities.  Please share any interesting ways you use Db2 and cloud storage on my Facebook Page.

 
