Quickly Convert a File to a Table in Bluemix
26 February 2015
The ability to quickly deploy a database in the cloud is great news for both the DBA and the application developer. As you may already know the IBM Bluemix platform allows you to deploy the major IBM databases in the cloud as well as a number of open source databases. For smaller sized databases this is currently free. This allows you to experiment with the services and see if they work for your next project or just to try out a new feature that you cannot install in your own environment. This month I’ll describe a fast and easy way to load data from a file into your database that will even create the table if you want that. This is a huge time saver when you need put a file into a table to do any sort of analysis. This facility works with Excel, delimited and other file types and will use the column name header record in the file to create the table columns!
In Last Month’s Bluemix Databases article, I described some of the databases that are available, showed how to register, and described how to connect from your existing tools like Data Studio. This month, I’ll use the SQL DB database for my examples instead of dashDB, but the tools I describe work for both databases and others too. As you may recall SQLDB is DB2 configured for a transactional workload. Like dashDB it uses the new Native Encryption feature to encrypt all of the data in the database.
You deploy your SQL DB database just like you did your dashDB as described in the Bluemix Databases article except that you choose “SQL Database” in the Data Management category instead of dashDB. I’ll provide more connection information later in the document. When you launch the SQL DB database you will see the following panel that shows the three main categories of tools.
When I was actually a DBA one of the really time consuming things that I had to do on occasion was to get data from a file that someone gave me into a database table either for test data or to do some analysis using SQL Tools. That was always a slow task if there were more than a few columns because I had to write an import script and create DDL for a table that matched it. So one of my favorite tools that Bluemix provides is the ability to grab your file and have it create the table for you and then load the data. I happened to have a spreadsheet of sold properties in and around Salt Lake City with a few dozen columns that I wanted in a table. With 78 columns it would have taken a while to create the table and the import or ingest script to do this. With the SQL DB Load Data tool it was easy, especially since my Excel file had column names. SQL DB and dashDB also created the data types based on the data in the columns. Here are the steps I followed.
First I clicked the “Load Data” Button shown above after launching my SQL DB. This brings you into a wizard that has 4 steps to let you tell SQL DB about the file and the data. Each stop is a tab in the wizard dialog and those steps are as follows:
On this step you Navigate to the file on your workstation that you want to upload and complete some information about it like whether the first record in the file contains the column names, the type of separator and the date format if the file contains dates. Excel contains meta data about the columns so most of these choices will be greyed out if you choose an xls or xlsx file. Once you click he “Load File” button you will be presented with the first few columns and some example data so see if the system is on the right track for categorizing the data. If everything looks good then click Next.
Choose whether you want to use an existing table or have Bluemix create a new one. I chose to create a new table and clicked Next.
If you chose to create a new table in step 2, then you will be presented with the option to change the table name and change the column names the ones imported from the file. See my note about table and column names below. I changed my table name to SOLD_PROPERTY_Q4 and left the column names as they were. It shows the data in each column for the first few columns and rows. When done click Next.
This tab shows you the results of the load including the number of records loaded or rejected. On this tab you will also see a link to look at the log for the load. I recommend using Wordpad to look at the log as notepad does not format it very nicely. From this tab you can also click at link to view the full table structure with all columns and data types.
There are a few things to watch out for:
In my file, a null date value was 00/00/0000 and that is not a valid date, so any records with that value got rejected. In my case, I just changed those dates to a valid date and my load worked.
Where spaces exist in a column name they will exist in the table. Case is also preserved, so if the spread sheet column shows “HOA Fee” then that will be the column name you must use in the query and it must be quoted. Never heard of have having spaces in a column name? Well here is what I mean. Because my spread sheet had column names that used mixed case and has spaces in the column names, I would have to write my queries exactly as follows if I made no changes:
SELECT "Acres", "HOA Fee", "Total Bedrooms"
Note that even the column “Acres” must also be quoted because it contains lower case letters and the case on each letter must be exactly correct. When you omit the quotes the query automatically converts un-quoted columns to upper case and columns with spaces that are not quoted will cause odd syntax errors. In my example the query would fail if I didn’t use quotes because the column “ACRES” does not exist and the spaces wound not be interpreted correctly. I highly recommend editing the column names in the file before the load to convert everything to upper case and to remove spaces. I would much prefer to run the following query where case does not matter and so would most users.
SELECT acres, HOA_FEE, total_Bedrooms
You can also edit the column names and table name as part of the load process. You do this in step 3, Select a Table, by just typing over the names shown. Once you have a table you can append more data from another file or replace the current data with the data from the new file. You choose whether to build a new table or use an existing table using a radio button on step 2, “Choose the Target”. You can use this feature to build more than one table from an input file.
The wizard-like navigation through the load process is great, but sometimes you want other alternatives to navigating around. For example, you may want to query your new table or review the columns. For these you would want to choose the Run Query or Work with Tables option from the Manage menu item. This is where the menu at the top of the screen comes in:
The query and table tools are very easy to use and I highly recommend them. However, if you want something more advanced then you will want to use IBM Data Studio or some other GUI tool to explore and manage your schema. Data Studio is a free download. At the time I am writing this article you can only add or alter tables and other objects in your own Schema name. I suspect that this will change soon. In any case, to connect Data Studio or other GUI tool you will need information including IP address, Port, User Id and password. To get all of this information from one place in SQL DB you will need to do the following.
The credentials (User ID and Password) are assigned when the SQL DB service is bound to an application. This just means that you need to click an application or application development environment and tell Bluemix you are associating it with our SQL DB database. You don’t need to do anything else with the application after that. I will walk you through the steps for this here:
Start by going to your outermost Bluemix dashboard and pressing the big “+ Add CREATE AN APP” button:
From there just follow the prompts to create an application. I chose, Web and then PHP, but it doesn’t really matter if you just want to get your SQL DB credentials. I then clicked through the options, giving my service the name of phpDB2. Finally I got to the screen where I could click the “+ BIND A SERVICE”. You need to click that button to connect your new service to the database to get the credentials.
Clicking the “+ BIND A SERVICE” button will bring up a dialog with your SQL DB database listed. Just choose it and click the Add button to bind the service with your database. You may be prompted to restage the application, so just do that. Once that process is complete you can see your database user id and password along with all of the other connection information in a JSON document. To see that you can click on the “Show Credentials” link in the SQL Database Box in your application dashboard. You may need to scroll down to see this.
This displays a JSON document with all of the information you need to connect to this database:
Once you leave the application dashboard, you can always get back to it from the main Bluemix Dashboard. To do that click on the APPS link in the left pane and then click on the service name, in my case it was phpDB2. This will allow you to get to the “Snow Credentials” output again as shown above.
This was all pretty easy, so if you need a DB2 database to experiment with, I highly recommend trying the SQL DB. If you see other nifty things you can do with this tool or if you encounter a problem and develop a solution, please post them to my my Facebook Page or the db2Dean and Friends Community.