db2Dean a.k.a. Dean Compher a.k.a “The Dean of DB2” is here to answer your DB2 Questions !
Just for Grins Watch:
Add your insights to the:
None Currently Scheduled
Welcome to db2Dean’s web site. I’m Dean Compher an IBM IT Specialist who, along with my team, helps customers and prospective customers with DB2 on Linux, UNIX and Windows (LUW) technical questions and issues. As this page makes painfully clear, I am a DBA and not a web designer, but I would be happy to get your DB2 questions answered or talk to you about the great features of DB2 or IBM Integration Products. If you are looking at a new database solution or want to compare us to your existing database vendor, please do not hesitate to contact me about getting a presentation or just to ask questions. My e-mail address is dean@db2Dean.com
I am located in Utah and primarily serve DB2/LUW customers in Utah and Nevada, but I can forward requests to my peers in other technology and geographic areas as well. My team also covers Big Data and Informix and products. There are questions that I get on a regular basis, and I will write articles relating to them here. I hope that you find them useful. I also welcome suggestions for future content. Click here for more information about me.
20 April 2016
One of the great things about the cloud databases and other analytics tools provided on Bluemix is the ability for anyone to inexpensively and easily deploy a data base for analysis. You do not even have to know how to administer a database to do this. But once you have the database deployed, how do you get the data into that database in the format you want it? The answer is IBM DataWorks. It is a tool that makes it easy to shape the data from your on premises sources into your cloud data store or even Watson Analytics. Through an easy to use interface it allows you to join tables from your source, filter data, remove or rename columns, remove spaces, change case and several other things. It also shows you a lot of information about the quality your data. It is undergoing rapid development, so any feature not shown in this article may be there in the weeks and months ahead.
DataWorks is a Bluemix service that you deploy with a few clicks. Once deployed, it allows you to configure your source and target data stores and then define how you want to copy data from the source to the target and which transformation (shaping functions) you want to apply to the data as it is being copied such as filtering, removing columns joining, etc. When you create the set of refinements from a particular source to a particular target, you create an activity and give it a name when you save it. You can then run that activity over again anytime you like. You can also crate several different activities and save them for use under various circumstances. To protect the data in transit from your on-premises data sources, you can configure a secure gateway.
I start this discussion by showing you some of the more interesting things you can do with DataWorks, and then give an overview of how to begin using it. You can also see the documentation for DataWorks. If you go to this link and scroll down, you can see the list of sources and targets. This list is being increased rapidly, so I didn’t add the list here because it would get out of date so quickly.
Once you have configured connections to your target and source, entered the refine process, selected the source tables with which you want work you get to the screen shown here where you begin the process of refining your data. To make it easy to perform the function the first 1000 rows from each table are selected and shown as you proceed:
Here I am in the refine and copy panel. You can see information about the quality of the data. You can also see that I’ve selected three tables to work with called Order_Details, Order_Header and Product. Order_Details is selected and you can see the first few columns and some data in them. You can see this information for the other tables just by clicking on them. Along the left side of the table you can see a number of table level functions. Some of the most interesting refinements that you can do you get to by clicking on the column name that interests you.
Using the shaping functions you can do several things to the data including removing columns, running functions on the data, sort and filter. The functions you can choose is dependent on the data type of the particular column. As you can see here, I chose a character column and I get the “TEXT” choice that is not on non-character columns like integers and dates.
Table Refine Level Functions
As you can see in the figure below, the main table and activity level functions are listed to the left of the table of data. In this case I’ve selected the “Metrics” option which inserted a row above the first data row showing quality data for each column. For Column, Fix data and Metrics functions you will be working with one table at a time. So be careful to select the correct table along top of the menu to make sure you are working with the right one. In this instance I’m working with the ORDER_DETAILS, ORDER_HEADER and PRODUCT tables.
In the columns dialog box you can remove columns from view and reorder the columns. To rearrange the columns with in the selected table just drag individual columns to where you want them to be. You hide a column by unchecking it. It is only hidden from view while you are creating your activity. It will actually be copied when the activity is run. We discussed how to actually remove columns from the activity above.
When you select a table and click “Fix data” it will show you any data quality issues for individual columns. Types of data quality issues include a significant number of missing data fields or non-unique values when DataWorks thinks it should be unique. You can check any fixes you like and click the apply button. This will cause any columns with bad data to be removed from your view and from the target when the job is run.
By choosing a table and then clicking Metrics, you will be shown information about the sample data in the first row of the table as shown above. You can hover over the information presented to be shown more details.
Click the history button at any time to see the actions that you have taken on the tables and columns that will be implemented when you run the refine process.
One of the things that you can do once you selected the tables that you work with is to join them. This allows you to combine the columns of various related tables into one table in the target database. DataWorks will create the new table in the source if you want it to. This makes some analysis tasks much easier. There are a number of ways to join the tables including only using rows if the common join column(s) are in both tables or if they are in one or the other or both. In the join box, you can see the following join types and select one.
Now that I’ve shown some of the more interesting things that you can do with your data I’ll briefly describe how to get started using DataWorks. This is by no means a comprehensive, step-by-step set of instructions. I’m just providing a few hints about what steps to take to get started. I found it was pretty easy to figure these things out, so I recommend that you just get in and clicking around. Generally the steps you take to get started are:
You can also call DataWorks using the API.
There are several other things that you can do that I did not cover in this short summary. If you find any features that you particularly like please post them to my Facebook Page or db2Dean and Friends Community.
Mary Lou’s Stained Glass