Virtual Databases

7 January 2008

A problem that a number of organizations face these days is having several different databases on different vendors’ Database Management Systems (DBMS) and needing to write applications that talk to more than one of them at a time. For example, you might have databases on Oracle on UNIX, SQL Server on Windows and DB2 on z/OS and need to write applications that get data out of all of them. This can get rather complex because the person writing the application needs to know how to connect and write efficient queries for all of them. Often one person will not have the skills for all of the databases. It can be time consuming for them to learn new databases and the calls to some of the databases may be sub-optimal. Also you have to make the decision as to whether to imbed passwords to each database in each application or have the user of the application provide multiple passwords for multiple databases. With WebSphere Federation Server you can avoid these problems. Through the rest of this article, I will refer to the product simply as WFS.

WFS Proof of Technology, Salt Lake City, 26 Feb 2008: Learn about WebSphere Federation with lectures and hands-on labs at this free 1-day session!

What is WFS?

In effect, Federation Server puts DB2 glasses on the world. That is, you can create one virtual database that that appears to have objects from multiple heterogeneous DBMS’s all in one DB2 database. This way you can write an application that selects, joins, updates, etc., from one DB2 database and have WFS efficiently collect the data from the multiple sources behind the scenes. The world class DB2 optimizer is invoked when you query the federated database to make the queries to other databases as efficient as possible. DB2 does this by using known information about the other DBMS’s and by collecting statistics about the physical objects in the other databases as well. With this information DB2 and WFS can make very efficient queries across different databases. It can also perform transactions across databases in certain DBMS’s. This is not available for all DBMS’s

The major components of WebSphere Federation server are a DB2 ESE database server and the tools that allow Federation Server to get physical tables and other objects from other databases and show them in the DB2 ESE database. Even though there are separate logical components, everything runs under the DB2 database engine. You do all administration tasks to WFS using the DB2 Control Center or normal command line tools while connected to the DB2 database. With Federation Server you can even cache copies of the remote database tables locally in WFS because it is part of a DB2 DBMS. You can put your own application data in the DB2 database if you purchase an additional license to do this. Nothing is installed on the remote servers. To the remote database severs, WFS looks just like any other application connecting and executing SQL statements.

While WebSphere Federation Server can do many things it does have some limits depending on the external data source. If you are thinking about a particular application, please contact me or your local IT Specialist to discuss how best to perform your anticipated tasks.

How does it look?

As with all major modern relational databases you can create views on physical tables in a DB2 database. The simplest type view is one that is based on one table and looks exactly like the table. Applications that connect to the database just see objects – it does not distinguish between tables and views. With Federation Server, we introduce a third type of object called a “NICKNAME” that is just another object, like a table or view, that is presented to the application. You can think of a nickname as a view on a table in a different database. So when an application connects to a federated database it sees tables, views and nicknames and does not know that it is accessing different types of objects. It is usually a good idea to build views on nicknames and only access the views from your application. That way, your application may not have to change as the physical objects change in the remote databases. To illustrate nicknames I have created this diagram:

(T1 …. T6) = Tables; V1,V2 = Views; N1 = Nicknames; SP1 = Stored Procedures; FP1 = Federated Procedures

In this example, any application connecting to the DB2 on Linux (it could also be on Windows or UNIX) database would see the T1, T2, V1, V2 and N1 objects and not know that there is any difference between them. The application can join the objects in any way you wish and can update any or all of them in one transaction, with all results committed or rolled back together for certain data sources. DB2 takes care of this for you. You may have also noticed that I showed the object, FP1 in this example. This stands for Federated Procedure. Using Federation Server you can build these federated procedures that allow your application to execute stored procedures in the remote database and think that it is just calling a local DB2 stored procedure.

How it works?

WFS does not put any software on the remote database servers so you need to install the client software on the federated server for the other databases. This allows WFS to connect to the remote databases as any other application would. Therefore, if you wish to connect to remote SQL Server and Teradata databases then you would need to install the client software for those two products on the federation server.

WebSphere Federation Server installs several libraries called “wrappers” that do the work of allowing non-DB2 database objects from remote data sources to appear as DB2 nicknames. One of the most common things that a wrapper does is to translate data types from the remote data source into a DB2 data type in the nickname. For example, an Oracle DATE data type has a time component and a DB2 DATE data type does not. By default when a nickname is created on an Oracle table with a DATE data type, the corresponding nickname column will have a TIMESTAMP data type. Most of the defaults are good for most applications, but sometimes you will want to customize them. In this example you can create a copy of the default wrapper with a new name and modify it. For example, you may want nickname columns based on Oracle DATE columns to appear as a DB2 DATE column with the time component truncated. Once you modify the nickname all subsequently created nicknames built under that wrapper will have your new default conversion.

Wrappers also do other work such as knowing what functions and SQL syntax is supported in the other DBMS’s and making appropriate translations when accessing the remote databases. They also have knowledge built in to allow queries to be optimized for the particular remote database. There are a number of other things that they do as well, but I just wanted to hit the highlights here.

Once the client libraries and WFS are installed and you have created your wrappers you then build “server” objects in the DB2 Federated database that define individual databases on remote servers. In the server you specify which wrapper to use for a database as well the username and password to use when accessing it. Basically the “server” is a named object in your WFS system that becomes an alias for a remote data source.

Finally you build nicknames on some or all of the tables and views in the remote databases (WFS “servers”). You can tell WFS to build the objects on all tables in the remote data source, on a subset of the tables or on individual tables. It just depends on what you want to do. You can control access to the nicknames with standard SQL GRANT and REVOKE statements just as you would for tables and views. WFS actually makes the connection to the remote servers on your behalf, so the users and applications do not need to know passwords of remote databases.

There are other products that do some of what WebSphere Federation Server does, but the thing that really sets WFS apart is the query optimizer. The wrappers contain general DBMS specific performance information and WFS allows you to collect object specific information such as the number of rows in the table and available indexes. You can also store certain information in the server object such as the speed of the remote server compared to the local federation server and the relative speed of the network. The optimizer uses all of these bits of information to select the optimal path to the data that you want. For example, it may choose to send a complete join of two tables on one remote database asking for combined results, or it may send one query for each remote table and combine the results locally. When multiple sources are involved in a federated query, WFS can send the query fragments to each remote database asynchronously which means that you can get all of the remote servers working on your query in parallel! If you would like to read more about how WFS ensures optimal performance, you can review the following links.

Federation Server Performance Part 1

Federation Server Performance Part 2

Asynchronous Query Execution

Cache Data Locally with MQTs

Available Data Sources?

WebSphere Federation Server comes ready to federate several data sources. Some of the most popular are:

DB2 (on z/OS, iSeries, VSE, Linux, UNIX, Windows)
Informix
Oracle
MS SQL Server
MS Excel Spread Sheets
XML and Flat Files
ODBC and OLE DB sources
Sybase
Teradata

In addition to these common data sources, WFS can also create nicknames on Web Services as well as SAP, PeopleSoft and other popular applications without you having to know about the cryptic table names. For a full listing of data sources and the specific versions with which WFS works out of the box, please see this link:

Data Sources and Versions

Who uses it?

Over the years many organizations have had multiple independent divisions creating their own systems on different platforms. Now they find themselves with a need to create central applications that use and update data from several divisions for things like government compliance mandates and centralized customer service. Creating a central federated database eliminates the need to move lots of data around or write extremely complex applications that access all sources.

Some companies regularly acquire other companies and need to have applications that show company-wide information almost immediately. One way to do this is to have a flexible federated database where the new organization’s data is joined in quickly rather than having to wait for databases to be combined. For example, for good customer service it is great when an analyst can see all of a customer’s data from both the old and new company’s sources soon after an acquisition is completed.

Some organizations want to centralize their applications data over time, but do not want to do large “rip and replace” projects. With WFS, you can write your central applications that access a federated database with some local centralized data and with those applications also accessing the data in the remote systems. The existing divisional applications can continue to run as well, working directly on their local copies of the data. As it is appropriate for your organization, you can migrate individual tables or small groups of table to your central DB2 database servers by just dropping the nicknames and creating physical tables with the exact same names as the nicknames. The applications that were accessing the federated systems don’t know that anything changed because then don’t know the difference between nicknames and tables – they just select, update, etc. from the object name.

Additional Reading

IBM WFS Home Page

Federated Two-Phase Commit

OS and Platforms Where WFS Runs

Overview

WFS Proof of Technology, Salt Lake City, 26 Feb 2008: Learn about WebSphere Federation with lectures and hands-on labs at this free 1-day session!

HOME | Search