DB2 LUW Statement Concentration

DB2/LUW Statement Concentration

Dean Compher

27 May 2011

Everyone knows that a key to good database performance is efficient SQL, but did you know that the use of literals instead of parameter markers in your “WHERE” clauses can cause performance problems? The use of parameter markers when coding SQL from your application is usually the best practice and can improve performance on your database. This should be kept in mind when developing new applications, but for existing applications DB2 provides ways to make your queries with lots of predicate literals much more efficient. In this article I will explore ways that this can be done.

What are parameter markers and why do I care about them? When writing SQL in an application program whether it is Java, C++, PHP, COBOL or any other language you have choices when writhing the predicate or “WHERE clause” of the SQL. You can choose to compare a column to a literal like “ABC” or 0.5 or you can compare it to a variable defined in your program. These variables are usually called a “parameter marker” or are sometimes are also called a “host variable”. Here is an example of a query with a literal in the predicate:

SELECT COLOR, SPECIES FROM BIRDS WHERE FAMILY = ‘FALCON’

The same query with a parameter marker in the predicate would look like:

SELECT COLOR, SPECIES FROM BIRDS WHERE FAMILY = ?

When either of these queries arrives at the database server for the first time as dynamic SQL, the query must be prepared by the database. (The use of static SQL is beyond the scope of this article.) This entails a number of activities including parsing the query and determining how the data will be accessed – use of indexes, etc. The query along with the access plan is then placed in the package cache. The next time an identical query arrives it may be found in the package cache if it is still there. Finding the query is good because the database can skip most of the prepare processing and that saves server resources. However, DB2 will only use the plan in the package cache if the queries are identical. For example, the following two queries would both have to be prepared because DB2 does NOT believe that they are identical:

SELECT COLOR, SPECIES FROM BIRDS WHERE FAMILY = ‘FALCON’

SELECT COLOR, SPECIES FROM BIRDS WHERE FAMILY = ‘HAWK’

However, if DB2 sees the query below twice then the second time it arrives it would not need to be prepared even if the value being substituted in the parameter marker is different each time:

SELECT COLOR, SPECIES FROM BIRDS WHERE FAMILY = ?

DB2 will continue the use the access path calculated the first time the query arrived no matter how many times it arrives with different values in the parameter marker as long as the query remains in the package cache.

When high volumes of queries are received in a system, then reusing access paths from the package cache can save considerable resources on the database server processing a transactional (OLTP) workload. If you have lots of queries with unique predicate literals then you will likely be squandering resources on unnecessary prepares. How do you know if this might be a problem for your database? You probably have a problem with literals in your SQL if you have a package cache of more than 8,000 to 16,000 pages (33MB – 66MB) and a low package cache hit ratio, say less than 95%. These conditions are often found with excessive CPU consumption. Expanding the package cache size is not really the answer because the larger the cache, the more work DB2 must do to search the cache each time a query arrives. This cache is not indexed so significant resources can be expended just doing the search and if you have a low package cache hit ratio, then you frequently have to prepare the statement after spending all of those resources searching! Mr. Graham Murphy, a DB2 expert with many years of experience, told me about this situation.

The good news is that there are options for solving this problem. Of course you can rewrite the application to use parameter markers, but that that is often a very expensive solution. IBM provides two different tools to help solve this situation.

Statement Concentrator – runs on the Database Server
Optim PureQuery Runtime – runs on the Application Server

Statement Concentrator

The Statement Concentrator is a feature that can be enabled on any edition of DB2 at no additional charge. This feature tells DB2 to treat queries that are identical except for literals in the predicate as if they used parameter markers. These queries will be stored once in the package cache with one access plan rather than being stored many times with each query having its own access plan. To enable this feature you just change one Database Configuration setting:

UPDATE DB CFG USING STMT_CONC LITERALS IMMEDIATE;

I highly recommend that you read the article about the Statement Concentrator before using it.

Optim pureQuery Runtime

Optim pureQuery Runtime is a tool that runs on the application server and optimizes query performance in the database in several ways. One of its features allows pureQuery to intercept queries with literals as they pass through the JDBC type-4 driver or CLI driver and send the query to the database with a parameter marker instead. Since the database only sees queries with parameter markers, the statements get concentrated just as if the application used parameter markers. This works with most languages that use the JDBC type-4 or CLI drivers except for .NET. A very nice benefit of using this method of statement concentration is that it moves the burden of concentration to the application server where it belongs. Although the burden is light, it still requires resources.

Optim pureQuery Runtime is a jar file that sits on your application server or workstation. Although this is not exactly how pureQuery works, the way that I think of it is that it plugs into your JCC driver or CLI driver. As queries go that driver on their way to the database, pureQuery can substitute better queries for the ones actually being sent by the application. To enable the literal substitution, you enable the sqlLiteralSubstitution property for the pureQuery tool. This is explained in the sqlLiteralSubstitution page of the Information Center.

As I noted above, Optim pureQuery Runtime can do many other things to optimize your queries and the security of your access to the database including using static SQL. You can read more about it in the Understanding pureQuery set of articles. Optim pureQuery runtime is a feature that must purchased in addition to DB2.

One More Thing

One final note about using statement concentration is that it does not give optimal performance if you are using distribution statistics to optimize queries. For the DB2 query optimizer to make use of distribution statistics, literals must be in the predicate. For the most part, if you are using distribution optimization you will have done so intentionally. The type of queries that use distribution statistics are generally reporting or database warehouse types of queries.

***

I hope that you found this discussion of DB2 Statement Concentration useful. Please “like” my Facebook db2Dean page at facebook.com/db2dean and add any feedback about what you liked or what I can improve. You don't have to be a member of Facebook to add comments. I can only make things better with your help!

HOME | Search