DB2 Range Partitioning

Using DB2 Table (Range) Partitioning

31 October 2009

You have heard a lot about DB2’s Range Partition feature that allows data in a single table to be placed in multiple tablespaces and that allows more scalability, elimination of unneeded partitions for individual queries, and the ability to quickly roll-in and roll-out partitions of data. There are plenty of resources (links are at the bottom) that describe what it is and tell you how to implement Range Partitions, but in this article, I’ll explore when you want to use Table Partitions and provide tips on getting the most out of them. It should be noted that almost all of the information in this article comes from the excellent presentation given by Christopher Tsounis, an IBM Executive IT Specialist at the 2009 North American IDUG Conference.

Briefly, Range Partitioning (or table partitioning) allows you to create tables where the data is placed in multiple partitions according to data ranges that you specify. The range partitions can be placed into the same tablespaces or different tablespaces. The most common way to range partition is by date, but any data type or calculated columns can be used. An example is having rows with a date column and putting rows of different months in different partitions. The main advantages of a range partitioned table are:

Improved Query Performance

The DB2 optimizer is aware of the partitions and if your query needs to scan many rows to get your result set and your predicate (where clause) uses ranges that you defined, then only partitions that have rows that satisfy your query will be searched instead searching the entire table. This is called “partition elimination” and can greatly reduce query elapsed times.

Allows for optimized roll-in / roll-out processing of Ranges

New partitions can very quickly be attached and old partitions can quickly be detached as needed. In this process you load or insert data into a new partition before making it part of the partitioned table. This load process has no more effect on the use of our subject table than the load of any other independent table in the database. Once you finish loading the new partition you use the ATTACH command to integrate the new partition into the primary Range Partition Table. This process is done in one unit of work, eliminating the possibility that a partially inserted set of data will skew query results or lock other queries out while the load is being performed. However, you do need to drain all queries for the attach process, but it is very fast.

Optimize the management of very large tables

For very large tables the time needed to do certain operations like backup can get out of hand. However if you partition a table, you can perform many of these operations on individual partitions, staying within your batch window and limiting the impact of those utilities to individual partitions. You need to put the different partitions into different tablespaces to allow backup of single partitions. This also allows you to have tables of unlimited size with no software limitations.

Greater Index Placement Flexibility

For non-partitioned System Managed Storage (SMS) tablespaces all of your indexes must be in the same tablespace as your table, but SMS range-partitioned tables allow for you to place indexes in different tablespaces. This also allows SMS tables to grow beyond the size limit of a Regular Tablespace.

Get the most out of your Range Partitioned TABLES

Here are several things that you can do to optimize the use of your partitioned tables.

Use DB2 v9.7

If you are going to start using Range Partitioning, upgrade to DB2 v9.7. Range Partitioning was introduced in DB2 v9.1 and most of the advantages of this feature are there in that version. However, v9.5 and v9.7 have added additional features that provide major performance and usability enhancements including:

Partition (local) indexes
Index compression.
REORG individual partition

Use Partition (Local) Indexes

Local indexes are partitioned and only point to rows in a particular table partition. The reasons for using them are:

Faster Set Integrity
Fewer index levels
Default in 9.7 unless it does not meet local index criteria. You can use the use ‘PARTITIONED’ parameter in CREATE INDEX statement
Use the NOT PARTITIONED parameter in CREATE INDEX create a global index
Tables can have mix of global and local indexes

Create Indexes First

Create matching indexes on the new partition (which is actually just a separate table until it is ATTACHed to the partitioned table) prior to issuing the attach command. This makes the SET INGEGRITY command faster because it does not have to build indexes on the new partition. Make sure indexes on new partition (table to be attached) match the existing partitioned table indexes exactly!

RUNSTATS after Attach and Detach

Of course whenever you add or remove lots of data from a table updated statics are necessary! This does not change with these new features

Alter ATTACH vs Alter ADD

To add a new partition to a table you can issue the “ALTER TABLE … ATTACH” command or the “ALTER TABLE … ADD” command. Use ATTACH to add a partition with data and ADD to add and empty partition.

Index Tablespaces

Place index partitions in their own tablespaces. This can make better use of your space, allow partitions to hold more data rows and allows you to have additional bufferpool space for indexes.

BEWARE!!!

Range partitions can be a very useful tool in your tool box, but you must be aware of the design considerations.

Exclusive Table Locks

The ATTACH and DETACH commands require a short exclusive lock of the entire table. This means that all queries running on those tables must be drained and all locks released for these commands to be run. This can be done in just a few seconds, but can impact applications with long running queries.

Run SET INTEGRITY

The attached partition, any Materialized Query Tables (MQT), and any parent or child tables in an enforced foreign key relationship will be unavailable until this command is run for the new partition. Using only local indexes for the table can make this process much faster because the new index entries do not need to be integrated into a global index. To make the SET INTEGRITY process even faster you may also want to limit the number of foreign key relationships and dependent MQTs.

Must Use Table Partition Key in Partition (local) Unique indexes

You can have a unique partition index for a partitioned table, but it must be a superset of your database and table partitioning keys. All non-unique indexes can be partition indexes and do not need the partition key.

Minimize ATTACH processes time

Issue a COMMIT immediately after an ATTACH to release “Z” (exclusive table) locks.
Issue a COMMIT immediately after SET INTEGRITY to make new data is visible and release row locks, thus increasing concurrency with other applications more quickly.
Use SET CURRENT LOCK TIMEOUT in the script to prevent SET INTEGRITY from failing on lock conflict.
Plan to drain queries before issuing the ATTACH. ATTACH won't complete until it drains existing queries for the table and meanwhile, no new queries can start.
Specify ALLOW WRITE ACCESS with SET INTEGRITY if needed. The default is the old, offline behavior. The ALLOW READ ACCESS option is also available.
Specify exception tables during set integrity. You will want to know which, if any, rows are deleted from the new partition.
Consider doing Roll-in and Roll-out together. ATTACH and DETACH in the same transaction minimizes the time that the table is unavailable. Use a different partition name for the attached partition.

Minimize DETACH process time

Prior to detaching a partition identify the dependent MQTs so that you can enable the SET INTEGRITY to do a single tablespace scan by listing the MQT’s at the same time!

Example: SET INTEGRITY for MQT1, MQT2, MQT3 immediate checked

Alternatives

There are alternatives to Range Partitioned Tables that may make more sense in some circumstances.

Multi-Dimensional Clusters (MDC)

With MDCs you can let DB2 set aside large areas within a table where only rows with certain “dimensions” are present. Examples of dimensions are: color, year, model, etc. These are typically relatively low cardinality columns. A unique feature of MDC tables is that the indexes tend to be much smaller, because they do not store the Row Identifier (RID) that addresses every row within a particular set of dimensions. Instead they just point to the sets of extents where the rows with those dimensions live. MDC’s are a good alternative to range partitioned tables when you would have hundreds of partitions to manage, because with MDCs DB2 manages the extents for you.

UNION ALL Views

With a UNION ALL view you create a view of multiple identical tables. It is easy to drop/recreate the view to add or remove tables. With DB2 v9.7 the few remaining advantages of the UNION ALL view over range partitioned tables went away.

Additional Reading

Database Partitioning, Table Partitioning, and MDC for DB2 9

Read the appropriate chapters in this IBM Redbook to get a good understanding of Range (table) Partitioning and MDCs. This book was written in the DB2 9.1 timeframe, so to see the latest features and functions such as local indexes, please read the next link.

Partitioning and Clustering Guide 9.7

This manual describes how to use the latest features that you will need.

Best Practices

Two best practice guides provide information about how to use partitioning and clustering for great performance. They are

Data Life Cycle Management
Physical Data Base Design

DB2 Developerworks

Find lots of great articles on this topic by searching for “Table Partition”

HOME | Search