Using DB2 Table (Range) Partitioning
31 October 2009
You have heard a lot about DB2’s Range Partition feature
that allows data in a single table to be placed in multiple tablespaces and that
allows more scalability, elimination of unneeded partitions for individual
queries, and the ability to quickly roll-in and roll-out partitions of data. There
are plenty of resources (links are at the bottom) that describe what it is and
tell you how to implement Range Partitions, but in this article, I’ll explore
when you want to use Table Partitions and provide tips on getting the most out
of them. It should be noted that almost all of the information in this
article comes from the excellent presentation given by Christopher Tsounis, an
IBM Executive IT Specialist at the 2009 North American IDUG Conference.
Briefly, Range Partitioning (or table partitioning) allows you to create tables where the data is placed in multiple partitions according to data ranges that you specify. The range partitions can be placed into the same tablespaces or different tablespaces. The most common way to range partition is by date, but any data type or calculated columns can be used. An example is having rows with a date column and putting rows of different months in different partitions. The main advantages of a range partitioned table are:
Improved Query Performance
The DB2 optimizer is aware of the partitions and if your query needs to scan many rows to get your result set and your predicate (where clause) uses ranges that you defined, then only partitions that have rows that satisfy your query will be searched instead searching the entire table. This is called “partition elimination” and can greatly reduce query elapsed times.
Allows for optimized roll-in / roll-out processing of Ranges
New partitions can very quickly be attached and old partitions can quickly be detached as needed. In this process you load or insert data into a new partition before making it part of the partitioned table. This load process has no more effect on the use of our subject table than the load of any other independent table in the database. Once you finish loading the new partition you use the ATTACH command to integrate the new partition into the primary Range Partition Table. This process is done in one unit of work, eliminating the possibility that a partially inserted set of data will skew query results or lock other queries out while the load is being performed. However, you do need to drain all queries for the attach process, but it is very fast.
Optimize the management of very large tables
For very large tables the time needed to do certain operations like backup can get out of hand. However if you partition a table, you can perform many of these operations on individual partitions, staying within your batch window and limiting the impact of those utilities to individual partitions. You need to put the different partitions into different tablespaces to allow backup of single partitions. This also allows you to have tables of unlimited size with no software limitations.
Greater Index Placement Flexibility
For non-partitioned System Managed Storage (SMS) tablespaces all of your indexes must be in the same tablespace as your table, but SMS range-partitioned tables allow for you to place indexes in different tablespaces. This also allows SMS tables to grow beyond the size limit of a Regular Tablespace.
Get the most out of your Range Partitioned TABLES
Here are several things that you can do to optimize the use of your partitioned tables.
Use DB2 v9.7
If you are going to start using Range Partitioning, upgrade to DB2 v9.7. Range Partitioning was introduced in DB2 v9.1 and most of the advantages of this feature are there in that version. However, v9.5 and v9.7 have added additional features that provide major performance and usability enhancements including:
Use Partition (Local) Indexes
Local indexes are partitioned and only point to rows in a particular table partition. The reasons for using them are:
Create Indexes First
Create matching indexes on the new partition (which is actually just a separate table until it is ATTACHed to the partitioned table) prior to issuing the attach command. This makes the SET INGEGRITY command faster because it does not have to build indexes on the new partition. Make sure indexes on new partition (table to be attached) match the existing partitioned table indexes exactly!
RUNSTATS after Attach and Detach
Of course whenever you add or remove lots of data from a table updated statics are necessary! This does not change with these new features
Alter ATTACH vs Alter ADD
To add a new partition to a table you can issue the “ALTER TABLE … ATTACH” command or the “ALTER TABLE … ADD” command. Use ATTACH to add a partition with data and ADD to add and empty partition.
Place index partitions in their own tablespaces. This can make better use of your space, allow partitions to hold more data rows and allows you to have additional bufferpool space for indexes.
Range partitions can be a very useful tool in your tool box, but you must be aware of the design considerations.
Exclusive Table Locks
The ATTACH and DETACH commands require a short exclusive lock of the entire table. This means that all queries running on those tables must be drained and all locks released for these commands to be run. This can be done in just a few seconds, but can impact applications with long running queries.
Run SET INTEGRITY
The attached partition, any Materialized Query Tables (MQT), and any parent or child tables in an enforced foreign key relationship will be unavailable until this command is run for the new partition. Using only local indexes for the table can make this process much faster because the new index entries do not need to be integrated into a global index. To make the SET INTEGRITY process even faster you may also want to limit the number of foreign key relationships and dependent MQTs.
Must Use Table Partition Key in Partition (local) Unique indexes
You can have a unique partition index for a partitioned table, but it must be a superset of your database and table partitioning keys. All non-unique indexes can be partition indexes and do not need the partition key.
Minimize ATTACH processes time
Minimize DETACH process time
There are alternatives to Range Partitioned Tables that may make more sense in some circumstances.
Multi-Dimensional Clusters (MDC)
With MDCs you can let DB2 set aside large areas within a table where only rows with certain “dimensions” are present. Examples of dimensions are: color, year, model, etc. These are typically relatively low cardinality columns. A unique feature of MDC tables is that the indexes tend to be much smaller, because they do not store the Row Identifier (RID) that addresses every row within a particular set of dimensions. Instead they just point to the sets of extents where the rows with those dimensions live. MDC’s are a good alternative to range partitioned tables when you would have hundreds of partitions to manage, because with MDCs DB2 manages the extents for you.
UNION ALL Views
With a UNION ALL view you create a view of multiple identical tables. It is easy to drop/recreate the view to add or remove tables. With DB2 v9.7 the few remaining advantages of the UNION ALL view over range partitioned tables went away.
Read the appropriate chapters in this IBM Redbook to get a good understanding of Range (table) Partitioning and MDCs. This book was written in the DB2 9.1 timeframe, so to see the latest features and functions such as local indexes, please read the next link.
This manual describes how to use the latest features that you will need.
Two best practice guides provide information about how to use partitioning and clustering for great performance. They are
Find lots of great articles on this topic by searching for “Table Partition”