Running RUNSTATS for DB2/LUW
29 August, 2014
Almost everyone knows that you have to collect statistics regularly for your DB2/LUW database, but there several of options for dong it including the RUNSTATS command, automatic statistics, real-time statistics, statistics profiling and so on. Of all of these options, which ones should you use? These options have been introduced over time, so I thought that an article summarizing the major ways of running your statistics collection would be useful.
The tried and true way to collect statistics in a DB2 database is using a series RUNSTATS Commands in a scrip that is run at some regular interval, say every weekend. I once administered a database that was dynamic and I didn’t want to take a chance on forgetting new tables so I created a set of scripts that generated the RUNSTATS script from the catalog as part of the weekly run and then ran the output. I have an example on db2Dean and Friends. On this site look for a file called runstats.sql in the Files section. There are a bewildering number of parameters that control how much information is collected. A number of these options impact the amount of system resources are used in the collection as well as the amount of information collected for the optimizer. The link to the command above has brief descriptions of all of them. I have not been able to find a single source that gives overall advice on how to construct the command that I like. However, there are many good articles on particular aspects of statistics collection, so Google is your friend here.
Automatic Statistics Collection
For most DBA’s creating new databases or doing a major change to statistics collection, I would recommend using automatic statistics collection. When you enable automatic maintenance and automatic statistics, DB2 will collect the statistics for you. You can specify a window when this will run to avoid collecting statistics during your peak processing time. One big advantage of using automatic maintenance is that DB2 will not collect statistics on tables that have changed little. It will also generally determine the best statistics to collect based on information it has about the table. However if you need to have particular options on RUNSTATS you can use a statistics profile as described below in this article. If you want to allow sampling when using automatic statistics then you will need to configure the auto_sampling Database Configuration parameter as described in the Enabling automatic statists collection Knowledge Center article. For more information on how to enable automatic maintenance, please see my Configuring DB2 Automatic Maintenance using Data Studio article.
Real-time Statistics are collected at query execution time if DB2 determines that the existing statistics are not good enough to optimize the query being executed. If automatic statistics collection is being done regularly or the RUNSTATS utility is being run, then real-time statistics will generally only be done for queries on tables that have a lot of changes happening. Automatic Statistics need to be on for DB2 to execute real-time statistics and the auto_stmt_stats database configuration needs to be on as well. The IBM Knowledge Center automatic statistics collection article includes a good overview of real-time statics.
While statistics collections on tables and their indexes is absolutely critical for query performance, additional statistics can be provided to the optimizer for more complex types of queries using statistical views to improve performance further. The types of queries that can be improved include joins and queries with expressions in the predicate such as “WHERE AMOUNT_RECIEVED < SALE_PRICE – COMMISSION”. To take advantage of this feature you need to create a view that includes the join or expression, enable the view for query optimization and then run RUNSTATS on the view. You do not need to change any of your queries to take advantage of the view statistics. The optimizer will automatically check for the presence of the additional statistics when evaluating queries with the more complex predicates. So if you are having performance problems with a more complex query, creating a statistical view is an easy way to attempt to improve it. You can also use a statistics profile for a statistical view to improve the performance of queries that use expression-based indexes as can be seen in the Expression-based indexes and statistics profiles Knowledge Center article
Statistics profiles are a way to save the RUNSTATS parameters you want to have used whenever statistics are collected on a table. This is especially useful if you want to make sure certain types of statistics are being collected when using automatic statistics and real-time statistics. Actually it is the only way to tell DB2 which RUNSTATS parameters to use when running automatic statistics. You can also specify that you want to use a statistics profile when using the RUNSTATS command itself. You create the profile using the RUNSTATS Command using the SET PROFILE parameter as shown in the Collecting statistics using a statistics profile Knowledge Center article.
Automatic Statistics Profiling
This was a way of having DB2 tell you the best RUJSTATS options for particular table by watching what was going on in the system over time. It has been discontinued in DB/LUW 10.5 so I would not recommend using it.
If your statistics collection is taking too much time or consuming too many system resources, you should consider using the sampling option of RUNSTATS or automatic statistics collection. With sampling, statistics collection uses a sample of data instead of processing all rows in a table. There are two types of sampling: row-level and page-level. Row-level sampling reads all pages in the table, but does not process all of the rows in each table saving considerable processing, but not I/O. Page-level sampling only reads a portion of the pages in the table saving both I/O and CPU. You can read more about statistics sampling at the Collecting statistics on a sample of the data and Data sampling in queries Knowledge Center articles. If you want to allow sampling when using automatic statistics then you will need to configure the auto_sampling Database Configuration parameter as described in the Enabling automatic statists collection Knowledge Center article.
Limit the impact of RUNSTATS
In addition to sampling, there are several other ways to improve the performance of statistics collection. For more information on this please see the Minimizing RUNSTATS impact Knowledge Center article.
While this article describes a number of the major ways to run statistics, I’m sure more could be added. If you have any thoughts on this additional important aspects of statistics collection, please share them on my Facebook Page or the db2Dean and Friends Community.