Essential DB2 Health Check
20 May 2009
Everyone wants their systems to be highly available and to perform well on an ongoing basis, but verifying the health of their system can seem like a daunting task. Good News! The things that matter the most and help avoid the most frequent problems are the easiest to implement! They are also the foundation of other in depth and focused areas of system verifications. In this article I’ll discuss a number of things that you can do periodically to ensure that you have a well running DB2 system. They are so easy that even db2Dean can figure them out. When these are done regularly, they can prevent many outages and are very cheap. You can even put them into a script and run them once a month to send you a report.
Up-to-Date Fix Packs
It is a good idea to keep your software maintenance at a relatively recent level. I do not necessarily suggest that you apply fix packs the moment that they are released, but I would recommend not letting your fix pack levels get more than a year old. By keeping your maintenance reasonably up to date, you avoid finding problems that have already been detected and fixed. On the other hand I don’t like to apply new fix packs until they have been out for a month or two because they can sometimes introduce, shall we say, “less desirable” software features that will be withdrawn as soon as someone reports them. It is also a good idea to periodically check the fix pack page to see if newer ones have fixed HIPER APARS. The “Fix Pack Information” box on the fix pack page will show if HIPER APARS have been fixed by a given fix pack. HIPER APARS are the worst defects and it is a good idea to determine if the HIPER APARS could affect you and apply the newer fix pack if they do. You can see my Favorite Links page for links to the Fix Pack page for each DB2 Version. It is also a good idea to ensure that you are running a supported version of DB2 so that you can get help if you do have a problem. My Favorite Links page also shows the supported versions under the End of Support by Version link.
Correct OS Version
All DB2 versions have a page that is kept up to date by the development team that shows the Operating System (OS) and minimum service pack levels along with supported hardware versions. If you do manage to get DB2 installed on a system that is below the minimum recommendations you are likely to run into some very strange and hard to diagnose results that can occur much later. It is a good idea to check this site regularly in case problems are discovered that are fixed with updated OS APARS or service packs. I have a link on my Favorite Links page called Operating System Levels For DB2/LUW that you can check to verify that you meet the minimum system levels. Please note that DB2 is only tested on the major releases shown such as AIX 5.3 or Windows 2003 and should not be run on major releases not shown. However, the service pack, APAR, TL, etc levels shown are minimums and running newer OS maintenance levels is encouraged. This page has one quirk that you should notice. After you choose your DB2 version from the drop down list you must click the “Go” button to refresh the links for that version. If you don’t the links will take you to the page for the version shown when you landed on the page.
File System Free Space
For various file systems used for System Managed Storage (SMS) tablespaces and other DB2 objects you should verify that you have enough free space for growth. One of the most frequent causes of problems for any database is running out of space. It is even better if you have a system monitoring tool that will alert someone when space is running low in a file system. For SMS tablespaces, I recommend that file system space be assessed, because they can grow as needed as long as there is space in the file system. There are other DB2 objects that grow, so all file systems related to DB2 should be monitored. You should be actively pruning database backup files and archive logs as they age so as not to run out of room on those file systems. DB2 9.5 can prune backups, log files and other recovery objects if you set the appropriate Database Configuration parameters including num_db_backups, rec_his_retentn, and auto_del_rec_obj. In older versions you will need to write scripts to prune these files. You should also monitor the files in the “db2dump” directory. Old trace and dump files should be deleted when they are no longer needed. Also you should monitor the size of the “db2diag.log” file and delete it, rename and compress it, or copy it somewhere else periodically. Please note that if you delete or rename it, DB2 will create a new one the next time that it needs to make an entry.
DMS Tablespace Free Space
If you are using Database Managed Space (DMS) tablespaces then you will want to ensure that you always have free space for growth those tablespaces. If you have enabled “AUTORESIZE” and have not specified a “MAXSIZE” for all of your DMS tablespaces, then you can just monitor the free space in your file systems like you do for the SMS tablespaces. However, for any tablespaces that can’t resize themselves to fill the entire file system, then you need to monitor the individual tablespaces to ensure that they do not fill up. You can get all of the information you need about your tablespaces from the TBSP_UTILIZATION view in DB2 v9.1 and higher. Here is an example of a query that you can run to get the free space and resize information that you need for DMS tablespaces:
select substr(tbsp_name,1,10) "Name", tbsp_utilization_percent "Used%", tbsp_free_pages "FreePages",
tbsp_usable_Pages "UsablePages", tbsp_used_pages "UsedPages", tbsp_page_top "HWM",
tbsp_auto_resize_Enabled "AutoResize 1=Yes", tbsp_max_size "MaxSizeBytes -1=ulimited",
tbsp_last_resize_failed "ResizeFailed 1=Yes",
tbsp_page_size "PgSize Bytes"
where tbsp_type = 'DMS';
This query works best from the Command Center or when executed from a file, i.e. “db2 –tvf tbsp_query.sql”. With this query you can easily see if AUTORESIZE is enabled (tbsp_auto_resize_enabled = “1”) and if there is a MAXSIZE for all DMS tablespaces. Tbsp_max_size = “-1” (negative 1) indicates that the tablespace can grow until the file system runs out of room if AUTORESIZE is enabled. If the tablespace size is limited then you would want to review the free space and high water mark (HWM). You can determine the free space in pages by examining the “tbsp_free_pages” value for each tablespace. To calculate space in bytes, you can multiply this value by the “tbsp_page_size” value. You should also note the tbsp_page_top (High Water Mark or HWM) value compared to the “tbsp_useable_pages” value (the number of usable pages in the tablespace). The LOAD utility always starts loading pages at the HWM page. Even if there are lots of “FreePgs” but the HWM is near the useable pages value then your loads may start to fail. Finally you should check to see if the registry variable “DB2_SET_MAX_CONTAINER_SIZE” has been set. If it has been set to some value, then no container can grow larger than that size and that can limit the size of your tablespace.
You can also get all of this information by issuing the command the db2pd command, for example, “db2pd –db <db_name> -tablespaces”.
Backup and Logging
The old saying in the database world that you are only as good as your last backup should never be forgotten. You should verify that you are backing up your system regularly and that you can access the backup files and archived log files that you need. You should also verify periodically that the type and frequency of backups are meeting your business needs. For example, if you are doing off-line backup every Saturday night and not archiving logs, could your business stand to loose a whole week of data if the database was somehow corrupted on Friday evening? If you are doing on-line backups are you managing your log files so that you can recover your data? You should note that an on-line backup is of no value if you do not have the transaction log files that it needs. That’s right! Even if you have a good on-line backup, but just one log file that it needs is missing, you will NOT be able to get your database back to a useable state. That is why I always recommend using the “INCLUDE LOGS” parameter on the backup command so that you can always at least recover to the time of my last good backup, because the backup file will contain all of the logs that it needs for a useable restore.
But db2Dean, I don’t do backups because my data can be recreated from other sources! OK Sparky, I’ll bet that you can’t recreate all of your database structures including tables, indexes, tablespaces, MQTs, containers etc, because even if you can find the 5 year old DDL used to create your database you will not be able to find all of the DDL for every structure change since then and apply them in the correct order. Think about how well your load processes will work if your database structure is not quite right. The good news is that you can use the db2look command to extract all of the DDL into a file that can be used to exactly recreate your database structure. For databases whose data can be easily recreated I recommend that you set up a regularly executed process to export all DDL with the db2look command.
Some times it is adequate to be able to restore your database to the point in time of your last backup, but if it is not, then you need to switch your database to archive logging mode. This allows your system to keep copies of logs that can be applied to a backup to bring the database back up to date. You should review your business needs to see if this is required. This is a necessary operation for many databases, but adds a dimension of complexity for managing those files. This complexity is compounded if you use more exotic backup options like incremental and tablespace-only backups. I personally subscribe to the KISS method, because the last thing that I need in a 3:00 am pressure situation is more objects to apply in just the right order. Everyone who administers databases should be aware of the backup strategies and tools and that is why I recommend reviewing at least the first few chapters of the Data Recovery and High Availability Guide and Reference. You should periodically ensure that you are using the correct backup processes that meet your business needs while minimizing complexity where possible.
One of the most important things for query performance and most overlooked is having up-to-date statistics for tables and indexes. DB2 uses the statistics to determine the best access path to the data that you request. If you do not have good statics then DB2 may not be able to pick the best access to your data. Since data is not static, you need to update statistics regularly. You do this with the RUNSTATS command for each table. You should use it to collect statistics for the table and all associated indexes. To determine if your statistics are current, you can query the SYSCAT.TABLES and SYSCAT.INDEXES tables as follows:
select stats_time, rtrim(tabname) Table
where type = 'T'
order by stats_time desc;
select stats_time, rtrim(tabname) Table, rtrim(indname) Index
order by stats_time desc;
These queries show the objects with the oldest statistics first. A null value for “STATS_TIME” means that statistics have never been collected. If you are running any static SQL (you have bound packages) you may want to also periodically rebind those static packages to ensure that they are using the best access paths for the current statics. You can use either the REBIND clp command for individual packages or the “db2rbind” system command to rebind all packages at once.
DBM and DB Parameters
It is a good idea to occasionally see if your instance (DBM or Database Manager) and DB (database) configuration parameters are in the ball park to ensure good performance. Even if you are using Self Tuning Memory Manager it is a good idea to verify the parameters as there are several items that are not controlled by STMM. You can easily check your settings against the recommended settings by running the Configuration Advisor. You can use it to show you where it recommends settings different than what you have. Settings with large differences should be investigated. You can run the Configuration from the Control Center or from the command line as shown in this example:
Of course you would want to make sure the inputs such as isolation level are correct for your database. You can see the full description in this link: Configuration Advisor.
Once you ensure that your instance and database configurations are good and that you have up to date statistics, you may want to see if you have all of the indexes that you need for your workload. The easiest way to do this is to is to use the Design Advisor and indicate that you wish to have it extract the workload from the package cache. Using the Design advisor in this way will recommend indexes that can make your system run better. It will actually generate the “CREATE INDEX” commands for the indexes that it recommends. Implementing these recommendations can make big improvements in the performance of your system. This can even help purchased applications because no changes are needed to the applications for the database to begin using the indexes. When you create a new index, don’t forget to update the statistics for the table. The Design Advisor can be used from the Control Center and from the command line. Here is a command line example:
db2advis -d <db_name> -g –m I -o advisfile.txt
If you get an error about explain tables not being set up properly, then create the explain tables under the same user executing db2advise according the Creating Explain Tables instructions. Design Advisor can recommend other objects that can improve performance and can use other work loads that you specify. You can read more about this powerful command in chapter 17 of Tuning Database Performance manual. The package cache is a cache where DB2 keeps SQL statements that it has recently executed. My example above assumes a package cache hit ratio of nearly 100%. This statistic represents the percentage of time that an SQL statement is already in the cache when it is executed. A hit ratio of nearly 100% means that the package cache is very representative of your workload. The example below shows how to calculate your package cache hit ratio in a v9.x database. For older versions you can perform a database snapshot and use a similar calculation to find it:
select rtrim(db_name) Database,
rtrim(char((1-(pkg_cache_inserts/pkg_cache_lookups))*100)) || '%' PackageCacheHitRatio
If you would like information about more in depth performance and tuning then please see our best practice paper, Tuning and Monitoring Database System Performance.
My article is by no means a complete list of tasks that you can do to check the health of your DB2 systems, but does cover many of the important aspects and does not require the purchase of other tools. In this article I attempted to show ways to examine your system that can provide great benefit, are easy and do not take a lot of time. These items should always be done prior to implementing more complex availability features such as clustering because they provide a foundation for highly available systems. If you have other things that you check in your systems, please send them to me and I will update this list. Most of my articles are works in progress anyway.