IBM Performance Expert
Written by Jeff Sullivan
Data Studio Software Specialist / Enabling Architect
20 July 2009
It is Monday morning, and the database systems support staff start arriving at the office. After the usual pleasantries, they settle into their respective cubicles to start their days. The on-call phone is passed to the next on-call person during a brief turnover meeting, followed by a review of what happened over the weekend.
The database group, collectively, received a number of problem tickets and starts the arduous task of researching. The usual questions are asked:
· Is this just an anomaly? A spike in activity which forced the database system into an unusual condition where it was unable to out-perform what happened?
· Is this a new steady-state? Did the applications perform the required due-diligence necessary before they rolled in these changes into production?
· Is this a new system? How did this slip past our change management controls we have in place? Do we have an adequate feedback loop for change?
Digging deeper into their problem ticket research, group members quickly attempt to solve the issues by asking two fundamental questions:
· Why did this happen?
· How can we prevent this from happening again?
This is where IBM DB2 Performance Expertcan help. DB2 Performance Expert can isolate problems faster and with fewer resources, and is used with service level agreements and objectives manifested in exception processing. Ultimately, this frees up time for the production DBA to focus on value-creation activities.
Solving these problems is a critical part to information technology (IT). By definition, we have a problem when a service level objective (SLO) or service level agreement (SLA) is not met. A server failover is not a problem in of itself; rather, it is as a result of a possible problematic event. That event could have been prevented had the system or application been installed with SLOs and SLAs in mind.
An IT shop can handle problems in a couple of ways. It can choose to ignore the problem or perform a quick-fix solution. While choosing to ignore problems is a legitimate business model, it is usually a very expensive one. Quick-fixes (often called workarounds) might get the organization functioning quickly, but frequently do not address the root cause of the issue. These solutions are often repeated and if used as the primary model for troubleshooting, can irreparably hurt the business.
Getting to and answering the final question on preventing the problem from happening again is a challenge. Many shops cannot get to this level of prevention until the "why did this happen" question can be answered. The organizations that are unable to answer this question usually have one or more reasons; singularly or in combination which may attribute to determining the answer:
· Lack of expertise among the staff.
· Lack of collecting the appropriate metrics.
· Lack of the right tools for the job.
There are a variety of reasons which might cause a lack of expertise. Perhaps there may only be a single person that knows the application and knows the intricate details of its usage. Or there could be high turnover in the shop, a small staff doubling up on tasks, or a lack of documentation. Training budgets are tight or possibly nonexistent.
As a result of a thin staff or lack of training, critical collection points and data are not gathered. In many cases, metrics are not gathered due to an inflexible infrastructure model which may not allow certain tooling to be installed. It could also be a perception of earlier days when many tools were not at a maturity level adequate to gather the wide range of data to perform adequate problem triage.
Finally, there could be a lack of having the right tooling to correctly identify causal factors to performing the correct remediation of a problem. Not having the correct metrics to perform problem triage in itself can create problems. Adoption of tools can also be hindered by a "not invented here" mentality. There might be a make-or-buy issue to contend with in determining the best tooling to use in problem determination.
These three issues can be narrowed down and answered with having smarter tooling. Tools which can gather the correct metrics, gather data autonomously, and monitor key performance metrics and indicators all with the service level objectives and service level agreements in mind.
IBM Performance Expert also has a deep level of performance data carried it its Performance Warehouse. Current and current history metrics are provided in key performance indicators (KPIs) and you can get alerting whenever a KPI exceeds the event exception defined for it.
But the heart and soul of IBM Performance Expert lies in the Performance Warehouse where data can be kept for as little as a few days up to months. This data is granular enough to show individual SQL statements and activity, but can also be analyzed with one of 3 possible ways:
· The Performance Warehouse Report and Load allows you to define, schedule, and run processes that automate the creation of reports, the loading of these reports into the Performance Warehouse data tables.
· The Performance Warehouse Analysis is used to analyze and evaluate the performance data using rules of thumb and queries.
· The Performance Warehouse Expert is used to perform all tasks available in Performance Warehouse in addition to your own specific queries.
All 3 of these Performance Warehouse options come with a plethora of predefined reports to analyze collected data from all of your DB2 for LUW instances. The reports generated are in HTML format and can easily be placed on a reporting server for all of your IT staff to see. Why continually answer the same capacity and performance questions over and over when you can have these same metrics available for anyone? With Performance Expert's Performance Warehouse this task is greatly simplified and can be automated.
But suppose you have a unique situation. I was recently asked to help a customer report on highly used SQL where the fundamental "base" SQL does not change but the predicate values do change. Further, they wanted to know what the aggregate timings are for this SQL regardless of the predicate values. This is EASY to do against the Performance Warehouse since all of the data is already collected; all it requires is a specialty SQL.
This specialty SQL only extracts SELECT statements and strips off everything past the WHERE clause, putting the data in temp table LAST. The second part is it reads the temp table with distinct to allow summation of the collection columns.
as (SELECT substr(stmt_text,1,posstr(stmt_text, ' WHERE')),
WHERE stmt_text like 'SELECT%')
SELECT distinct sqltxt,
sum(distinct numexec) as Nbr_of_Exec,
sum(distinct rowsread) as Total_Rows_Read,
sum(distinct totexectime) as Total_Exec_Time,
sum(distinct totsyscpu) as Total_System_CPU,
sum(distinct totusrcpu) as Total_User_CPU
where sqltxt <> ' '
group by sqltxt;
The output shows the following:
In summary, DB2 Performance Expert with the Performance Warehouse will make your life so much easier as a DBA. Problem research is a snap and deep dive research is made easy. But the one thing that I really like about DB2 Performance Expert is the low overhead to installation. There is no agent required on the monitored server - nothing, zilch, nada. One of my big push button issues was when one of these performance tools had a fixpack upgrade where there was an agent that had to be updated on all the servers. Ever try and get an outage on a server to do a fixpack agent upgrade that has to be available 24 by 7 ? Not pretty!
THAT is why I like DB2 Performance Expert. Feel free to Email me (firstname.lastname@example.org) or DB2Dean (email@example.com) for more information about DB2 Performance Expert or any of the integrated data management solutions from Optim.
For more information, check out: