Thursday, May 15, 2014

How to Improve Database Performance by Measuring User Experience?


What is Response Time Analysis?

Response time analysis is a new approach to application and database performance improvement that allows DBAs and developers to manage their databases guided by the most important criteria - what causes application end-users to wait. Also referred to as wait time analysis, it allows IT teams to align their efforts with service level delivery for IT customers.

The picture represents the Response Time monitoring process. Each SQL query request passes through the database instance. By measuring the time at each step, the total Response Time can be analyzed.



Rather than watching server health statistics and making guesses about their performance impact, wait and response time methods measure the time taken to complete a desired operation. The best implementations break down the time into discrete and individually measurable steps, and identify exactly which steps in which operations cause application delays. Since the database primary mission is to respond with a result, response time is the most important criteria in making database performance decisions.

Response Time = Processing Time + Waiting Time

Response time is defined as the sum of actual processing time and the time as session spends waiting on availability of resources such as a lock, log file or hundreds of other Wait Events or Wait Types. Even when the session has access to the CPU (a CPU Wait Type for example), it is not necessarily being actively processed, since often the CPU is waiting for an I/O or other operation to complete before processing can continue. When multiple sessions compete for the same processing resources, the wait time becomes the most significant component of the actual Response Time.
Wait Events and Wait Types
To accurately measure the Response Time for a database, it is necessary to discretely identify the steps accumulating time. The steps corresponding to physical I/O operations, manipulating buffers, waiting on locks, and all other minute database processes are instrumented by the database vendors. In SQL Server, these steps are called Wait Types. In Oracle, Sybase and DB2, they are referred to as Wait Events. While the specifics are unique for each vendor, the general idea is the same. These Wait Types/Events indicate the amount of time spent while sessions wait for each database resource. If the Wait Types/Events can be accurately monitored and analyzed, the exact bottlenecks and queries causing the delays can be determined.

Differences vs. Conventional Statistics
Typical database performance monitoring tools focus on server health measures and execution ratios. Even with a sophisticated presentation these statistics do not reflect the end-user experience or reveal where the problem originated. Knowing an operation took place millions of times does not inform whether it was actually the cause of an application delay.

Key criteria to distinguish Response Time vs. Conventional analysis methods:

Measure response time for an action to take place, from receipt of request to beginning of response.
Measure each SQL query separately, so the response time effects of a specific SQL can be isolated and evaluated. Measuring total response time across the instance does not give useful information.
Identify the discrete internal steps (Wait Types/Events) that a SQL query takes as it is processed. Treating the instance as a black-box without seeing where the time is consumed internally does not help problem solving.
Practical Considerations for Response Time Analysis
The Response Time approach to performance monitoring is only practical if it can be implemented efficiently in a performance sensitive production environment. Confio uses low-impact agentless technology to meet this requirement. Here are some practical considerations:

  • Low Impact Data CaptureData capturing should not place a burden on your production systems. 
  • Agentless architectures offload processing to a separate system that reduces production database impact to less than 1%.Agentless Database OperationEliminate need to test, install and maintain software on production servers.
  • Passive Monitoring of Production DataMonitor real production sessions, not simulated test transactions.
  • Continuous 7/24 MonitoringInsist on continuous monitoring across all sessions on all servers to ensure any operation can be deeply examined at any time. Occasional trace files will not provide continuous coverage.

No comments: