Performance Engineering : 05/15/14

Thursday, May 15, 2014

How does collating the test results work in the LoadRunner Controller?

At the end of a test, the results are collated by the LoadRunner Controller. Each generator's results are collected in a .eve (event) file, and the output messages for the Controller are collected in a .mdb (Microsoft Access) database. This happens in the directory specified for the results on the Controller. A .lrr (LoadRunner results) file is also created. The .lrr file is text. The .eve file was text prior to around LoadRunner 7.5, but since then it has been an unpublished compressed format.

When you start the Analysis utility, it takes the information in the .lrr file and the .eve file and creates .mdb (Microsoft Access) or SQL Server database entries that contain each timing record and data point entry. If collation fails at the end of the test, you will have only partial results for analysis.

How to Improve Database Performance by Measuring User Experience?

What is Response Time Analysis?

Response time analysis is an approach to application and database performance improvement that lets DBAs and developers manage their databases guided by the most important criterion: what causes application end users to wait. Also referred to as wait time analysis, it allows IT teams to align their efforts with service level delivery for IT customers.

The picture represents the Response Time monitoring process. Each SQL query request passes through the database instance. By measuring the time at each step, the total Response Time can be analyzed.

Rather than watching server health statistics and making guesses about their performance impact, wait and response time methods measure the time taken to complete a desired operation. The best implementations break down the time into discrete and individually measurable steps, and identify exactly which steps in which operations cause application delays. Since the database's primary mission is to respond with a result, response time is the most important criterion in making database performance decisions.

Response Time = Processing Time + Waiting Time

Response time is defined as the sum of actual processing time and the time a session spends waiting on the availability of resources such as a lock, a log file, or any of hundreds of other Wait Events or Wait Types. Even when the session has access to the CPU (a CPU Wait Type, for example), it is not necessarily being actively processed, since the CPU is often waiting for an I/O or other operation to complete before processing can continue. When multiple sessions compete for the same processing resources, wait time becomes the most significant component of the actual Response Time.
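To make the last point concrete, here is a minimal sketch of sessions queuing for one resource. It assumes every session arrives at the same moment, needs the same processing time, and is served first-in, first-out. Real databases schedule work far less simply, and the function names are hypothetical, chosen only for this illustration.

```python
def fifo_response_times(processing_ms, sessions):
    """Return a (wait_ms, response_ms) pair per session for one FIFO resource.

    Assumption: all sessions arrive at time 0 and need identical processing time.
    """
    results = []
    for position in range(sessions):
        wait_ms = position * processing_ms      # queued behind earlier sessions
        response_ms = wait_ms + processing_ms   # Response Time = Processing Time + Waiting Time
        results.append((wait_ms, response_ms))
    return results


def wait_share(processing_ms, sessions):
    """Fraction of the summed response time that was spent waiting."""
    results = fifo_response_times(processing_ms, sessions)
    total_wait = sum(wait for wait, _ in results)
    total_response = sum(response for _, response in results)
    return total_wait / total_response


if __name__ == "__main__":
    for n in (1, 2, 10, 100):
        print(f"{n:>3} sessions: wait share = {wait_share(5, n):.3f}")
```

Under these assumptions the wait share works out to (n - 1) / (n + 1) for n sessions, so it is zero for a single session and approaches 1 as more sessions compete, even though the processing time per request never changes.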

Wait Events and Wait Types
To accurately measure the Response Time for a database, it is necessary to discretely identify the steps accumulating time. The steps corresponding to physical I/O operations, manipulating buffers, waiting on locks, and all other minute database processes are instrumented by the database vendors. In SQL Server, these steps are called Wait Types. In Oracle, Sybase and DB2, they are referred to as Wait Events. While the specifics are unique for each vendor, the general idea is the same. These Wait Types/Events indicate the amount of time spent while sessions wait for each database resource. If the Wait Types/Events can be accurately monitored and analyzed, the exact bottlenecks and queries causing the delays can be determined.

Differences vs. Conventional Statistics
Typical database performance monitoring tools focus on server health measures and execution ratios. Even with a sophisticated presentation these statistics do not reflect the end-user experience or reveal where the problem originated. Knowing an operation took place millions of times does not inform whether it was actually the cause of an application delay.

Key criteria to distinguish Response Time vs. Conventional analysis methods:

Measure response time for an action to take place, from receipt of request to beginning of response.
Measure each SQL query separately, so the response time effects of a specific SQL can be isolated and evaluated. Measuring total response time across the instance does not give useful information.
Identify the discrete internal steps (Wait Types/Events) that a SQL query takes as it is processed. Treating the instance as a black-box without seeing where the time is consumed internally does not help problem solving.
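The three criteria above can be sketched as a small aggregation: attribute waited time to each SQL statement separately, then break it down by wait event. The sample rows, the `q_orders` and `q_report` identifiers, and the helper names are all hypothetical. How such samples would be collected from a real instance is vendor specific and not shown here.

```python
from collections import defaultdict

# Hypothetical samples: (sql_id, wait_event, seconds_waited).
SAMPLES = [
    ("q_orders", "db file sequential read", 4.0),
    ("q_orders", "CPU", 1.0),
    ("q_report", "enq: TX - row lock contention", 9.0),
    ("q_report", "CPU", 0.5),
    ("q_orders", "db file sequential read", 2.0),
]


def time_by_sql_and_event(samples):
    """Sum waited seconds per SQL statement and, within it, per wait event."""
    totals = defaultdict(lambda: defaultdict(float))
    for sql_id, event, seconds in samples:
        totals[sql_id][event] += seconds
    return totals


def top_wait_per_sql(samples):
    """Return {sql_id: (total_seconds, dominant_wait_event)}."""
    result = {}
    for sql_id, events in time_by_sql_and_event(samples).items():
        dominant = max(events, key=events.get)  # event with the most accumulated time
        result[sql_id] = (sum(events.values()), dominant)
    return result


if __name__ == "__main__":
    ranked = sorted(top_wait_per_sql(SAMPLES).items(),
                    key=lambda item: item[1][0], reverse=True)
    for sql_id, (total, event) in ranked:
        print(f"{sql_id}: {total:.1f}s total, mostly '{event}'")
```

Grouping by statement first and by event second mirrors the argument in the list: an instance-wide total would hide that, in this made-up data, one statement is dominated by lock waits and the other by single-block reads.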

Practical Considerations for Response Time Analysis
The Response Time approach to performance monitoring is only practical if it can be implemented efficiently in a performance sensitive production environment. Confio uses low-impact agentless technology to meet this requirement. Here are some practical considerations:

Low Impact Data Capture: Data capturing should not place a burden on your production systems. Agentless architectures offload processing to a separate system that reduces production database impact to less than 1%.
Agentless Database Operation: Eliminate the need to test, install and maintain software on production servers.
Passive Monitoring of Production Data: Monitor real production sessions, not simulated test transactions.
Continuous 7/24 Monitoring: Insist on continuous monitoring across all sessions on all servers to ensure any operation can be deeply examined at any time. Occasional trace files will not provide continuous coverage.

Do You Know If Your Database Is Slow?

The time to respond:

There was a question at Pythian a while ago on how to monitor Oracle database instance performance and alert if there is significant degradation. That got me thinking: while different DBAs would take different approaches to interactively measure current instance performance, here we would need something simple. It would need to give a decisive answer and be able to say either “current performance is not acceptable” or “current performance is within normal (expected) limits”.

Going back to the basics of how database performance can be described, we can simply say that database performance is the response time of the operations the end users perform and/or the amount of work the database instance does in a certain time period, that is, its throughput.

We can easily find these metrics in the v$sysmetric dynamic view:

SQL> select to_char(begin_time,'hh24:mi') time, round( value * 10, 2) "Response Time (ms)"
from v$sysmetric
where metric_name='SQL Service Response Time';

TIME Response Time (ms)
--------------- ------------------
07:20 .32

So this is the last-minute response time for user calls (here in ms). We can check the throughput by looking at the number of logical blocks being read (which includes the physical blocks), to which we can add direct reads. The output below shows the last minute and the last several seconds for a database with an 8 KB block size:

SQL> select a.begin_time, a.end_time, round(((a.value + b.value)/131072),2) "GB per sec"
from v$sysmetric a, v$sysmetric b
where a.metric_name = 'Logical Reads Per Sec'
and b.metric_name = 'Physical Reads Direct Per Sec'
and a.begin_time = b.begin_time;

BEGIN_TIME           END_TIME             GB per sec
-------------------- -------------------- ----------
16-jun-2013 08:51:36 16-jun-2013 08:52:37        .01
16-jun-2013 08:52:22 16-jun-2013 08:52:37        .01
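The constants in the two queries are unit conversions, which the short sketch below spells out. It assumes the 8 KB block size stated above and reads the multiplication by 10 in the first query as a centiseconds-to-milliseconds conversion. The function names are hypothetical.

```python
BLOCK_SIZE_BYTES = 8 * 1024   # 8 KB block size, as stated in the text
BYTES_PER_GB = 1024 ** 3

# Number of 8 KB blocks in one GB: the divisor used in the throughput query.
BLOCKS_PER_GB = BYTES_PER_GB // BLOCK_SIZE_BYTES


def blocks_per_sec_to_gb_per_sec(blocks_per_sec, block_size_bytes=BLOCK_SIZE_BYTES):
    """Convert a blocks-per-second rate into GB per second."""
    return blocks_per_sec * block_size_bytes / BYTES_PER_GB


def centiseconds_to_ms(centiseconds):
    """Convert centiseconds to milliseconds (1 cs = 10 ms)."""
    return centiseconds * 10


if __name__ == "__main__":
    print(BLOCKS_PER_GB)                           # divisor for an 8 KB block size
    print(blocks_per_sec_to_gb_per_sec(131072))    # one GB worth of blocks per second
    print(round(centiseconds_to_ms(0.032), 2))     # response time in ms
```

With a different block size the divisor changes accordingly, so a 16 KB block size would need 65536 instead of 131072.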

We can check more historical values through v$sysmetric_summary, v$sysmetric_history and dba_hist_sysmetric_summary.

So did these queries answer the basic question, “Do we have bad performance?” Roughly 10 MB/sec of throughput (the 0.01 GB/sec shown above) and 0.32 ms for a user call? We have seen better performance, but is it bad enough that we should alert the on-call DBA to investigate in more detail and look for the reason we are seeing these values? We cannot say. We need something to compare these values to, so that we can determine whether they are too low or too high. It is somewhat like being on a train that passes another moving train going in the same direction but at a different speed. We don’t know the speed of our train, and we don’t know the speed of the other train, so we cannot answer the question “Are we going very fast?” If we turn to the other side and see a tree passing by, we will be able to estimate the speed of our train (also taking into account our experience of what is very fast for a train…). So we need something that has an absolute value. In the case of the tree, we know that the tree has a speed of 0