As discussed in the Load Test Environments article, performance and load test environments tend to be smaller versions of the production environment. These differences can include:
Less memory
Fewer and smaller physical CPUs
Fewer and less efficient disk arrays
Single or fewer instances of servers, e.g. two database servers in performance test but three in production, or even no clustering at all
There can be other differences as well. System software and hardware versions and specifications can vary, data can be older and less comprehensive than in production, and there may be different authentication, firewall or load balancing configurations in place. Data volumes may be smaller, and the O/S may be 32-bit instead of 64-bit, or vice versa.
The performance test itself is full of compromise. For instance:
Normally only a subset of functions is automated. The path through a function may vary slightly, but generally the same steps are repeated each iteration
Data can be mass produced in a uniform manner, affecting the way data is stored and accessed on the database. Some database tables can contain too little data, others too much
The behaviour of users can be unrealistic. For instance, a requisition raised in production is generally not fulfilled until some days down the track; in a performance test, it may be fulfilled five minutes later.
The workload being processed varies in the normal course of events; during a performance test it can remain uniform
This can create many problems for the performance tester. How accurate is the performance test? When the application goes into production and falls over in a heap, will I look incompetent? Understanding the differences between the performance test environment and the production environment is essential. It helps the tester detect and understand an artificial bottleneck, which is something quite different from a performance bottleneck.
An artificial bottleneck is essentially a performance problem that is a direct result of a difference between the production environment or workload and the performance test environment or workload. It is not a performance bottleneck. A performance bottleneck is something that is happening, or could happen, in production.
When a bottleneck is found, the performance tester must investigate the symptoms to pinpoint the cause of the issue. Care must be taken to distinguish between a genuine performance bottleneck and an artificial bottleneck brought about purely by differences between the performance test and production.
Examples of artificial bottlenecks include:
Database locking - Performance testing results in a subset of functionality being automated. The entire workload is then spread across this subset, resulting in some functions being driven at a higher frequency during the performance test than would ever be seen in production. This can result in database locking that would not occur in production. The solution? Drive the problem function no higher than the maximum peak workload expected in production (a pacing sketch follows this list).
Poor response times and excessive database physical or logical I/O - If a large amount of data has been recently added to the database via a data creation exercise, database tables can become disorganised, resulting in inefficient use of indexes. Additionally, if performance testing started with the database tables nearly empty of data, the optimiser can incorrectly decide that index use is not required. The solution? Run a database reorganisation and refresh optimiser statistics at regular intervals if large amounts of data are being added to the database.
Poor response times and excessive database physical I/O - The assumption here is that the database bufferpool is smaller in performance test than it is in production due to a shortage of memory in the performance test environment. Unfortunately, this is a case of poor performance of some users impacting the performance of all users. The solution? Once the problem function (or functions) is identified, attempt to decrease the workload for those functions to minimise physical I/O. If this is not possible, it may be time for a memory upgrade of the server. Memory upgrades are usually relatively straightforward in terms of cost and time.
Memory leak - This is a terrible term describing how memory is gradually consumed over time, implying that at some point there will be no free physical memory available on the server. Performance test environments often differ from production environments in terms of housekeeping: in production the application might be refreshed each night, whereas in performance test it may have been nine weeks since it was last refreshed. The amount of memory in production is also often substantially more than in performance test. The solution? Base memory leak calculations on production memory availability with reference to the production housekeeping regime (a rough calculation follows this list). Remember that once physical memory is fully consumed, virtual memory is still available, so it's not the end of the world as we know it.
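Taking the database locking example above, pacing is the usual way to cap a scripted function at the production peak rate. Below is a minimal Python sketch, assuming a simple home-grown harness rather than any particular test tool; the figures (600 executions per hour at production peak, 20 virtual users) are purely illustrative.

    import time

    production_peak_per_hour = 600   # assumed maximum rate for this function in production
    virtual_users = 20               # virtual users assigned to this script

    # Each virtual user may start an iteration no more often than this.
    pacing_seconds = 3600 / (production_peak_per_hour / virtual_users)   # 120 seconds

    def run_iteration():
        """Placeholder for the scripted business function."""
        pass

    for _ in range(30):              # roughly one hour of iterations at this pacing
        start = time.time()
        run_iteration()
        elapsed = time.time() - start
        # Sleep for the remainder of the pacing interval so the overall rate
        # never exceeds the production peak, however quickly the iteration completes.
        time.sleep(max(0.0, pacing_seconds - elapsed))

However fast each iteration finishes, the combined rate across all virtual users never exceeds the assumed production peak, so any locking observed at that rate is a genuine bottleneck rather than an artificial one.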
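For the memory leak example, a rough extrapolation against production memory availability and the production housekeeping regime might look like the sketch below. All figures (leak rate, free memory, restart interval) are illustrative, not measurements.

    leak_mb_per_hour = 150          # growth rate observed during the performance test
    prod_free_memory_mb = 48_000    # free physical memory on the production server
    hours_between_restarts = 24     # production housekeeping: nightly application refresh

    consumed_before_restart = leak_mb_per_hour * hours_between_restarts
    print(f"Memory consumed before the nightly restart: {consumed_before_restart} MB "
          f"of {prod_free_memory_mb} MB free")

    if consumed_before_restart < prod_free_memory_mb:
        print("The housekeeping cycle absorbs the leak; worth reporting, but not a blocker.")
    else:
        hours_to_exhaustion = prod_free_memory_mb / leak_mb_per_hour
        print(f"Physical memory exhausted after roughly {hours_to_exhaustion:.0f} hours, "
              "before the next restart; production would start paging to virtual memory.")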
Last but not least, when a genuine problem is found, the initial reaction from the developers, the DBAs, the architects, the project management - pretty much everyone, really - is that the problem observed is not a real problem; it is an artifact of the test tool. There seems to be a disconnect from reality, a lack of understanding that the behaviour of the application changes depending on the workload. Every performance tester out there should know what I am talking about. In fact, please quote this article when discussing these issues with the project. It is down to the performance tester to come up with an approach that will satisfy everyone concerned that the problem being observed is or is not due to the automated test tool, i.e. is or is not an artificial bottleneck. This is a scientific approach: develop a theory, then design a test that should prove or disprove that theory. It does not really matter if the theory is right or wrong; every time you devise a new test and execute it, you learn something new about the behaviour of the application. The performance tester has to work hard to gain the trust of the project and, through thorough performance analysis, demonstrate that a performance problem is just that - a performance bottleneck that will cause an impact in production.
Examples of test tool generated performance issues include:
Ramping up the workload too quickly. If virtual users start too quickly, the application struggles to open connections and serve requests. For most applications, this is abnormal behaviour that you would not normally observe in a production environment (see the sketch after this list).
Configuring the workload in such a way that it is spread unevenly across the period of an hour, for instance users with just two iterations to execute waiting an excessive amount of time between iterations, causing a quiet patch in the middle of a load test.
Restarting the application just before performance testing starts. While this can help to maintain consistency of results, throwing a large number of users at an application where the data cache is not populated and software and hardware connections are not yet established can cause an unrealistic spike in resource usage as well as much longer than expected response times. In reality, application restarts normally happen around 05:00 in the morning when few users are around. The first few users of the day will experience longer responses, but these will be limited. By the time a much larger workload starts to ramp up (such as in a performance test), the data cache, threads and connections are already open and available, so users do not see these long response times.
Putting a think time inside the start and end transaction markers denoting response time, so that the recorded response time for a transaction is much longer than it actually is (the sketch below keeps think time outside the timed block).
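To illustrate the ramp-up and think time points above, here is a minimal Python sketch of a home-grown virtual user harness (not any particular test tool). Start-up is staggered across the ramp-up period so connections and caches build gradually, and think time sits outside the timed block so it can never inflate the recorded response time. All names and figures are illustrative.

    import random
    import threading
    import time

    VIRTUAL_USERS = 50
    RAMP_UP_SECONDS = 600        # spread user start-up over ten minutes rather than all at once
    response_times = []          # (transaction name, seconds)

    def timed_transaction(name, action):
        """Measure only the server interaction; think time stays outside this timer."""
        start = time.time()
        action()
        response_times.append((name, time.time() - start))

    def business_action():
        pass                     # placeholder for the real request(s)

    def virtual_user(user_index):
        # Staggered start: user 0 starts immediately, the last user close to RAMP_UP_SECONDS later.
        time.sleep(user_index * (RAMP_UP_SECONDS / VIRTUAL_USERS))
        for _ in range(10):
            timed_transaction("raise_requisition", business_action)
            # Think time between iterations, outside the transaction markers.
            time.sleep(random.uniform(20, 40))

    threads = [threading.Thread(target=virtual_user, args=(i,)) for i in range(VIRTUAL_USERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()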