Performance Engineering : October 2013

Friday, October 18, 2013

How to run RDA (Remote Diagnostic Agent) on Oracle Application Server 10g

Run the RDA like this:
If RDA is not executed with the correct connect string format and with -p Pda, it does not collect all the portal-related information.
For Unix: [ $HOME/myDir/rda/ ]$ ./rda.sh -vdSCRP -p Pda
For Windows: D:\myDir\rda\rda.cmd -vdSCRP -p Pda
- Please note that -p Pda is *case-sensitive*. Using "-p PDa", "-p PDA" or "-p pda" will still execute the RDA without any errors. However, the generated output may not be the one desired.
- You will need to supply the following once the RDA script starts running:
  - the ias_admin password
  - the portal password
  - the orasso password
  - connect information in the following format: servername:listener_port:SID, e.g. abc.co.us:1521:orcl

What is the difference between Oracle Weblogic thread states & attributes (Total, Standby, Active, idle, Hogging, Stuck)?

WebLogic thread states can be a particularly confusing topic for WebLogic administrators and for anyone starting to learn about and monitor WebLogic threads. The thread monitoring section can be accessed for each managed server under the Monitoring > Threads tab.
The thread monitoring tab provides a complete view of each WebLogic thread along with its state. Now let’s review each section and state so you can properly assess the health of the thread pool.

# Summary :
Execute Thread Total Count: This is the total number of threads “created” from the WebLogic self-tuning pool and visible in the JVM thread dump. This value corresponds to the sum of Active + Standby threads.
Active Execute Threads: This is the number of threads “eligible” to process a request. When thread demand goes up, WebLogic starts promoting threads from Standby to Active state, which enables them to process future client requests.
Standby Thread Count: This is the number of threads waiting to be marked “eligible” to process client requests. These threads are created and visible in the JVM thread dump but are not yet available to process a client request.
Hogging Thread Count: This is the number of threads taking much more time than the current average execution time calculated by the WebLogic kernel.
Execute Thread Idle Count: This is the number of Active threads currently “available” to process a client request.

In the above snapshots, we have:

A total of 43 threads, 29 in Standby state and 14 in Active state
Out of the 14 Active threads, we have 1 Hogging thread and 7 Idle threads, i.e. 7 threads “available” for request processing
Another way to see the situation: we have a total of 7 threads currently “processing” client requests, with 1 out of 7 in Hogging state (i.e. taking more time than the current calculated average)
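
The arithmetic behind these counters can be checked with a few lines of Python. This is only a sketch of the relationships described above (Total = Active + Standby, busy = Active - Idle); the function and key names are made up for illustration and are not part of any WebLogic API.

```python
def thread_pool_summary(total, standby, idle, hogging):
    """Derive the remaining counters from the values shown on the Threads tab."""
    active = total - standby   # Total = Active + Standby
    busy = active - idle       # Active threads currently processing a request
    return {"active": active, "busy": busy, "hogging": hogging}


if __name__ == "__main__":
    # Numbers from the example: 43 total, 29 standby, 7 idle, 1 hogging.
    print(thread_pool_summary(total=43, standby=29, idle=7, hogging=1))
```

With the numbers from the example, this gives 14 Active threads, 7 of them busy, and 1 of those 7 hogging.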
# Thread matrix

This matrix gives you a view of each thread along with its current state. There is one more state that you must also understand:

STUCK: A thread is identified as stuck by WebLogic when it takes more time than the configured stuck thread time (default is 600 seconds). When facing slowdown conditions, you will normally see WebLogic threads transition to the Hogging state first and then to STUCK, depending on how long these threads remain busy executing their current request.
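
As a rough sketch of the rule just described, a thread counts as stuck once it has been busy on one request for longer than the configured limit. The function name is hypothetical; only the 600-second default comes from the text.

```python
DEFAULT_STUCK_THREAD_MAX_TIME = 600  # seconds, the default mentioned above


def is_stuck(busy_seconds, stuck_thread_max_time=DEFAULT_STUCK_THREAD_MAX_TIME):
    """True if the thread has been busy longer than the configured limit."""
    return busy_seconds > stuck_thread_max_time


if __name__ == "__main__":
    print(is_stuck(85))                            # default limit of 600 s
    print(is_stuck(85, stuck_thread_max_time=60))  # limit lowered to 60 s
```

A thread that has been busy for 85 seconds is not stuck under the default limit, but it is once the limit is lowered to 60 seconds.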

What are hogging threads? When do threads become hogged? After what period of time?

According to the Oracle doc hogging threads “.. will either be declared as stuck after the configured timeout or will return to the pool before that. The self-tuning mechanism will backfill if necessary.”

So how long does it take for them to become hogged? Nobody (including Google) seemed to know. Trust me I did some research and asked plenty of colleagues about this. Here is the answer:

If you run the application with 3 threads / 100 seconds / Thread.sleep() and immediately switch to the WebLogic 12c admin console Admin Server / Monitoring / Threads you will observe the following:

So interestingly hogging threads are detected right away! In my case it took about 2 seconds (I had to hit reload once).

So WebLogic transitions into FAILED state when a certain number of stuck threads are detected, right?

That’s a common misconception! The default configuration of WLS 12c (I also checked for WLS 11 = 10.3.3) is Stuck Thread Count = 0, which means the server “never transitions into FAILED server irrespective of the number of stuck threads”. You will see the FAILED state only when you set the value to a positive number of threads!

Once the server transitions into FAILED, you can define whether WLS should be shut down (and restarted by the WLS nodemanager) or suspended.

Remember: WLS will not transition into FAILED state when StuckThreadCount is set to zero. Only the health runtime value is set to Warning (and this is cleared again once the hogging thread condition clears).
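
The rule above can be written down as a small decision function. This is a simplified model, not WebLogic code: the function name and the returned labels are illustrative, and the assumption that FAILED is reached when the number of stuck threads is at least StuckThreadCount is mine.

```python
def server_health(stuck_threads, stuck_thread_count):
    """Simplified model of the health state as a function of stuck threads.

    stuck_thread_count == 0 (the default) means: never go to FAILED.
    Assumption: FAILED is reached when stuck_threads >= stuck_thread_count.
    """
    if stuck_thread_count > 0 and stuck_threads >= stuck_thread_count:
        return "FAILED"
    if stuck_threads > 0:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    print(server_health(stuck_threads=3, stuck_thread_count=0))
    print(server_health(stuck_threads=3, stuck_thread_count=3))
```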

What exactly causes a stuck thread? What state does a thread have to be in to be marked as stuck?

In general there are a number of different thread states in Java: NEW, RUNNABLE, BLOCKED, WAITING, TIMED_WAITING, TERMINATED.

But which state does a thread have to be in to be marked as stuck later? If you run the StuckThreadForFree application and create a stack trace with the WebLogic admin console under Server / ServerName / Monitoring / Threads, you can observe that the thread state is ACTIVE/TIMED_WAITING when using the Thread.sleep() method to block it:

"[ACTIVE] ExecuteThread: '5' for queue: 'weblogic.kernel.Default (self-tuning)'" TIMED_WAITING
 java.lang.Thread.sleep(Native Method)
 com.munzandmore.stuckthread.LongRunningEJB.threadSleep(LongRunningEJB.java:26)
 com.munzandmore.stuckthread.LongRunningEJB_x9v26k_NoIntfViewImpl.__WL_invoke(Unknown Source)

When using the calc() method to keep the threads busy, they are in state ACTIVE/RUNNABLE:

"[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'" RUNNABLE
 com.munzandmore.stuckthread.LongRunningEJB.threadCalc(LongRunningEJB.java:40)
 com.munzandmore.stuckthread.LongRunningEJB_x9v26k_NoIntfViewImpl.__WL_invoke(Unknown Source)
 weblogic.ejb.container.internal.SessionLocalMethodInvoker.invoke(SessionLocalMethodInvoker.java:31)

So both states can become stuck. Also, I am pretty sure I could also show the BLOCKED state when using a monitor lock for synchronization but due to time restrictions this is not included in the app.

Can a stuck thread still do reasonable work?

Absolutely! Just because a thread is marked as stuck doesn’t mean it is frozen or unusable. Imagine you want to calculate PI, create PDFs or distance maps, map the human genome, or you have deployed some JCA adapter talking to MQ-Series, SAP or PeopleSoft which internally uses a Thread.sleep() method call. All of these are reasonable usages likely to occur in the wild.

Do stuck threads ever disappear? Can they be cleared somehow? Are they stuck forever?

First of all, you cannot get rid of a stuck thread by simply “killing it”. You cannot cancel or kill any thread in Java. However, stuck threads disappear automatically once the condition that caused them to be marked as stuck clears up (e.g. the sleep period is over or the calculation is done).

To prove the point, switch to the WebLogic admin console and under Server / ServerName / Configuration set StuckThreadCount to 3 and StuckThreadTime to 60 seconds. Then restart the server and run the StuckThreadForFree app to create 3 threads running for 120 seconds using the Thread.sleep() method (the other method works as well, there is no difference, but keeping 3 threads busy by doing math proves to be a fan test of your machine as well):

In the WebLogic log file you will find three entries logging the stuck thread state after a while:

<05.04.2012 10:55 Uhr MESZ> <[STUCK] ExecuteThread: '4' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "85" seconds working on the request "Workmanager: default, Version: 1, Scheduled=false, Started=true, Started time: 85443 ms", which is more than the configured time (StuckThreadMaxTime) of "60" seconds. Stack trace: java.lang.Thread.sleep(Native Method)

After waiting about one minute you will observe that WebLogic is transitioning into FAILED state as configured:

Wait another minute, then check the thread states under Server / ServerName / Monitoring / Threads which reveals the following:

So once the condition causing the stuck threads has cleared, the stuck threads disappear again! Stuck threads are not stuck forever. Phew!

When should I use StuckThreadCount in the admin console or a Workmanager stuck-thread setting then?

Very good question. Use StuckThreadCount from the WebLogic admin console, or a Work Manager definition that moves the application into ADMIN mode, if you can react to the FAILED state.

Do not use StuckThreadCount if the threads might be doing something useful and you cannot react to the situation anyway. Obviously, transitioning into FAILED state and restarting WLS with the nodemanager is counterproductive if your threads are doing something useful.

Wednesday, October 9, 2013

Performance testing on IRCTC Website

Recently I have seen a lot of people complaining about the performance of the Indian Railways website. Most complaints relate to the fact that they are not able to book tickets online and that it takes them hours to book a ticket via IRCTC. I have faced a similar situation many times, but I must say I have been fortunate: more than 95% of the time I was able to book on the first attempt. Maybe it’s just my luck or my good timing. So I thought I would do some investigation in my spare time to see how much traffic this site handles and what real trouble users face while using it. With this intention, I pulled together the facts below about the IRCTC site. (Please remember that government employees rarely exaggerate numbers, as they have no motive to play with them. Based on this wisdom, I believe the numbers below are correct or on the low side.)
The IRCTC site receives close to 10 to 12 lakh (10,00,000 to 12,00,000) hits per minute.
The site uses around 450 Mbps of bandwidth.
Nearly 5 lakh tickets are booked on a daily basis.
At any point in time, it has close to 10,00,000 concurrent connections open to it.

Of course, IRCTC has taken some measures to increase the scalability of its site by adding flash memory drives, restricting the activities of its agents and ensuring regular maintenance of its infrastructure, but I personally feel it still needs a lot of improvement given the kind of traffic it gets.

I can understand general users, who don’t understand the complexity of the software, complaining about it. But I just don’t understand what makes IT folks who have spent years designing and testing IT systems complain about this site. Can’t they understand the complexity and volume of the traffic the site handles and the way bureaucracy slows it down, or do they believe these numbers are negligible?

I feel the folks who work on this site and the folks who scaled it have done a remarkable job for the people of India. They are in fact heroes, and they have a wealth of technical information. So I would request people who run technical IT magazines to contact these folks and ask them what the secret of their scalability is, in spite of working in a tight, constrained and limited work environment. How on earth can they maintain close to 1 million (10 lakh) concurrent connections? Isn’t this an interesting case study worth attention?

Tuning Apache Tomcat – Reduce Network Calls

One way to improve response time by a few milliseconds is to reduce unnecessary round trips on the network. Whenever an application is deployed in Apache Tomcat, I would suggest that you disable the enableLookups flag of the Connector element in the server’s server.xml. This is a cheap way to gain some performance; how much depends on your network speed. If your server or users are on a slow network, the gain can run to seconds. On a high-speed network, my experiments showed a gain of at least 200 ms.

When enableLookups is true, calls to request.getRemoteHost() (used, for example, when Tomcat logs the client’s host name) trigger a query to the DNS servers to resolve the client’s IP address. This query often adds a few milliseconds to your transaction response time behind the scenes. We can avoid this round trip by setting the enableLookups flag to false on the Connector element in server.xml.

Below is a sample Connector tag from server.xml, showing where the change needs to be made:
maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
enableLookups="true" redirectPort="8443" acceptCount="100"
connectionTimeout="20000" disableUploadTimeout="true" />

In Tomcat 6 and earlier the enableLookups flag defaults to true; from Tomcat 7 onward the default is false, so check what your server.xml actually sets. Once the change is done, ensure that you restart the server.

However, before you implement this change, ensure that you assess its impact on the application correctly: if your application mines the client’s data, this change might break it. With enableLookups set to false, you will not be able to get the client’s host name, only its IP address.
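
If you maintain several Tomcat instances, the edit can be scripted. The following is a minimal sketch using Python’s standard xml.etree module; the sample server.xml fragment (including port 8080 and the service name) is made up for illustration, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical server.xml fragment; port, protocol and names are illustrative only.
SAMPLE_SERVER_XML = """<Server>
  <Service name="Catalina">
    <Connector port="8080" protocol="HTTP/1.1"
               maxThreads="150" enableLookups="true"
               redirectPort="8443" connectionTimeout="20000" />
  </Service>
</Server>"""


def disable_lookups(server_xml):
    """Return the XML text with enableLookups="false" on every Connector."""
    root = ET.fromstring(server_xml)
    for connector in root.iter("Connector"):
        connector.set("enableLookups", "false")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(disable_lookups(SAMPLE_SERVER_XML))
```

Note that ElementTree drops XML comments when it parses the file, so for a heavily commented server.xml a manual edit may be preferable.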

Tuesday, October 8, 2013

Creating Load test Project in Silk Performer

The easiest method of creating a load test script is to use the SilkPerformer Recorder, SilkPerformer’s engine for capturing and recording traffic and generating test scripts.

First, the SilkPerformer Recorder captures and records the traffic between a client application and the server under test. When recording is complete, the SilkPerformer Recorder automatically generates a test script based on the recorded traffic. Scripts are written in SilkPerformer’s scripting language, Benchmark Description Language (BDL).

During the recording phase, you define transactions. A transaction is a discrete piece of work that can be assigned to a virtual user in a load test and for which separate time measurements can be made. You should create new transactions only for pieces of work that don’t have dependencies on other pieces of work.

Individual time measurements can be made for any action or series of actions that occur during recording.

Defining a Load Test Project:

The first step in creating the sample load test project is to define the project by giving it a name and specifying the type of application under test.

Procedure: To define a load test project:

1 Click the Start here button on the SilkPerformer Workflow bar.

2 The Workflow - Outline Project dialog opens.
Enter a project name (e.g., Shopit) in the Project name field.

3 Enter a description for the project in the Project description field (e.g.,Web tutorial with sample application “shopit”).

4 Select Web business transaction (HTML/HTTP) in the Application type field.

Note: If you want to load test a Flash Remoting application, please refer to the Advanced Concepts book, chapter Load Testing Flash Remoting Applications for detailed information.
Note: If you want to load test a WebDAV application (Microsoft Outlook Web Access), simply select WebDAV (MS Outlook Web Access) in the Application type field. The procedure is identical to load testing any other Web business transaction (HTTP).

5 Click OK to create the project based on your settings.

Silk Performer - Overview

The Web Load Testing Tutorial is designed to ease you into the process of load testing with SilkPerformer, and to get you up and running as quickly as possible. It will help you take full advantage of SilkPerformer’s ease of use and exploit the leading-edge functionality that’s embodied in e-business’s load-testing tool of choice.

This describes load testing of Web applications on the protocol level (HTTP/HTML). If you want to load-test applications that rely heavily on AJAX technologies, we recommend using browser-driven Web load testing. Browser-driven Web load testing is a solution that uses real Web browsers (Internet Explorer) to generate load, thus leveraging the AJAX logic built into Web browsers to precisely simulate complex AJAX behavior during testing.

SilkPerformer is the industry’s most powerful, yet easiest to use, enterprise-class load and stress testing tool. Visual script generation techniques and the ability to test multiple application environments with thousands of concurrent users allow you to thoroughly test your enterprise applications’ reliability, performance, and scalability before they’re deployed, regardless of their size and complexity. Powerful root cause analysis tools and management reports help you isolate problems and make quick decisions, thereby minimizing test cycles and accelerating your time to market.
Benefits:

Ensure the scalability, performance, and reliability of your enterprise applications

SilkPerformer ensures the quality of your enterprise applications by measuring their performance from the end-user perspective, as well as internally, in a variety of workload scenarios and dynamic load conditions.

Test remote components early in the development cycle

Dramatically reduce the cost of bugs in your multi-tier enterprise application by testing the functionality, interoperability, and performance of remote components early in the development cycle, even before client applications have been built. You can rapidly generate test drivers for Web services, .NET remoting objects, EJBs and Java RMI objects by exploring them via a point & click interface. Alternatively, you can reuse unit test drivers written by developers for concurrency tests, or you can build new test cases directly in Java or in .NET languages such as C# and VB.NET, using SilkPerformer’s Visual Studio .NET Add-In.

Pinpoint problems easily for quick resolution

SilkPerformer’s unrivaled TrueLog™ technology for HTML, XML, SQL, TCP/IP, and UDP based protocol data provides full visual root-cause analysis from the end-user perspective. TrueLogs visually recreate the data that users provide and receive during load tests (for HTML pages this includes all embedded objects), enabling you to visually analyze the behavior of your application as errors occur during load tests. In addition, detailed response timer statistics help you uncover the root causes of missed Service Level Agreements before your application goes live.

Reusing projects

SilkPerformer’s extended workflow simplifies and deepens its integration with SilkCentral Test Manager. By clicking SilkPerformer’s new Reuse Project button, test projects can be uploaded to and reused by Test Manager (for test automation).

Calculate Number of Vusers in Load Runner

The business requirement: The website will be subject to 100,000 users per hour.
Analyzing real user behavior, it is found that a typical user spends 15 minutes on the site and browses 10 pages on average.

This means the implemented user scenario will contain 10 page requests and each virtual user will run for 15 minutes.
To calculate the necessary number of users, we use the following basic formula:

Number of required VUsers = Required requests per second * User scenario length (sec)

Note: The requests-per-second-based approach (or in our case transactions per second, as we assume 1 request = 1 transaction) is an adequate approach, as LoadRunner also uses transactions per second when displaying metrics data.

Using the above requirements, the number of virtual users can be calculated using the following formula:

No. of required VUsers = (number of site users per hour * requests per user / 3600) * user scenario length (sec)

Using the above formula:

Number of site users per hour := 100000 users (100k)

Requests per user := 10 reqs

User scenario length (sec) := 15 minutes * 60 seconds = 900 sec

Required transactions per hour (TPH) := number of site users per hour * requests per user = 100000 * 10 = 1000000

Required transactions per second := required transactions per hour (TPH) / (60 minutes * 60 seconds) = 1000000 / 3600 ≈ 277.78

Number of required VUsers := (number of site users per hour * requests per user / 3600) * user scenario length (sec)

Number of required VUsers = ((100000 * 10) / 3600) * 900 = 250000 VUsers
250,000 virtual users are required to create the appropriate load.
This calculation is fine, but the resulting test is not very efficient.
Notice that in our implementation each virtual user runs for 15 minutes but is idle most of the time (each user performs only 10 requests altogether).

Here is a better solution: we know that each request is likely to be quick (let’s say quicker than 1 second).

So we compact the user scenario: the script can perform the scenario in 15 seconds instead of 15 minutes.

The think times are reduced accordingly, from 1.5 minutes to about 1 second.
Recalculating with a 15-second scenario gives (100000 * 10 / 3600) * 15 ≈ 4166.7, so we need only about 4,167 virtual users to simulate the required load. This is a significant difference!
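
The formula is easy to script so that different scenario lengths can be compared. This is a minimal sketch; the function name is my own, and the result is rounded up to whole virtual users.

```python
import math


def required_vusers(users_per_hour, requests_per_user, scenario_seconds):
    """VUsers = required requests per second * scenario length in seconds."""
    # Multiply first, divide last, so whole-number results stay exact.
    total = users_per_hour * requests_per_user * scenario_seconds
    return math.ceil(total / 3600)  # round up: there is no partial VUser


if __name__ == "__main__":
    print(required_vusers(100_000, 10, 15 * 60))  # 15-minute scenario
    print(required_vusers(100_000, 10, 15))       # compacted 15-second scenario
```

The required request rate is the same in both cases (about 277.78 requests per second); only the time each virtual user needs for one pass through the scenario changes.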

In practice, scripts are not created with lengthy think times. A 1-second think time is usually the pragmatic approach. The recorded think times of the script can be scaled down or limited to a configured length. The Think Time run-time settings “Use random percentage of recorded think time” and “Limit think time to” control this.

After the script is created, the average length of script execution should be determined. This can be done by running the script manually in VuGen. The measured length can then be substituted into the formula to determine the required number of virtual users.

Silk Performer Case Study- Silk Performer Interview Question

1) Clear the cookies on the browser

2) Create a shopIt Application script

a. With timers

i. Join The Experience
ii. Product Search
iii. Check Out
b. Data drive the test case with joined users
c. With Verification points

i. Welcome page title

ii. Add Verification for user name with existing parameter
iii. Add Verifications for Alphabets on the page

3) Create 2 profiles for
a. Bandwidth 128 Kbps with IE 6.0
b. Bandwidth High and Netscape 7.0

4) Add user types to the script with 2 profiles

5) Baseline the test and run

6) Add the threshold for custom timers with Min 4 times and Max 5 times

7) Add at least 2 agents

8) Add the Performance Monitor workspace with local machine CPU, memory and Network Interface counters

9) Create a workload for 8 users in Increasing workload model
a. Start users 2
b. Increase by 1 user every 20 Sec
c. Simulation time 5 min
d. Warmup time 30 Sec
e. Measurement time 4:30 min

10) Run the load test

11) Write the analysis
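
For step 9, it helps to see when each user starts. The sketch below computes the ramp-up schedule for an increasing workload under the assumption that users are added at fixed intervals until the maximum is reached and then held; it is plain arithmetic, not SilkPerformer code, and it does not model how the warm-up period is handled.

```python
def increasing_workload(start_users, max_users, step_users, step_seconds):
    """Return (elapsed_seconds, active_users) pairs for a stepwise ramp-up."""
    schedule = [(0, start_users)]
    elapsed, users = 0, start_users
    while users < max_users:
        elapsed += step_seconds
        users = min(users + step_users, max_users)
        schedule.append((elapsed, users))
    return schedule


if __name__ == "__main__":
    # Step 9: start with 2 users, add 1 user every 20 seconds, up to 8 users.
    for elapsed, users in increasing_workload(2, 8, 1, 20):
        print(f"{elapsed:>4} s: {users} users")
```

With these settings all 8 users are running after 120 seconds, which leaves 3 minutes of the 5-minute simulation at full load.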

Vuser Calculation In LoadRunner

How do you calculate the number of virtual users (VUsers) for load/stress testing? How many concurrent virtual users should be loaded?

What should the peak load be?

25, 100, 500, 1000? We cannot pick the number of VUsers blindly; an arbitrary number will not give meaningful results for analysis.

The main purpose of VUsers is to simulate the live environment. Choosing the number looks tricky, but it is easy to obtain the number of VUsers required for load/stress testing. The universal formula relating the number of users in the system to its throughput is Little’s Law.

N = Z * (R + T)

where

N – number of VUsers,
Z – Transactions per Second (TPS)
R – Response Time in seconds
T – Think Time in seconds

If you get the following data from the stakeholders, i.e. TPS, response time and think time, the number of VUsers can be calculated easily.

E.g.

TPS is 100, R is 3 sec and T is 2 sec then N will be

N = 100 * (3+2)
= 100 * 5
= 500

Peak load will be 500 VUsers.
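
The same calculation as a small Python helper, in case you want to try several combinations of TPS, response time and think time. The function name is made up for this sketch.

```python
def littles_law_vusers(tps, response_time_s, think_time_s):
    """N = Z * (R + T): VUsers needed to sustain a given throughput."""
    return tps * (response_time_s + think_time_s)


if __name__ == "__main__":
    # Example from above: TPS = 100, R = 3 s, T = 2 s.
    print(littles_law_vusers(100, 3, 2))
```

Rearranged, the same relation gives the throughput a fixed number of VUsers can generate: Z = N / (R + T).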

Friday, October 4, 2013

Printing the current log options to the output log (even if logging is disabled) in LoadRunner

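// Prints the current log options to the output log (even if logging is disabled)
// Example: xyz_print_log_options(lr_get_debug_message());
// Note: xyz_clear_log_options() and xyz_set_log_options() are helper functions
// that must be defined elsewhere in the script; they are not shown here.
void xyz_print_log_options(unsigned int log_options_to_print) {
 char buf[(4 * 8) + 1]; // 32 binary digits plus the terminator; sizeof(unsigned int) is always 4 bytes in LoadRunner.
 unsigned int original_log_options = lr_get_debug_message();
 
 // Temporarily switch to brief logging so the messages below are written
 // even if logging is currently disabled.
 xyz_clear_log_options();
 xyz_set_log_options(LR_MSG_CLASS_BRIEF_LOG);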
 
 // Print the bit pattern for the current log options.
 itoa(log_options_to_print, buf, 2);
 lr_output_message("Log options bit pattern: %032.32s", buf);
 
 lr_output_message("Log options selected:");
 
 if (log_options_to_print == 0) {
  lr_output_message("* Disabled (LR_MSG_CLASS_DISABLE_LOG)");
 } else {
 
  if (log_options_to_print & LR_MSG_CLASS_JIT_LOG_ON_ERROR) {
   lr_output_message("* Send messages only when an error occurs (LR_MSG_CLASS_JIT_LOG_ON_ERROR)");
  } else {
   lr_output_message("* Always send messages");
  }
 
  if (log_options_to_print & LR_MSG_CLASS_BRIEF_LOG) {
   lr_output_message("* Log messages at the detail level of \"Standard log\" (LR_MSG_CLASS_BRIEF_LOG)");
  }
 
  if (log_options_to_print & LR_MSG_CLASS_EXTENDED_LOG) {
   lr_output_message("* Log messages at the detail level of \"Extended log\" (LR_MSG_CLASS_EXTENDED_LOG)");
  }
 
  if (log_options_to_print & LR_MSG_CLASS_PARAMETERS) {
   lr_output_message("* Parameter substitution (LR_MSG_CLASS_PARAMETERS)");
  }
 
  if (log_options_to_print & LR_MSG_CLASS_RESULT_DATA) {
   lr_output_message("* Data returned by server (LR_MSG_CLASS_RESULT_DATA)");
  }
 
  if (log_options_to_print & LR_MSG_CLASS_FULL_TRACE) {
   lr_output_message("* Advanced trace (LR_MSG_CLASS_FULL_TRACE)");
  }
 }
 
 xyz_clear_log_options();
 xyz_set_log_options(original_log_options);
 
 return;
}
/*
Output looks like this:
globals.h(26): Log options bit pattern: 00000000000000000000001000011110
globals.h(28): Log options selected:
globals.h(35): * Send messages only when an error occurs (LR_MSG_CLASS_JIT_LOG_ON_ERROR)
globals.h(45): * Log messages at the detail level of "Extended log" (LR_MSG_CLASS_EXTENDED_LOG)
globals.h(49): * Parameter substitution (LR_MSG_CLASS_PARAMETERS)
globals.h(53): * Data returned by server (LR_MSG_CLASS_RESULT_DATA)
globals.h(57): * Advanced trace (LR_MSG_CLASS_FULL_TRACE)
*/
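
To check a bit pattern like the one in the sample output by hand, the decoding can be reproduced outside LoadRunner. The sketch below is in Python, and the numeric flag values are an assumption on my part (they are consistent with the sample output above, but verify them against lrun.h in your installation).

```python
# Assumed flag values; verify against lrun.h in your LoadRunner installation.
LOG_FLAGS = {
    "LR_MSG_CLASS_BRIEF_LOG": 1,
    "LR_MSG_CLASS_RESULT_DATA": 2,
    "LR_MSG_CLASS_PARAMETERS": 4,
    "LR_MSG_CLASS_FULL_TRACE": 8,
    "LR_MSG_CLASS_EXTENDED_LOG": 16,
    "LR_MSG_CLASS_JIT_LOG_ON_ERROR": 512,
}


def decode_log_options(options):
    """Return the names of all log flags set in the given bit mask."""
    if options == 0:
        return ["LR_MSG_CLASS_DISABLE_LOG"]
    return [name for name, bit in LOG_FLAGS.items() if options & bit]


if __name__ == "__main__":
    pattern = "00000000000000000000001000011110"  # from the sample output
    for name in decode_log_options(int(pattern, 2)):
        print(name)
```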

Turning off scripting errors - "MESSAGE FROM WEBPAGE" IN LOAD RUNNER

You may find you get pop-up messages in some WebbIE applications and more generally in Internet Explorer that tell you about scripting errors on the web page. Generally, unless you are a web developer, you just don't care about these messages and don't want them to appear. Turning them off won't hurt the operation of any program, so here is how to turn off scripting error messages in Internet Explorer. These messages say things like "Errors on this webpage might cause it to work incorrectly."

The first thing to try is turning off these messages in Internet Explorer:
Open Internet Explorer
Open the Tools menu (Alt and T)
Select the Internet Options item (O key)
The Internet Options dialog has many tabs. You need the Advanced tab. Press Control and Tab until you get to the Advanced Tab (that's six presses for Internet Explorer 8)
You should now be in a list, starting with Accessibility as the first item in Internet Explorer 8. This has the scripting options you want to change.
Cursor down to "Disable script debugging (Internet Explorer)" and press Space until it is on.
Cursor down to "Disable script debugging (Other)" and press Space until it is on.
Cursor down to "Display a notification about every script error" and press Space until it is off.
Press the Return key to close the Internet Options dialog. You should now have turned off the scripting errors.

Didn't work? Here are some other things you can try:
Update Internet Explorer. You should be on the latest Internet Explorer, it's safer and better. You can get it from Windows Update. Start Internet Explorer, Alt and T for the Tools menu, then cursor down to Windows Update.
Change your antivirus program. These cause no end of trouble.
Set your Internet Explorer Security settings to Default. You do this again in the Internet Explorer Tools menu, Internet Options, Security tab, and click Default Level.
Delete your Internet Explorer temporary files and cookies and history. Internet Options, General tab. This will mean you'll have to re-enter your username and password in places where you've saved it, so make sure you know them all before you try this.