Thursday, February 20, 2014

How to run JMeter for more than 300 users?

Run JMeter in non-GUI, distributed mode to simulate more than 300 users. The practical limit depends on the amount of RAM available; on a system with about 3.2 GB of RAM, you can typically run up to around 1000 users.

Step 1:
Configure JMeter for distributed testing.

Step 2:
Place your test script and CSV files in the /bin folder of JMeter.

Step 3:
Open a command prompt, go to the JMeter bin directory, and run the following command:

jmeter -n -t testplan.jmx -R ip1,ip2 -l resultfilename
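As a minimal sketch of what Step 1 typically involves (remote_hosts and jmeter-server are the standard JMeter names; the IPs ip1 and ip2 are placeholders from the command above, and exact steps vary with JMeter version, firewall, and network setup):

    # on each remote (load generator) machine, from the JMeter bin directory:
    jmeter-server

    # on the controller machine, in bin/jmeter.properties:
    remote_hosts=ip1,ip2

    # then start the non-GUI distributed run from the controller:
    jmeter -n -t testplan.jmx -R ip1,ip2 -l result.jtl

The -R flag overrides remote_hosts for a single run, so either listing the hosts in jmeter.properties or passing them with -R is enough.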

How to handle sessions in JMeter?

To handle sessions in JMeter, do the following:

For the sampler where your session variable is present, add Add -> Pre-Processors -> HTTP URL Re-writing Modifier.

Then enter your variable name as the Session Argument Name and check the Cache Session Id option.
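As an illustration of what URL re-writing handles (the parameter name jsessionid is only an example; use whatever name your application actually places in the URL), a server that does not use cookies may carry the session id directly in each link, for instance:

    http://www.example.com/shop/cart;jsessionid=A1B2C3D4E5
    http://www.example.com/shop/cart?sessionid=A1B2C3D4E5

With the modifier attached and the session argument name set to that parameter, JMeter picks the value out of the previous response and appends it to the following requests; Cache Session Id keeps the last known value for reuse.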



Software Testing terms A-Z

Acceptance Test. Formal tests (often performed by a customer) to determine whether or not a system has satisfied predetermined acceptance criteria. These tests are often used to enable the customer (either internal or external) to determine whether or not to accept a system.

Accessibility testing. Testing that determines if software will be usable by people with disabilities.

Ad Hoc Testing. Testing carried out using no recognised test case design technique.

Algorithm verification testing. A software development and test phase focused on the validation and tuning of key algorithms using an iterative experimentation process.

Alpha Testing. Testing of a software product or system conducted at the developer's site by the customer.

Aperiodic bug. A transient bug that becomes active intermittently rather than at regular intervals (sometimes referred to as an intermittent bug). Because of their short duration, transient faults are often detected through the anomalies that result from their propagation.


Artistic testing. Also known as Exploratory testing.

Assertion Testing. (NBS) A dynamic analysis technique which inserts assertions about the relationship between program variables into the program code. The truth of the assertions is determined as the program executes.
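A minimal sketch of the idea in Python (the function and the invariants are made up for illustration): the assertions state relationships between program variables and are checked while the program executes.

    def transfer(balance, amount):
        # assertion about the inputs
        assert amount >= 0, "amount must be non-negative"
        new_balance = balance - amount
        # assertion about the relationship between input and output
        assert new_balance <= balance, "balance must never grow during a transfer"
        return new_balance

    transfer(100, 30)    # assertions hold
    transfer(100, -5)    # raises AssertionError, exposing the violated assumption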


Automated Testing. Software testing which is assisted with software technology that does not require operator (tester) input, analysis, or evaluation.


Audit.
  (1) An independent examination of a work product or set of work products to assess compliance with specifications, standards, contractual agreements, or other criteria. (IEEE)

  (2) To conduct an independent review and examination of system records and activities in order to test the adequacy and effectiveness of data security and data integrity procedures, to ensure compliance with established policy and operational procedures, and to recommend any necessary changes. (ANSI)

ABEND (Abnormal END). A mainframe term for a program crash. It is always associated with a failure code, known as an ABEND code.
Background testing. The execution of normal functional testing while the SUT is exercised by a realistic workload. This workload is processed "in the background" as far as the functional testing is concerned.

Bandwidth testing. Testing a site with a variety of link speeds, both fast (internally connected LAN) and slow (externally, through a proxy or firewall, and over a modem); sometimes called slow link testing if the organization typically tests with a faster link internally (in that case, they are doing a specific pass for the slower line speed only).

Basis path testing. Identifying tests based on flow and paths of the program or system.

Basis test set. A set of test cases derived from the code logic which ensure that 100% branch coverage is achieved.


Bug: glitch, error, goof, slip, fault, blunder, boner, howler, oversight, botch, delusion, elision.

Beta Testing. Testing conducted at one or more customer sites by the end-user of a delivered software product or system.

Benchmarks. Programs that provide performance comparison for software, hardware, and systems.

Benchmarking is a specific type of performance test with the purpose of determining performance baselines for comparison.

Big-bang testing. Integration testing where no incremental testing takes place prior to all the system's components being combined to form the system.

Black box testing. A testing method where the application under test is viewed as a black box and the internal behavior of the program is completely ignored. Testing occurs based upon the external specifications. Also known as behavioral testing, since only the external behaviors of the program are evaluated and analyzed.

Blink testing. What you do in blink testing is plunge yourself into an ocean of data-- far too much data to comprehend. And then you comprehend it. Don't know how to do that? Yes you do. But you may not realize that you know how.

Bottom-up Testing. An approach to integration testing where the lowest level components are tested first, then used to facilitate the testing of higher level components. The process is repeated until the component at the top of the hierarchy is tested.

Boundary Value Analysis (BVA). BVA is different from equivalence partitioning in that it focuses on "corner cases", values at or just outside the range defined by the specification. This means that if a function expects all values in the range of negative 100 to positive 1000, test inputs would include negative 101 and positive 1001. The boundary values derived this way are also often used as inputs for stress, load, or volume testing. This type of validation is usually performed after positive functional validation has completed (successfully) using requirements specifications and user documentation.
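A small sketch of boundary value selection in Python, using the -100 to +1000 range mentioned above (the function under test is hypothetical):

    def in_range(value):
        # hypothetical function under test: accepts -100..1000 inclusive
        return -100 <= value <= 1000

    # BVA picks values at the boundaries and just beyond them
    bva_inputs = [-101, -100, -99, 999, 1000, 1001]
    for v in bva_inputs:
        expected = -100 <= v <= 1000
        assert in_range(v) == expected, f"boundary case failed for {v}"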

Branch Coverage Testing. - Verify each branch has true and false outcomes at least once.
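As a sketch (hypothetical function), branch coverage requires at least one test where the condition is true and one where it is false:

    def classify(speed):
        if speed > 120:               # the branch under test
            return "too fast"
        return "ok"

    assert classify(130) == "too fast"   # exercises the true outcome
    assert classify(100) == "ok"         # exercises the false outcome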

Breadth test. - A test suite that exercises the full scope of a system from a top-down perspective, but does not test any aspect in detail

BRS - Business Requirement Specification
Capability Maturity Model (CMM). - A description of the stages through which software organizations evolve as they define, implement, measure, control and improve their software processes. The model is a guide for selecting the process improvement strategies by facilitating the determination of current process capabilities and identification of the issues most critical to software quality and process improvement.

Capture-replay tools. - Tools that give testers the ability to move some GUI testing away from manual execution by "capturing" mouse clicks and keyboard strokes into scripts, and then "replaying" that script to re-create the same sequence of inputs and responses on subsequent tests.

Cause Effect Graphing. (1) [NBS] Test data selection technique. The input and output domains are partitioned into classes and analysis is performed to determine which input classes cause which effect. A minimal set of inputs is chosen which will cover the entire effect set. (2)A systematic method of generating test cases representing combinations of conditions. See: testing, functional.

Clean test. A test whose primary purpose is validation; that is, tests designed to demonstrate the software's correct working.

Clear-box testing. See White-box testing.

Code audit. An independent review of source code by a person, team, or tool to verify compliance with software design documentation and programming standards. Correctness and efficiency may also be evaluated. (IEEE)

Code Inspection. A manual [formal] testing [error detection] technique where the programmer reads source code, statement by statement, to a group who ask questions analyzing the program logic, analyzing the code with respect to a checklist of historically common programming errors, and analyzing its compliance with coding standards. Contrast with code audit, code review, code walkthrough. This technique can also be applied to other software and configuration items. [G.Myers/NBS] Syn: Fagan Inspection

Code Walkthrough. A manual testing [error detection] technique where program [source code] logic [structure] is traced manually [mentally] by a group with a small set of test cases, while the state of program variables is manually monitored, to analyze the programmer's logic and assumptions.[G.Myers/NBS]

Coexistence Testing. Coexistence isn't enough. It also depends on load order, how virtual space is mapped at the moment, hardware and software configurations, and the history of what took place hours or days before. It's probably an exponentially hard problem rather than a square-law problem. [from Quality Is Not The Goal. By Boris Beizer, Ph. D.]

Comparison testing. Comparing software strengths and weaknesses to competing products

Compatibility bug. A revision to the framework breaks a previously working feature: a new feature is inconsistent with an old feature, or a new feature breaks an unchanged application rebuilt with the new framework code.

Compatibility Testing. The process of determining the ability of two or more systems to exchange information. In a situation where the developed software replaces an already working program, an investigation should be conducted to assess possible compatibility problems between the new software and other programs or systems.

Composability testing -testing the ability of the interface to let users do more complex tasks by combining different sequences of simpler, easy-to-learn tasks. [Timothy Dyck, 'Easy' and other lies, eWEEK April 28, 2003]

Condition Coverage. A test coverage criteria requiring enough test cases such that each condition in a decision takes on all possible outcomes at least once, and each point of entry to a program or subroutine is invoked at least once. Contrast with branch coverage, decision coverage, multiple condition coverage, path coverage, statement coverage.[G.Myers]

Configuration. The functional and/or physical characteristics of hardware/software as set forth in technical documentation and achieved in a product. (MIL-STD-973)

Configuration control. An element of configuration management, consisting of the evaluation, coordination, approval or disapproval, and implementation of changes to configuration items after formal establishment of their configuration identification. (IEEE)

Conformance directed testing. Testing that seeks to establish conformance to requirements or specification. [R. V. Binder, 1999]

Cookbook scenario. A test scenario description that provides complete, step-by-step details about how the scenario should be performed. It leaves nothing to chance. [Scott Loveland, 2005]

Coverage analysis. Determining and assessing measures associated with the invocation of program structural elements to determine the adequacy of a test run. Coverage analysis is useful when attempting to execute each statement, branch, path, or iterative structure in a program. Tools that capture this data and provide reports summarizing relevant information have this feature. (NIST)

CRUD Testing. Build CRUD matrix and test all object creation, reads, updates, and deletion. [William E. Lewis, 2000]

Data-Driven testing. An automation approach in which the navigation and functionality of the test script is directed through external data; this approach separates test and control data from the test script. [Daniel J. Mosley, 2002]

Data flow testing. Testing in which test cases are designed based on variable usage within the code.[BCS]

Database testing. Check the integrity of database field values. [William E. Lewis, 2000]

Defect. The difference between the functional specification (including user documentation) and the actual program text (source code and data). Often reported as a problem and stored in a defect-tracking and problem-management system.

Defect. Also called a fault or a bug, a defect is an incorrect part of code that is caused by an error. An error of commission causes a defect of wrong or extra code. An error of omission results in a defect of missing code. A defect may cause one or more failures.[Robert M. Poston, 1996.]

Defect. A flaw in the software with potential to cause a failure. [Systematic Software Testing by Rick D. Craig and Stefan P. Jaskiel 2002]

Defect Age. A measurement that describes the period of time from the introduction of a defect until its discovery. [Systematic Software Testing by Rick D. Craig and Stefan P. Jaskiel 2002]

Defect Density. A metric that compares the number of defects to a measure of size (e.g., defects per KLOC). Often used as a measure of software quality. [Systematic Software Testing by Rick D. Craig and Stefan P. Jaskiel 2002]

Defect Discovery Rate. A metric describing the number of defects discovered over a specified period of time, usually displayed in graphical form. [Systematic Software Testing by Rick D. Craig and Stefan P. Jaskiel 2002]

Defect Removal Efficiency (DRE). A measure of the number of defects discovered in an activity versus the number that could have been found. Often used as a measure of test effectiveness. [Systematic Software Testing by Rick D. Craig and Stefan P. Jaskiel 2002]
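For example (the numbers are purely illustrative): if system testing finds 90 defects and 10 more escape and are found later in production, DRE = 90 / (90 + 10) = 90%.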

Defect Seeding. The process of intentionally adding known defects to those already in a computer program for the purpose of monitoring the rate of detection and removal, and estimating the number of defects still remaining. Also called Error Seeding. [Systematic Software Testing by Rick D. Craig and Stefan P. Jaskiel 2002]
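For example (illustrative numbers): if 10 known defects are seeded and testing finds 8 of them along with 40 unseeded defects, the process is catching roughly 8/10 = 80% of defects, so the total number of unseeded defects can be estimated as 40 / 0.8 = 50, suggesting about 10 remain.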

Defect Masked. An existing defect that hasn't yet caused a failure because another defect has prevented that part of the code from being executed. [Systematic Software Testing by Rick D. Craig and Stefan P. Jaskiel 2002]

Depth test. A test case that exercises some part of a system to a significant level of detail. [Dorothy Graham, 1999]

Decision Coverage. A test coverage criteria requiring enough test cases such that each decision has a true and false result at least once, and that each statement is executed at least once. Syn: branch coverage. Contrast with condition coverage, multiple condition coverage, path coverage, statement coverage.[G.Myers]

Design-based testing. Designing tests based on objectives derived from the architectural or detail design of the software (e.g., tests that execute specific invocation paths or probe the worst case behaviour of algorithms). [BCS]

Dirty testing. Negative testing. [Beizer]

Dynamic testing. Testing, based on specific test cases, by execution of the test object or running programs [Tim Koomen, 1999]
End-to-End testing. Similar to system testing; the 'macro' end of the test scale; involves testing of a complete application environment in a situation that mimics real-world use, such as interacting with a database, using network communications, or interacting with other hardware, applications, or systems if appropriate.

Equivalence Partitioning: An approach where classes of inputs are categorized for product or function validation. This usually does not include combinations of input, but rather a single representative value from each class. For example, for a given function there may be several classes of input that may be used for positive testing. If the function expects an integer and receives an integer as input, this is considered a positive test assertion. On the other hand, if a character or any input of a class other than integer is provided, this is considered a negative test assertion or condition.
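A minimal Python sketch of the integer example above (the validation function is hypothetical); one representative value stands in for each equivalence class:

    def accepts_integer(value):
        # hypothetical validation: the function expects an integer
        return isinstance(value, int)

    representative_valid = 42      # stands in for the class "any integer"
    representative_invalid = "x"   # stands in for the class "any non-integer"

    assert accepts_integer(representative_valid) is True
    assert accepts_integer(representative_invalid) is False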

Error: An error is a mistake of commission or omission that a person makes. An error causes a defect. In software development one error may cause one or more defects in requirements, designs, programs, or tests.[Robert M. Poston, 1996.]

Errors: The amount by which a result is incorrect. Mistakes are usually a result of a human action. Human mistakes (errors) often result in faults contained in the source code, specification, documentation, or other product deliverable. Once a fault is encountered, the end result will be a program failure. The failure usually has some margin of error, either high, medium, or low.

Error Guessing: Another common approach to black-box validation. Black-box testing is when everything other than the source code may be used for testing. This is the most common approach to testing. Error guessing is when random inputs or conditions are used for testing. Random in this case includes a value either produced by a computerized random number generator, or an ad hoc value or test condition provided by the engineer.

Error guessing. A test case design technique where the experience of the tester is used to postulate what faults exist, and to design tests specially to expose them [from BS7925-1]

Error seeding. The purposeful introduction of faults into a program to test effectiveness of a test suite or other quality assurance program. [R. V. Binder, 1999]

Exception Testing. Identify error messages and exception-handling processes and the conditions that trigger them. [William E. Lewis, 2000]

Exhaustive Testing.(NBS) Executing the program with all possible combinations of values for program variables. Feasible only for small, simple programs.

Exploratory Testing: An interactive process of concurrent product exploration, test design, and test execution. The heart of exploratory testing can be stated simply: The outcome of this test influences the design of the next test. [James Bach]
Failure: A failure is a deviation from expectations exhibited by software and observed as a set of symptoms by a tester or user. A failure is caused by one or more defects. The Causal Trail. A person makes an error that causes a defect that causes a failure.[Robert M. Poston, 1996]

Fix testing. Rerunning of a test that previously found the bug in order to see if a supplied fix works. [Scott Loveland, 2005]

Follow-up testing. In follow-up testing, we vary a test that yielded a less-than-spectacular failure. We vary the operation, data, or environment, asking whether the underlying fault in the code can yield a more serious failure or a failure under a broader range of circumstances. [Measuring the Effectiveness of Software Testers, Cem Kaner, STAR East 2003]

Formal Testing. (IEEE) Testing conducted in accordance with test plans and procedures that have been reviewed and approved by a customer, user, or designated level of management. Antonym: informal testing.

Framework scenario. A test scenario definition that provides only enough high-level information to remind the tester of everything that needs to be covered for that scenario. The description captures the activity’s essence, but trusts the tester to work through the specific steps required.[Scott Loveland, 2005]

Free Form Testing. Ad hoc or brainstorming using intuition to define test cases. [William E. Lewis, 2000]

Functional Decomposition Approach. An automation method in which the test cases are reduced to fundamental tasks, navigation, functional tests, data verification, and return navigation; also known as the Framework Driven Approach. [Daniel J. Mosley, 2002]

Functional testing. Application of test data derived from the specified functional requirements without regard to the final program structure. Also known as black-box testing.

Function verification test (FVT). Testing of a complete, yet containable functional area or component within the overall software package. Normally occurs immediately after Unit test. Also known as Integration test. [Scott Loveland, 2005]
Gray box testing. Tests involving inputs and outputs, but test design is educated by information about the code or the program operation of a kind that would normally be out of scope of view of the tester.[Cem Kaner]
Gray box testing. Test designed based on the knowledge of algorithm, internal states, architectures, or other high -level descriptions of the program behavior. [Doug Hoffman]
 
Gray box testing. Examines the activity of back-end components during test case execution. Two types of problems that can be encountered during gray-box testing are:
  • A component encounters a failure of some kind, causing the operation to be aborted. The user interface will typically indicate that an error has occurred.
  • The test executes in full, but the content of the results is incorrect. Somewhere in the system, a component processed data incorrectly, causing the error in the results.
[Elfriede Dustin. "Quality Web Systems: Performance, Security & Usability."]
Grooved Tests. Tests that simply repeat the same activity against a target product from cycle to cycle. [Scott Loveland, 2005]
 
 Heuristic Testing: An approach to test design that employs heuristics to enable rapid development of test cases.[James Bach]
 
 
High-level tests. These tests involve testing whole, complete products [Kit, 1995]
 
HTML validation testing. Specific to Web testing. This certifies that the HTML meets specifications and internal coding standards.
 
W3C Markup Validation Service, a free service that checks Web documents in formats like HTML and XHTML for conformance to W3C Recommendations and other standards.
 
Incremental integration testing. Continuous testing of an application as new functionality is added; requires that various aspects of an application's functionality be independent enough to work separately before all parts of the program are completed, or that test drivers be developed as needed; done by programmers or by testers.

Inspection. A formal evaluation technique in which software requirements, design, or code are examined in detail by person or group other than the author to detect faults, violations of development standards, and other problems [IEEE94]. A quality improvement process for written material that consists of two dominant components: product (document) improvement and process improvement (document production and inspection).

Integration. The process of combining software components, hardware components, or both into an overall system.

Integration testing - testing of combined parts of an application to determine if they function together correctly. The 'parts' can be code modules, individual applications, client and server applications on a network, etc. This type of testing is especially relevant to client/server and distributed systems.

Integration Testing. Testing conducted after unit and feature testing. The intent is to expose faults in the interactions between software modules and functions. Either top-down or bottom-up approaches can be used. A bottom-up method is preferred, since it leads to earlier unit testing (step-level integration). This method is contrary to the big-bang approach, where all source modules are combined and tested in one step. The big-bang approach to integration should be discouraged.
Interface Tests. Programs that provide test facilities for external interfaces and function calls. Simulation is often used to test external interfaces that currently may not be available for testing or are difficult to control. For example, hardware resources such as hard disks and memory may be difficult to control. Therefore, simulation can provide the characteristics or behaviors for a specific function.
Internationalization testing (I18N) - testing related to handling foreign text and data within the program. This would include sorting, importing and exporting text and data, correct handling of currency and date and time formats, string parsing, upper and lower case handling, and so forth. [Clinton De Young, 2003].

Interoperability Testing. Measures the ability of your software to communicate across the network on multiple machines from multiple vendors, each of whom may have interpreted a design specification critical to your success differently.

Inter-operability Testing. True inter-operability testing concerns testing for unforeseen interactions with other packages with which your software has no direct connection. In some quarters, inter-operability testing labor equals all other testing combined. This is the kind of testing that I say shouldn't be done because it can't be done. [from Quality Is Not The Goal. By Boris Beizer, Ph. D.]


Install/uninstall testing. Testing of full, partial, or upgrade install/uninstall processes.
Key Word-Driven Testing. The approach developed by Carl Nagle of the SAS Institute that is offered as freeware on the Web; Key Word-Driven Testing is an enhancement to the data-driven methodology. [Daniel J. Mosley, 2002]

Latent bug. A bug that has been dormant (unobserved) in two or more releases. [R. V. Binder, 1999]

Lateral testing. A test design technique, based on lateral thinking principles, to identify faults. [Dorothy Graham, 1999]

Limits testing. See Boundary Condition testing.

Load testing. Testing an application under heavy loads, such as testing of a web site under a range of loads to determine at what point the system's response time degrades or fails.

Load stress test. A test designed to determine how heavy a load the application can handle.

Load-stability test. A test designed to determine whether a Web application will remain serviceable over an extended time span.

Load isolation test. The workload for this type of test is designed to contain only the subset of test cases that caused the problem in previous testing.

Longevity testing. See Reliability testing.

Long-haul Testing. See Reliability testing.
Master Test Planning. An activity undertaken to orchestrate the testing effort across levels and organizations.[Systematic Software Testing by Rick D. Craig and Stefan P. Jaskiel 2002]

Memory leak testing. Testing the server components to see if memory is not properly referenced and released, which can lead to instability and the product's crashing.

Model-Based Testing. Model-based testing takes the application and models it so that each state of each input, output, form, and function is represented. Since this is based on detailing the various states of objects and data, this type of testing is very similar to charting out states. Many times a tool is used to automatically go through all the states in the model and try different inputs in each to ensure that they all interact correctly.[Lydia Ash, 2003]

Monkey Testing. (smart monkey testing) Inputs are generated from probability distributions that reflect actual expected usage statistics -- e.g., from user profiles. There are different levels of IQ in smart monkey testing. In the simplest, each input is considered independent of the other inputs. For example, a given test might require an input vector with five components: in low IQ testing, these would be generated independently; in high IQ monkey testing, the correlation (e.g., the covariance) between these input distributions is taken into account. In all branches of smart monkey testing, the input is considered as a single event. [Visual Test 6 Bible by Thomas R. Arnold, 1998]

Monkey Testing. (brilliant monkey testing) The inputs are created from a stochastic regular expression or stochastic finite-state machine model of user behavior. That is, not only are the values determined by probability distributions, but the sequence of values and the sequence of states in which the input provider goes is driven by specified probabilities.[Visual Test 6 Bible by Thomas R. Arnold, 1998 ]

Monkey Testing. (dumb monkey testing) Inputs are generated from a uniform probability distribution without regard to the actual usage statistics. [Visual Test 6 Bible by Thomas R. Arnold, 1998]

Maximum Simultaneous Connection testing. This is a test performed to determine the number of connections which the firewall or Web server is capable of handling.

Migration Testing. Testing to see if the customer will be able to transition smoothly from a prior version of the software to a new one. [Scott Loveland, 2005]

Mutation testing. A testing strategy where small variations to a program are inserted (a mutant), followed by execution of an existing test suite. If the test suite detects the mutant, the mutant is 'retired.' If undetected, the test suite must be revised. [R. V. Binder, 1999]
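A tiny Python sketch of the idea (the mutant is written by hand here; real tools generate mutants automatically). If the existing suite passes against the mutant, the mutant has survived and the suite must be strengthened:

    def discount(price):              # original: 10% off orders of 100 or more
        return price * 0.9 if price >= 100 else price

    def discount_mutant(price):       # mutant: '>=' changed to '>'
        return price * 0.9 if price > 100 else price

    def suite_passes(fn):             # the existing test suite
        return fn(50) == 50 and fn(200) == 180

    assert suite_passes(discount)          # original passes
    assert suite_passes(discount_mutant)   # mutant survives: the suite is too weak
    # adding the boundary case fn(100) == 90 would kill this mutant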

Multiple Condition Coverage. A test coverage criteria which requires enough test cases such that all possible combinations of condition outcomes in each decision, and all points of entry, are invoked at least once.[G.Myers] Contrast with branch coverage, condition coverage, decision coverage, path coverage, statement coverage.
Negative test. A test whose primary purpose is falsification; that is, tests designed to break the software. [B. Beizer, 1995]

Noncritical code analysis. Examines software elements that are not designated safety-critical and ensures that these elements do not cause a hazard. (IEEE)
Orthogonal array testing: A technique that can be used to reduce the number of combinations and provide maximum coverage with a minimum number of test cases. Note that it is an old and proven technique: orthogonal arrays were introduced by Plackett and Burman in 1946 and applied by G. Taguchi in 1987.

Orthogonal array testing: Mathematical technique to determine which variations of parameters need to be tested. [William E. Lewis, 2000]

Oracle. Test Oracle: a mechanism to produce the predicted outcomes to compare with the actual outcomes of the software under test [from BS7925-1]
Parallel Testing. Testing a new or an alternate data processing system with the same source data that is used in another system. The other system is considered as the standard of comparison. Syn: parallel run.[ISO]

Penetration testing. The process of attacking a host from outside to ascertain remote security vulnerabilities. Another responsibility of a professional penetration tester is to enforce countermeasures for certain types of known attacks and vulnerabilities.

Performance Testing. Testing conducted to evaluate the compliance of a system or component with specific performance requirements [BS7925-1]

Performance testing can be undertaken to: 1) show that the system meets specified performance objectives, 2) tune the system, 3) determine the factors in hardware or software that limit the system's performance, and 4) project the system's future load-handling capacity in order to schedule its replacements. [Software System Testing and Quality Assurance. Beizer, 1984, p. 256]

Postmortem. Self-analysis of interim or fully completed testing activities with the goal of creating improvements to be used in the future. [Scott Loveland, 2005]

Preventive Testing. Building test cases based upon the requirements specification prior to the creation of the code, with the express purpose of validating the requirements. [Systematic Software Testing by Rick D. Craig and Stefan P. Jaskiel 2002]

Prior Defect History Testing. Test cases are created or rerun for every defect found in prior tests of the system. [William E. Lewis, 2000]
Qualification Testing. (IEEE) Formal testing, usually conducted by the developer for the consumer, to demonstrate that the software meets its specified requirements. See: acceptance testing.

Quality. The degree to which a program possesses a desired combination of attributes that enable it to perform its specified end use.

Quality Assurance (QA) Consists of planning, coordinating and other strategic activities associated with measuring product quality against external requirements and specifications (process-related activities).

Quality Control (QC) Consists of monitoring, controlling and other tactical activities associated with the measurement of product quality goals.

Our definition of Quality: Achieving the target (not conformance to requirements as used by many authors) & minimizing the variability of the system under test
Race condition defect. Many concurrent defects result from data-race conditions. A data-race condition may be defined as two accesses to a shared variable, at least one of which is a write, with no mechanism used by either to prevent simultaneous access. However, not all race conditions are defects.

Recovery testing. Testing how well a system recovers from crashes, hardware failures, or other catastrophic problems.

Regression Testing. Testing conducted for the purpose of evaluating whether or not a change to the system (all CM items) has introduced a new failure. Regression testing is often accomplished through the construction, execution and analysis of product and system tests.

Regression Testing. - testing that is performed after making a functional improvement or repair to the program. Its purpose is to determine if the change has regressed other aspects of the program [Glenford J.Myers, 1979]

Reengineering. The process of examining and altering an existing system to reconstitute it in a new form. May include reverse engineering (analyzing a system and producing a representation at a higher level of abstraction, such as design from code), restructuring (transforming a system from one representation to another at the same level of abstraction), redocumentation (analyzing a system and producing user and support documentation), forward engineering (using software products derived from an existing system, together with new requirements, to produce a new system), and translation (transforming source code from one language to another or from one version of a language to another).

Reference testing. A way of deriving expected outcomes by manually validating a set of actual outcomes. A less rigorous alternative to predicting expected outcomes in advance of test execution. [Dorothy Graham, 1999]
Reliability testing. Verify the probability of failure free operation of a computer program in a specified environment for a specified time.

Reliability of an object is defined as the probability that it will not fail under specified conditions, over a period of time. The specified conditions are usually taken to be fixed, while the time is taken as an independent variable. Thus reliability is often written R(t) as a function of time t, the probability that the object will not fail within time t.

Any computer user would probably agree that most software is flawed, and the evidence for this is that it does fail. All software flaws are designed in -- the software does not break, rather it was always broken. But unless conditions are right to excite the flaw, it will go unnoticed -- the software will appear to work properly. [Professor Dick Hamlet. Ph.D.]

Range Testing. For each input, identify the range over which the system behavior should be the same. [William E. Lewis, 2000]

Risk-Based Testing: Any testing organized to explore specific product risks.[James Bach website]

Risk management. An organized process to identify what can go wrong, to quantify and assess associated risks, and to implement/control the appropriate approach for preventing or handling each risk identified.

Robust test. A test that compares a small amount of information, so that unexpected side effects are less likely to affect whether the test passes or fails. [Dorothy Graham, 1999]
Sanity Testing - typically an initial testing effort to determine if a new software version is performing well enough to accept it for a major testing effort. For example, if the new software is often crashing systems, bogging down systems to a crawl, or destroying databases, the software may not be in a 'sane' enough condition to warrant further testing in its current state.
Scalability testing is a subtype of performance test where performance requirements for response time, throughput, and/or utilization are tested as load on the SUT is increased over time. [Load Testing Terminology by Scott Stirling ]

Scenario-Based Testing. Scenario-based testing is one way to document the software specifications and requirements for a project. Scenario-based testing takes each user scenario and develops tests that verify that a given scenario works. Scenarios focus on the main goals and requirements. If the scenario is able to flow from the beginning to the end, then it passes.[Lydia Ash, 2003]

(SDLC) System Development Life Cycle - the phases used to develop, maintain, and replace information systems. Typical phases in the SDLC are: Initiation Phase, Planning Phase, Functional Design Phase, System Design Phase, Development Phase, Integration and Testing Phase, Installation and Acceptance Phase, and Maintenance Phase.
The V-model talks about SDLC phases and maps them to various test levels.

Security Audit. An examination (often by third parties) of a server's security controls and possibly its disaster recovery mechanisms.

Sensitive test. A test that compares a large amount of information, so that it is more likely to detect unexpected differences between the actual and expected outcomes of the test. [Dorothy Graham, 1999]

Server log testing. Examining the server logs after particular actions or at regular intervals to determine if there are problems or errors generated or if the server is entering a faulty state.

Service test. Test software fixes, both individually and bundled together, for software that is already in use by customers. [Scott Loveland, 2005]

Skim Testing. A testing technique used to determine the fitness of a new build or release of an AUT to undergo further, more thorough testing. In essence, a "pretest" activity that could form one of the acceptance criteria for receiving the AUT for testing. [Testing IT: An Off-the-Shelf Software Testing Process by John Watkins]

Smoke test describes an initial set of tests that determine if a new version of an application performs well enough for further testing. [Louise Tamres, 2002]

Sniff test. A quick check to see if any major abnormalities are evident in the software.[Scott Loveland, 2005 ]
Specification-based test. A test whose inputs are derived from a specification.

Spike testing. Testing performance or recovery behavior when the system under test (SUT) is stressed with a sudden and sharp increase in load; this should be considered a type of load test. [Load Testing Terminology by Scott Stirling]


STEP (Systematic Test and Evaluation Process) Software Quality Engineering's copyrighted testing methodology.

Stability testing. Testing the ability of the software to continue to function, over time and over its full range of use, without failing or causing failure. (see also Reliability testing)

State-based testing. Testing with test cases developed by modeling the system under test as a state machine. [R. V. Binder, 1999]

State Transition Testing. Technique in which the states of a system are first identified, and then test cases are written to test the triggers that cause a transition from one state to another. [William E. Lewis, 2000]

Static testing. Source code analysis. Analysis of source code to expose potential defects.

Statistical testing. A test case design technique in which a model is used of the statistical distribution of the input to construct representative test cases. [BCS]

Stealth bug. A bug that removes information useful for its diagnosis and correction. [R. V. Binder, 1999]

Storage test. Study how memory and space are used by the program, either in resident memory or on disk. If there are limits on these amounts, storage tests attempt to prove that the program will exceed them. [Cem Kaner, 1999, p55]

Streamable Test cases. Test cases which are able to run together as part of a large group. [Scott Loveland, 2005]

Stress / Load / Volume test. Tests that provide a high degree of activity, either using boundary conditions as inputs or multiple copies of a program executing in parallel as examples.

Stress Test. A stress test is designed to determine how heavy a load the Web application can handle. A huge load is generated as quickly as possible in order to stress the application to its limit. The time between transactions is minimized in order to intensify the load on the application, and the time the users would need for interacting with their Web browsers is ignored. A stress test helps determine, for example, the maximum number of requests a Web application can handle in a specific period of time, and at what point the application will overload and break down.[Load Testing by S. Asbock]

Structural Testing. (1) (IEEE) Testing that takes into account the internal mechanism [structure] of a system or component. Types include branch testing, path testing, statement testing. (2) Testing to ensure each program statement is made to execute during testing and that each program statement performs its intended function. Contrast with functional testing. Syn: white-box testing, glass-box testing, logic-driven testing.

System testing. Black-box type testing that is based on overall requirements specifications; covers all combined parts of a system.

System verification test. (SVT). Testing of an entire software package for the first time, with all components working together to deliver the project's intended purpose on supported hardware platforms. [Scott Loveland, 2005]
Table testing. Test access, security, and data integrity of table entries. [William E. Lewis, 2000]
Test Artifact Set. Captures and presents information related to the tests performed.

Test Bed. An environment containing the hardware, instrumentation, simulators, software tools, and other support elements needed to conduct a test [IEEE 610].

Test Case. A set of test inputs, executions, and expected results developed for a particular objective.

Test conditions. The set of circumstances that a test invokes. [Daniel J. Mosley, 2002]

Test Coverage. The degree to which a given test or set of tests addresses all specified test cases for a given system or component.

Test Criteria. Decision rules used to determine whether a software item or software feature passes or fails a test.

Test data. The actual (sets of) values used in the test or that are necessary to execute the test. Test data instantiates the condition being tested (as input or as pre-existing data) and is used to verify that a specific requirement has been successfully implemented (comparing actual results to the expected results). [Daniel J. Mosley, 2002]

Test Documentation. (IEEE) Documentation describing plans for, or results of, the testing of a system or component. Types include test case specification, test incident report, test log, test plan, test procedure, test report.
Test Driver. A software module or application used to invoke a test item and, often, provide test inputs (data), control and monitor execution. A test driver automates the execution of test procedures.

Test-driven development (TDD) is an evolutionary approach to development which combines test-first development, where you write a test before you write just enough production code to fulfill that test, with refactoring. [Beck 2003; Astels 2003]
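A minimal Python sketch of one TDD cycle (the names are illustrative): write a failing test first, write just enough code to pass it, then refactor with the test as a safety net.

    # 1. write the test first; running it now fails because fizz() does not exist
    def test_fizz():
        assert fizz(3) == "fizz"
        assert fizz(4) == "4"

    # 2. write just enough production code to make the test pass
    def fizz(n):
        return "fizz" if n % 3 == 0 else str(n)

    # 3. run the test again, then refactor while keeping it green
    test_fizz()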


Test environment bug. Bug class indicating that some test environment is found to be insufficient to support some test type or inconsistent with its specification. [DOD STD 2167A].

Test Harness. A system of test drivers and other tools to support test execution (e.g., stubs, executable test cases, and test drivers). See: test driver.

Test Inputs. Artifacts from work processes that are used to identify and define actions that occur during testing. These artifacts may come from development processes that are external to the test group. Examples include Functional Requirements Specifications and Design Specifications. They may also be derived from previous testing phases and passed to subsequent testing activities.[Daniel J. Mosley, 2002]

Test Idea: an idea for testing something.[James Bach]

Test Item. A software item which is the object of testing.[IEEE]

Test Log. A chronological record of all relevant details about the execution of a test. [IEEE]

Test logistics: the set of ideas that guide the application of resources to fulfilling the test strategy.[James Bach]

Test Plan. A high-level document that defines a testing project so that it can be properly measured and controlled. It defines the test strategy and organized elements of the test life cycle, including resource requirements, project schedule, and test requirements
Test Procedure. A document, providing detailed instructions for the [manual] execution of one or more test cases. [BS7925-1] Often called - a manual test script.

Test Results. Data captured during the execution of tests and used in calculating the different key measures of testing. [Daniel J. Mosley, 2002]

Test Rig. A flexible combination of hardware, software, data, and interconnectivity that can be configured by the Test Team to simulate a variety of different Live Environments on which an AUT can be delivered. [Testing IT: An Off-the-Shelf Software Testing Process by John Watkins]

Test Script. The computer-readable instructions that automate the execution of a test procedure (or portion of a test procedure). Test scripts may be created (recorded) or automatically generated using test automation tools, programmed using a programming language, or created by a combination of recording, generating, and programming. [Daniel J. Mosley, 2002]

Test strategy. Describes the general approach and objectives of the test activities. [Daniel J. Mosley, 2002]
Test Status. The assessment of the result of running tests on software.

Test Stub. A dummy software component or object used (during development and testing) to simulate the behaviour of a real component. The stub typically provides test output.
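A short Python sketch (the payment gateway is hypothetical): the stub stands in for the real component so the unit under test can run in isolation and receive predictable test output.

    class PaymentGatewayStub:
        # dummy component simulating the real payment gateway
        def charge(self, amount):
            return {"status": "approved", "amount": amount}   # canned response, no network call

    def checkout(cart_total, gateway):
        # unit under test: depends on a gateway, real or stubbed
        return gateway.charge(cart_total)["status"] == "approved"

    assert checkout(59.99, PaymentGatewayStub()) is True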

Test Suites. A test suite consists of multiple test cases (procedures and data) that are combined and often managed by a test harness.

Test technique: test method; a heuristic or algorithm for designing and/or executing a test; a recipe for a test. [James Bach]

Test Tree. A physical implementation of Test Suite. [Dorothy Graham, 1999]

Testability. Attributes of software that bear on the effort needed for validating the modified software [ISO 8402]

Testability Hooks. Those functions, integrated into the software, that can be invoked through primarily undocumented interfaces to drive specific processing which would otherwise be difficult to exercise. [Scott Loveland, 2005]

Testing. The execution of tests with the intent of proving that the system and application under test does or does not perform according to the requirements specification.

(TPI) Test Process Improvement. A method for baselining testing processes and identifying process improvement opportunities, using a static model developed by Martin Pol and Tim Koomen.
Test Suite. The set of tests that when executed instantiate a test scenario.[Daniel J. Mosley, 2002]

Test Workspace. Private areas where testers can install and test code in accordance with the project's adopted standards in relative isolation from the developers.[Daniel J. Mosley, 2002]
Thread Testing. A testing technique used to test the business functionality or business logic of the AUT in an end-to-end manner, in much the same way a User or an operator might interact with the system during its normal use.[Testing IT: An Off-the-Shelf Software Testing Process by John Watkins ]

Timing and Serialization Problems. A class of software defect, usually in multithreaded code, in which two or more tasks attempt to alter a shared software resource without properly coordinating their actions. Also known as Race Conditions.[Scott Loveland, 2005]

Transient bug. A bug which is evident for a short period of time. See aperiodic bug. [Peter Farrell-Vinay 2008]

Translation testing. See internationalization testing.

Thrasher. A type of program used to test for data integrity errors on a mainframe system. The name is derived from the first such program, which deliberately generated memory thrashing (the overuse of a large amount of memory, leading to heavy paging or swapping) while monitoring for corruption. [Scott Loveland, 2005]

Top Down Testing. An approach to integration testing where the component at the top of the hierarchy is tested first, with lower-level components simulated by stubs; the contrast of Bottom-up Testing.
Unit Testing. Testing performed to isolate and expose faults and failures as soon as the source code is available, regardless of the external interfaces that may be required. Oftentimes, the detailed design and requirements documents are used as a basis to compare how and what the unit is able to perform. White and black-box testing methods are combined during unit testing.

Usability testing. Testing for 'user-friendliness'. Clearly this is subjective, and will depend on the targeted end-user or customer.
Validation. The comparison between the actual characteristics of something (e.g. a product of a software project) and the expected characteristics. Validation is checking that you have built the right system.

Variance. A variance is an observable and measurable difference between an actual result and an expected result.

Verification. The comparison between the actual characteristics of something (e.g. a product of a software project) and the specified characteristics. Verification is checking that we have built the system right.

Volume testing. Testing where the system is subjected to large volumes of data.[BS7925-1]
Walkthrough. In the most usual form of the term, a walkthrough is a step-by-step simulation of the execution of a procedure, as when walking through code line by line with an imagined set of inputs. The term has been extended to the review of material that is not procedural, such as data descriptions, reference manuals, specifications, etc.


White Box Testing (glass-box). Testing done under a structural testing strategy that requires complete access to the object's structure, that is, the source code. [B. Beizer, 1995 p8].

Why do we consider percentiles rather than average response times?


Why percentiles are the better choice:

A percentile tells me which part of the curve I am looking at and how many transactions are represented by that metric. To visualize this, look at the following chart:



This chart shows the 50th and 90th percentiles along with the average of the same transaction. It shows that the average is influenced far more heavily by the 90th percentile, and thus by outliers, than by the bulk of the transactions.

The green line represents the average. As you can see, it is very volatile. The other two lines represent the 50th and 90th percentiles. As we can see, the 50th percentile (or median) is rather stable but has a couple of jumps. These jumps represent real performance degradation for the majority (50%) of the transactions. The 90th percentile (this is the start of the "tail") is a lot more volatile, which means that the slowness of the outliers depends on data or user behavior. What's important here is that the average is heavily influenced (dragged) by the 90th percentile, the tail, rather than by the bulk of the transactions.
If the 50th percentile (median) of a response time is 500ms that means that 50% of my transactions are either as fast or faster than 500ms. If the 90th percentile of the same transaction is at 1000ms it means that 90% are as fast or faster and only 10% are slower. The average in this case could either be lower than 500ms (on a heavy front curve), a lot higher (long tail) or somewhere in between. A percentile gives me a much better sense of my real world performance, because it shows me a slice of my response time curve.
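A quick Python sketch of how these numbers relate (the response times are invented): the median and the 90th percentile describe the bulk and the tail directly, while the average lands somewhere in between depending on the shape of the curve.

    import statistics

    # invented response times in ms: mostly fast, with a long tail
    times = sorted([420, 430, 450, 470, 480, 500, 520, 900, 1000, 2600])

    median = statistics.median(times)           # 50th percentile: 490.0
    p90 = times[int(0.9 * (len(times) - 1))]    # simple 90th percentile: 1000
    average = statistics.mean(times)            # 777.0, dragged up by the tail

    print(median, p90, average)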


For exactly that reason percentiles are perfect for automatic baselining. If the 50th percentile moves from 500ms to 600ms I know that 50% of my transactions suffered a 20% performance degradation. You need to react to that.

In many cases we see that the 75th or 90th percentile does not change at all in such a scenario. This means the slow transactions didn’t get any slower, only the normal ones did. Depending on how long your tail is the average might not have moved at all in such a scenario!

In other cases we see the 98th percentile degrading from 1s to 1.5 seconds while the 95th is stable at 900ms. This means that your application as a whole is stable, but a few outliers got worse, nothing to worry about immediately. Percentile-based alerts do not suffer from false positives, are a lot less volatile and don’t miss any important performance degradations! Consequently a baselining approach that uses percentiles does not require a lot of tuning variables to work effectively.


The screenshot below shows the median (50th percentile) for a particular transaction jumping from about 50ms to about 500ms and triggering an alert, as it is significantly above the calculated baseline (green line). The chart labeled "Slow Response Time", on the other hand, shows the 90th percentile for the same transaction. These "outliers" also show an increase in response time, but not significant enough to trigger an alert.




Here we see an automatic baselining dashboard with a violation at the 50th percentile. The violation is quite clear; at the same time, the 90th percentile (right upper chart) does not violate. Because the outliers are so much slower than the bulk of the transactions, an average would have been influenced by them and would not have reacted quite as dramatically as the 50th percentile. We might have missed this clear violation!

How can we use percentiles for tuning?

Percentiles are also great for tuning, and for giving your optimizations a particular goal. Let's say that something within my application is too slow in general and I need to make it faster. In this case I want to focus on bringing down the 90th percentile. This would ensure that the overall response time of the application goes down. In other cases, where I have unacceptably slow outliers, I want to focus on bringing down response time for transactions beyond the 98th or 99th percentile (only the outliers). We see a lot of applications that have perfectly acceptable performance for the 90th percentile, with the 98th percentile being magnitudes worse.

In throughput oriented applications on the other hand I would want to make the majority of my transactions very fast, while accepting that an optimization makes a few outliers slower. I might therefore make sure that the 75th percentile goes down while trying to keep the 90th percentile stable or not getting a lot worse.

I could not make the same kind of observations with averages, minimum and maximum, but with percentiles they are very easy indeed.

Conclusion
Averages are ineffective because they are too simplistic and one-dimensional. Percentiles are a really great and easy way of understanding the real performance characteristics of your application. They also provide a great basis for automatic baselining, behavioural learning and optimizing your application with a proper focus. In short, percentiles are great!

Why is the average decreasing while percentiles are increasing? (Load Runner and Performance Testing)

Why Averages Suck and Percentiles are Great

Anyone who has ever monitored or analyzed an application uses or has used averages. They are simple to understand and calculate, but we tend to ignore just how wrong a picture of the world averages can paint. To emphasize the point, let me give you a real-world example outside of the performance space that I read recently in a newspaper.

The article explained that the average salary in a certain region in Europe was 1900 Euros (to be clear, this would be quite good in that region!). However, when looking closer they found out that the majority, namely 9 out of 10 people, only earned around 1000 Euros and one earned 10,000 (I oversimplified this of course, but you get the idea). If you do the math, (9 x 1,000 + 10,000) / 10 = 1,900, so the average is indeed 1900, but we can all agree that this does not represent the "average" salary as we would use the word in day-to-day life. So now let's apply this thinking to application performance.


The Average Response Time:


The average response time is by far the most commonly used metric in application performance management. We assume that this represents a "normal" transaction; however, this would only be true if the response time is always the same (all transactions run at equal speed) or the response time distribution is roughly bell curved.




A Bell curve represents the "normal" distribution of response times, in which the average and the median are the same. It rarely ever occurs in real applications.

In a Bell Curve the average (mean) and median are the same. In other words observed performance would represent the majority (half or more than half) of the transactions.
In reality most applications have a few very heavy outliers; a statistician would say that the curve has a long tail. A long tail does not imply many slow transactions, but a few that are magnitudes slower than the norm.




This is a typical Response Time Distribution with few but heavy outliers – it has a long tail. The average here is dragged to the right by the long tail.

We recognize that the average no longer represents the bulk of the transactions but can be a lot higher than the median.


You can now argue that this is not a problem as long as the average doesn’t look better than the median. I would disagree, but let’s look at another real-world scenario experienced by many of our customers:




This is another typical Response Time Distribution. Here we have quite a few very fast transactions that drag the average to the left of the actual median

In this case a considerable percentage of transactions are very, very fast (10-20 percent), while the bulk of transactions are several times slower. The median would still tell us the true story, but the average all of a sudden looks a lot faster than most of our transactions actually are. This is very typical in search engines or when caches are involved: some transactions are very fast, but the bulk are normal. Another reason for this scenario is failed transactions, more specifically transactions that failed fast. Many real-world applications have a failure rate of 1-10 percent (due to user errors or validation errors). These failed transactions are often magnitudes faster than the real ones and consequently distort the average.
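A small Python sketch of this effect (the numbers are invented): a handful of very fast failed transactions plus a slow tail pull the average away from what most users actually experience, while the median stays representative.

    import statistics

    normal = [500] * 80         # the bulk of transactions, around 500 ms
    failed_fast = [20] * 10     # failed transactions that return almost instantly
    long_tail = [5000] * 10     # a few very slow outliers

    times = normal + failed_fast + long_tail
    print(statistics.median(times))   # 500 -- representative of the bulk
    print(statistics.mean(times))     # 902 -- dragged by both ends of the curve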

Of course performance analysts are not stupid and regularly try to compensate with higher-frequency charts (compensating by looking at smaller aggregates visually) and by taking in minimum and maximum observed response times. However, we can often only do this if we know the application very well; those unfamiliar with the application might easily misinterpret the charts. Because of the depth and type of knowledge required for this, it's difficult to communicate your analysis to other people; think how many arguments between IT teams have been caused by this. And that's before we even begin to think about communicating with business stakeholders!


A far better metric is the percentile, because it allows us to understand the distribution. But before we look at percentiles, let's take a look at a key feature in every production monitoring solution: Automatic Baselining and Alerting.
Automatic Baselining and Alerting


In real-world environments, performance gets attention when it is poor and has a negative impact on the business and users. But how can we identify performance issues quickly to prevent negative effects? We cannot alert on every slow transaction, since there are always some. In addition, most Operations teams have to maintain a large number of applications and are not familiar with all of them, so manually setting thresholds can be inaccurate, quite painful, and time consuming.


The industry has come up with a solution called Automatic Baselining. Baselining calculates the "normal" performance and only alerts us when an application slows down or produces more errors than usual. Most approaches rely on averages and standard deviations.


Without going into statistical details, this approach again assumes that the response times are distributed over a bell curve:




In a bell curve, roughly 68% of all transactions lie within one standard deviation of the mean and roughly 95% within two standard deviations; everything outside can be considered an outlier. However, most real-world scenarios are not bell curved...

Typically, transactions that are outside two times the standard deviation are treated as slow and captured for analysis. An alert is raised if the average moves significantly. In a bell curve this would account for roughly the slowest 2-3 percent (and you can of course adjust that); however, if the response time distribution does not represent a bell curve, it becomes inaccurate. We either end up with a lot of false positives (transactions that are a lot slower than the average, but when looking at the curve lie within the norm) or we miss a lot of problems (false negatives). In addition, if the curve is not a bell curve then the average can differ a lot from the median, and applying a standard deviation to such an average can lead to quite a different result than you would expect! To work around this problem, these algorithms have many tunable variables and a lot of "hacks" for specific use cases.
Typically, transactions that are outside 2 times standard deviation are treated as slow and captured for analysis. An alert is raised if the average moves significantly. In a bell curve this would account for the slowest 16.5 percent (and you can of course adjust that), however if the response time distribution does not represent a bell curve it becomes inaccurate. We either end up with a lot of false positives (transactions that are a lot slower than the average but when looking at the curve lie within the norm) or we miss a lot of problems (false negatives). In addition if the curve is not a bell curve than the average can differ a lot from the median, applying a standard deviation to such an average can lead to quite a different result than you would expect! To work around this problem these algorithms have many tunable variables and a lot of “hacks” for specific use cases.