Performance testing in different environments
I tend to think there is value in performance testing in different environments. To be clear, when I say performance testing, I mean uncovering information about the product related to load, timing, stress, etc.
For the purpose of this post, imagine a company that has the following environments in which the application can live and in which I can test:
- Local: an individual machine
- Development: the place where developers integrate and test
- Test: the place where testers test
- Stage: the pre-production environment where final testing takes place, including traditional performance testing
I find that performance testing in different environments helps me in several ways:
- Learning about performance earlier
- Learning about performance regression earlier
- Learning about performance constraints earlier
- Learning how the team adapts to feedback earlier
Learning about performance earlier
I like to test the product as soon as it exists - including performance. Testing performance in early builds and in early environments (like a Local environment or a Development environment) can indicate to me possible performance risks that might direct the testing I do later in the project.
To steal from Scott Barber, I like to investigate in early environments, not validate. I'm not looking for bugs (if I find one, great); I'm looking for information that will help guide my future testing, will help me develop an idea of different risks associated with performance, or will help me understand how the performance relates to the functionality (error messaging, failover/recovery, timing boundaries, race conditions, etc.).
The more I can learn and the earlier I can learn it, the better, regardless of whether we hit our target response time, based on a target load, with a specific transaction or set of transactions, in a specific operating environment. I just want to know what I'm working with so I can better plan my future performance testing and develop a better - more informed - set of functional tests.
Learning about performance regression earlier
On the Agile-Testing Yahoo group, Paul Arrowood offered the following:
I was thinking that we could time some standard queries. We'd have to time this for all test environments (DEV, INT, CERT), mainly because the closer we get to PROD, typically the faster our boxes perform. So...
* DEV retrieving 50 search results takes 200 mms
* INT retrieving 50 search results takes 100 mms
* CERT retrieving 50 search results takes 50 mms
What I was thinking is that we could add x% to these numbers, and create a repeatable automated test that would time this. When it runs longer (out of tolerance), it fails. Of course, operating in test environments lends itself to a myriad of reasons (less so in a more static CERT environment) why something ran slower. But the thinking is that at least there'd be an early indication that "something" recently happened which may have caused a standard search to take 100% longer this time. It suggests to investigating whether code just checked in caused this, or maybe it's just another process running on the box (and there's really no issue). It also moves responsibility into the project team to measure, anticipate and react to performance shifts as early as possible.
I think this is an excellent practice. Any time you can develop those benchmarks, you can use them as early warning systems for performance regressions. Will they always point to a performance problem? Not at all, but that doesn't mean they aren't valuable (see Learning about performance constraints earlier below).
When you do find an actual regression due to a change, you typically have more visibility and debug features available in those early environments than you do in the Stage (or in Paul's case CERT) environments. You can turn on logging, runtime analysis tools, and other tools to help track down the problem. I've been in several environments where you can't do that in the Stage environment. Something that takes thirty minutes to find in a Development environment may take ten hours to find in a Stage environment. You may not have access to builds with instrumented code or you may not have access to the people who actually own those environments if you need to change a configuration.
I also like Paul's idea to add a tolerance. I've not done that yet because my performance testing tends to be more manual in those environments, but I can envision someone using JUnitPerf or some similar tool to automate that type of analysis with the unit tests.
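To make the tolerance idea concrete, here is a minimal sketch of what such an automated check might look like. It is written in Python rather than with JUnitPerf, and it is only an illustration under assumptions: the baselines are the example figures from Paul's post, the 25% default tolerance is an arbitrary choice, taking the median of a few runs is my own addition to dampen one-off noise, and fake_search is a hypothetical stand-in for a real query.

```python
import time
from statistics import median

# Baselines in milliseconds, taken from the example figures in Paul's post.
BASELINES_MS = {"DEV": 200.0, "INT": 100.0, "CERT": 50.0}


def allowed_ms(environment, tolerance_pct):
    """Return the environment's baseline plus the x% tolerance."""
    return BASELINES_MS[environment] * (1 + tolerance_pct / 100.0)


def time_ms(operation, runs=5):
    """Run the operation several times; return the median time in ms.

    Using the median is an assumption of this sketch, meant to reduce
    false alarms from a single slow run on a busy box.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)
    return median(samples)


def check_regression(environment, operation, tolerance_pct=25.0):
    """Return (passed, elapsed_ms, limit_ms) for one timed operation."""
    limit = allowed_ms(environment, tolerance_pct)
    elapsed = time_ms(operation)
    return elapsed <= limit, elapsed, limit


def fake_search():
    """Hypothetical stand-in for retrieving 50 search results."""
    time.sleep(0.01)


if __name__ == "__main__":
    passed, elapsed, limit = check_regression("DEV", fake_search)
    status = "ok" if passed else "OUT OF TOLERANCE"
    print(f"DEV search: {elapsed:.1f} ms (limit {limit:.1f} ms) -> {status}")
```

A failure here only says that "something" changed; as Paul notes, it could be a recent check-in or just another process running on the box, so it is a prompt to investigate rather than a verdict.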
Learning about performance constraints earlier
The main reason I do performance testing early in Local and Development environments is so I can very easily identify performance constraints. This is because Local, Development, and Test environments are often hostile environments. They have low system resources (memory, processing power, etc.), databases that are close to capacity, unavailable third-party services, and all sorts of other great limitations that we often have to fight to get in a Stage environment. Not only do these tests tip me off to where we may have performance issues (a query on a database at capacity), but these tests are also very rich tests in terms of the application functionality under stress.
These constraints may then be used to develop other tests around identified problem points. The results may expand or contract my final set of validation tests that I'll run in the Stage environment. They may also uncover new areas for clarification in terms of performance requirements. You will most likely value uncovering these issues early in the project rather than later in the project.
Learning how the team adapts to feedback earlier
I'm not a developer; I'm a tester. So a final value in running performance tests early in the Local and Development environments is that it gives me a chance to interface with the developers, in a technical capacity, early in the project. I find that this is a useful activity in building my "street credit" as a tester worth working with. This interaction gives me two advantages: it develops trust between me and the developer, and it lets me provide quality-related feedback on the application earlier rather than later. I'll find out very quickly if the developers are open to this type of communication.
I've yet to find one developer who isn't at least interested in what I find. They may not have the capacity or authority to make changes at that stage in the project, but they at least know about a potential issue lurking in the background and they can take action on their end. This is known as "MIPing," or mention in passing, where my communication with them is a low-cost way of getting attention on a possible issue.
At the end of Paul's post on the Agile Testing list, he asks the following:
* Is this standard? Do people do this on Agile projects? If not, how else do you address 'performance' early in your agile projects?
* What tools are used for timing such a thing? (haven't investigated perf-unit yet)
* How do you avoid spending countless hours constantly investigating performance anomalies that end up always being valid (i.e. don't do this on DEV)?
- I don't know if it's standard; based on my experience I doubt it. And I don't know about Agile projects since I've not worked on any "capital A" Agile projects. I seem to remember talking to both Antony Marcano and Neill McCarthy about this at one point where they may have done this on Agile projects.
- I do this testing with all sorts of tools, including: SOAPscope, soapUI, JUnit, Watir, IBM Rational Robot, Mercury VuGen, and probably a couple of others I can't remember.
- You can avoid spending countless hours constantly investigating performance anomalies the same way you avoid spending countless hours constantly investigating any potential bug. You can add the anomaly to a defect watch list, time-box the investigation, or wait until you are in a position where you can get more information on the problem more easily (build in a testability feature, new tools, or different environment). Like any problem, how much time you spend on it is based on the risk of it occurring in production, how bad the effects of the problem are, and all the other aspects of risk that we look at when evaluating and prioritizing our defects.