Monday, December 22, 2014

How to create a realistic load test

For years I come across many existing load tests or people who try to build a new load test and I find them to build an unrealistic load test, which generate unrealistic load on the target application or System Under Test (SUT).

The problem

I come across load tests which generate either really low or really high loads on the SUT, which both are not what the performance tests were planned to do.
With too low load on the system, you are not putting enough pressure on it, thus you won't see the problems you are looking to find, before they become visible on production, for your real users.
With too high load on the system, you will see problems which may or may not occur in real life. Thus if you are in a hurry, or need to clearly prioritize which issue is a blocker and which can be dealt with later, you'll have a problem.

An even worse type of issue I see with load tests is where they generate an unexpected load on the SUT. Seriously!
People come up with complex and wrong scenarios, mostly because the scenario is based on 'business' view of how the load test should look like. They try to build a scenario based on business perspective and figures such as:
We have a potential of 100,000 users per day, let's build a load test for simulating their behavior. 10% will create content, 20% will like content and 70% will view content.
Now the poor guys who need to build that load test says, "OK, I will build a scenario with three types of users, one user type (10,000 users in total) is doing login, viewing some content, then creating some content and logout. Second type (20,000 users in total) will login, view a piece of content, like it and logout. The third type (70,000 users in total) will login, view a content and logout."

Make sense right? Well, hell no!
What you end up with is with two main problems:
  1. You have a load test script with REAL 100,000 threads (and let me explain why that's a bad idea, below).
  2. You have a load test which generate an unexpected load or in other words you have no idea what kind of throughput is actually generated against the SUT. I'll explain that below too.

Real 100,000 threads for load testing

Unless you are working for Facebook, Twitter or Google there is no way you need 100,000 real clients (threads) hitting your system at once. Do you have 100,000 requests per second (real requests for content/actions, I'm not talking about hits which include static resources)? Probably not. That means you don't need 100,000 concurrent / parallel threads for generating the required load on your system.
Not mentioning that you may end up with complex load test setup with unnecessary amount of load generators as well as session time out issues as most of your threads / virtual users will wait for such a long time, that the corresponding application session may time out until their next iteration will take place.

Unexpected throughput

Having a summary report that claims that the system supports 100,000 users, doesn't mean anything.
Someone provided you with a load test results report, with the scenario described above, with 10,000 users which created new content, 20,000 users which liked content and 70,000 users which viewed content, over 8 hours. What does that mean? Well, not much really.

Why not? Because you don't know what was the generated throughput. Was it generating 100 likes per second or 1 per second? How many content views we had per second? Was the workload constant with the generated traffic or generated lots of spikes where some intervals had 10 times more load than the others?

Usually with such approach you will not have a good control on those figures, as you will try to mimic a real user flow, with think times between interactions, which you may believe they reflect a realistic human behavior. While your behavioral assumptions may be true or completely wrong, the given fact is that this kind of realistic behavior will not create the realistic load on the system and that's what you care about - "Will my system handle the load?".

A more scientific approach would be to build the load test to generate configurable throughput on different types of features or activities in the system.
For example you would build the following load test scenario, which is easier to monitor and measure:
100 content views per second
10 logins per second
10 create content per second
10 likes per second
1 logouts per second

In total this will generate 131 requests per second (depending on the actual application, you may end up with more requests as you may need to generate additional requests for loading the content editor before you actually submit/publish it or if you have AJAX calls with every content view than you should generate those too).

With such approach you are generating constant and controlled throughput on your SUT. You can configure each type of activity to generate a different throughput to reflect realistic usage patterns.

Sessions

In terms of amount of users, well, usually it doesn't matter, as the only thing we care about in terms of load from the SUT perspective, is most likely amount of sessions, which bound to memory. If you want to have 100,000 sessions in any given time, you can also take care for that with this approach, so you will cover the expected memory usage. To do that - you'll need to generate enough logins or requests without active session, which both may create new sessions on the SUT.
Assuming session timeout of 1 hour in the SUT, you should generate 100,000 session-less requests per hour or 100,000 / 60 / 60 = 27.7 per second. This means that it should be enough to make only 21% of the total requests to trigger new session on the server side and you will generate the required amount of sessions (21% out of 131 requests per second are 27.5 but you get the point).

Bottom Line

So to sum things up, building a realistic load test scenario doesn't mean you should have a realistic behavior from a single end user perspective (i.e. a realistic flow in the SUT) but rather you should build the load test in a way that the absolute minimal set of steps are done by each thread to allow the generation of the required realistic workload / throughput. I.e. a user cannot like content if he hasn't first login, so you'll need to consider that. But you don't need to create a user that does everything, or interact with a set of features like a real user, because than it is hard to control the generated throughput and you end up with unexpected load being generated against the SUT.

So how many threads you actually need to have in your load testing tool to generate this required load? This is fairly simple - assuming you set a response timeout of 10 seconds (in which the load testing tool will consider the request as a failure and allow the thread to continue to work), you'll need to promise that you have enough threads to generate 131 requests per second while some requests will take up to 10 seconds to finish. The calculation here is that a single thread can generate 6 requests per minute in the worst case or one request per 10 seconds. We need 1310 requests per 10 seconds, so we need up to 1310 threads in the worst case to promise generating the required load, no matter how responsive or slow the SUT gets. (the calculation is: Amount of required RPS * Maximal response time = Required threads)

1310 threads instead of 100,000 threads is much easier to work with, isn't it?