Monday, December 22, 2014

How to create a realistic load test

For years I have come across existing load tests, and people trying to build new ones, that turn out to be unrealistic: they generate unrealistic load on the target application, or System Under Test (SUT).

The problem

I come across load tests that generate either very low or very high load on the SUT, and neither is what the performance tests were planned to do.
With too little load on the system, you are not putting enough pressure on it, so you won't find the problems you are looking for before they show up in production for your real users.
With too much load on the system, you will see problems that may or may not occur in real life. So if you are in a hurry, or need to clearly prioritize which issue is a blocker and which can be dealt with later, you'll have a problem.

An even worse issue I see is load tests that generate unexpected load on the SUT. Seriously!
People come up with complex and wrong scenarios, mostly because the scenario is based on a 'business' view of how the load test should look. They try to build a scenario from a business perspective and figures such as:
We have a potential of 100,000 users per day; let's build a load test simulating their behavior. 10% will create content, 20% will like content and 70% will view content.
Now the poor guys who need to build that load test say, "OK, I will build a scenario with three types of users. The first type (10,000 users in total) logs in, views some content, then creates some content and logs out. The second type (20,000 users in total) logs in, views a piece of content, likes it and logs out. The third type (70,000 users in total) logs in, views a piece of content and logs out."

Makes sense, right? Well, hell no!
What you end up with is two main problems:
  1. You have a load test script with REAL 100,000 threads (I'll explain below why that's a bad idea).
  2. You have a load test that generates an unexpected load; in other words, you have no idea what throughput is actually generated against the SUT. I'll explain that below too.

Real 100,000 threads for load testing

Unless you are working for Facebook, Twitter or Google, there is no way you need 100,000 real clients (threads) hitting your system at once. Do you have 100,000 requests per second (real requests for content/actions; I'm not talking about hits that include static resources)? Probably not. That means you don't need 100,000 concurrent / parallel threads to generate the required load on your system.
Not to mention that you may end up with a complex load test setup with an unnecessary number of load generators, as well as session timeout issues: most of your threads / virtual users will wait so long that the corresponding application session may time out before their next iteration takes place.

Unexpected throughput

A summary report claiming that the system supports 100,000 users doesn't mean anything.
Suppose someone gives you a load test results report for the scenario described above: 10,000 users who created new content, 20,000 users who liked content and 70,000 users who viewed content, over 8 hours. What does that mean? Well, not much, really.

Why not? Because you don't know what throughput was generated. Was it 100 likes per second or 1 per second? How many content views per second did we have? Was the workload constant, or did it have lots of spikes where some intervals had 10 times more load than others?

With such an approach you usually won't have good control over those figures, because you try to mimic a real user flow, with think times between interactions that you believe reflect realistic human behavior. Your behavioral assumptions may be true or completely wrong, but either way this kind of realistic behavior will not create a realistic load on the system, and that's what you care about: "Will my system handle the load?".
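A quick way to see why scripted "realistic users" lose control of throughput is the interactive response time law for a closed workload: N looping threads that each wait R seconds for a response and then think for Z seconds generate X = N / (R + Z) requests per second. The sketch below uses made-up numbers (1,000 threads, 9 seconds of think time) purely for illustration; `closed_model_throughput` is a hypothetical helper name.

```python
def closed_model_throughput(threads, response_time_s, think_time_s):
    """Requests/s produced by `threads` looping virtual users (interactive response time law)."""
    return threads / (response_time_s + think_time_s)


if __name__ == "__main__":
    # Hypothetical script: 1,000 threads with 9 seconds of think time between steps.
    for response_time_s in (1.0, 5.0, 20.0):
        rps = closed_model_throughput(1000, response_time_s, 9.0)
        print(f"response time {response_time_s:4.1f} s -> {rps:5.1f} requests/s")
    # 100.0, 71.4 and 34.5 requests/s: the slower the SUT gets, the less load it receives.
```

The same script produces a different load depending on the very response times you are trying to measure, which is why its throughput is hard to predict.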

A more scientific approach is to build the load test so that it generates a configurable throughput for each type of feature or activity in the system.
For example, you could build the following load test scenario, which is easier to monitor and measure:
100 content views per second
10 logins per second
10 content creations per second
10 likes per second
1 logout per second

In total this will generate 131 requests per second. Depending on the actual application, you may end up with more requests: you may need additional requests to load the content editor before you actually submit/publish the content, and if every content view triggers AJAX calls, you should generate those too.
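Written down as data, the mix is easy to check and to tweak. A minimal sketch; the activity names are hypothetical labels rather than endpoints of any real application, and the extra editor or AJAX requests mentioned above are not included.

```python
# Target throughput per activity in requests per second, as in the scenario above.
# Activity names are hypothetical labels, not endpoints of a real application.
TARGET_RPS = {
    "view_content": 100,
    "login": 10,
    "create_content": 10,
    "like": 10,
    "logout": 1,
}

total_rps = sum(TARGET_RPS.values())

for activity, rps in TARGET_RPS.items():
    share = rps / total_rps
    interval_ms = 1000 / rps  # gap between consecutive starts of this activity
    print(f"{activity:<15} {rps:>4} rps  {share:6.1%}  one start every {interval_ms:g} ms")
print(f"{'total':<15} {total_rps:>4} rps")  # total 131 rps
```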

With such an approach you generate a constant, controlled throughput on your SUT, and you can configure each type of activity with a different throughput to reflect realistic usage patterns.
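One way to get that constant, controlled throughput is an open workload model: each activity gets its own evenly spaced schedule of start times, and requests are started on schedule whether or not earlier ones have finished. The sketch below only computes the schedule (no network calls); `constant_rate_schedule` is a hypothetical name, and real tools implement pacing in their own ways.

```python
import heapq


def constant_rate_schedule(rates_rps, duration_s):
    """Yield (start_time_s, activity) in time order, each activity evenly paced at its own rate.

    Start times depend only on the configured rates, never on how long responses
    take, which is what keeps the generated throughput constant (open workload model).
    """
    # Heap entries: (start time, activity, index k of this start, rate). The k-th start
    # is computed as k / rate instead of by repeated addition, to avoid float drift.
    heap = [(0.0, name, 0, rps) for name, rps in rates_rps.items() if rps > 0]
    heapq.heapify(heap)
    while heap:
        start, name, k, rps = heapq.heappop(heap)
        if start >= duration_s:
            continue  # this activity has used up its time window
        yield start, name
        heapq.heappush(heap, ((k + 1) / rps, name, k + 1, rps))


if __name__ == "__main__":
    mix = {"view_content": 100, "login": 10, "create_content": 10, "like": 10, "logout": 1}
    schedule = list(constant_rate_schedule(mix, duration_s=1.0))
    print(len(schedule), "requests scheduled in 1 second")  # 131
```

In a real load generator each scheduled start would be handed to a pool of worker threads; as long as the pool is large enough, slow responses delay completions but not the start times.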

Sessions

As for the number of users, it usually doesn't matter: from the SUT's perspective, what we most likely care about is the number of sessions, since sessions are bound to memory. If you want 100,000 sessions at any given time, you can take care of that with this approach too, so you cover the expected memory usage. To do that, you need to generate enough logins, or requests without an active session, since both may create new sessions on the SUT.
Assuming a session timeout of 1 hour in the SUT, you should generate 100,000 session-less requests per hour, or 100,000 / 60 / 60 = about 27.8 per second. This means it should be enough for only about 21% of the total requests to trigger a new session on the server side to generate the required number of sessions (21% of 131 requests per second is 27.5, but you get the point).
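The session arithmetic is Little's law (sessions alive = creation rate × session lifetime), assuming, as the paragraph above does, that each new session lives for roughly one timeout period. A minimal sketch; `session_creation_rate` is a hypothetical helper name.

```python
def session_creation_rate(target_sessions, session_timeout_s):
    """New sessions per second needed to keep `target_sessions` alive (Little's law: L = rate * W)."""
    return target_sessions / session_timeout_s


rate = session_creation_rate(100_000, 60 * 60)  # 1-hour session timeout
share = rate / 131                               # fraction of the 131 rps mix
print(f"{rate:.1f} new sessions/s, {share:.1%} of all requests")
# 27.8 new sessions/s, 21.2% of all requests
```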

Bottom Line

So, to sum things up: building a realistic load test scenario doesn't mean you should have realistic behavior from a single end user's perspective (i.e. a realistic flow in the SUT). Rather, you should build the load test so that each thread performs the absolute minimal set of steps needed to generate the required realistic workload / throughput. For example, a user cannot like content without logging in first, so you'll need to account for that. But you don't need to create a user that does everything, or that interacts with a set of features like a real user, because then it is hard to control the generated throughput and you end up with unexpected load being generated against the SUT.

So how many threads do you actually need in your load testing tool to generate this load? This is fairly simple. Assuming you set a response timeout of 10 seconds (after which the load testing tool considers the request a failure and lets the thread continue), you need to make sure you have enough threads to generate 131 requests per second while some requests take up to 10 seconds to finish. In the worst case a single thread can generate one request per 10 seconds, or 6 requests per minute. We need 1,310 requests per 10 seconds, so in the worst case we need up to 1,310 threads to guarantee the required load, no matter how responsive or slow the SUT gets. (The calculation is: required RPS × maximal response time = required threads.)
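The thread-count rule above, as a tiny function. It is another application of Little's law (requests in flight = arrival rate × time in system), using the response timeout as the worst-case time in system; `required_threads` is a hypothetical name.

```python
import math


def required_threads(target_rps, max_response_time_s):
    """Worst-case thread count: every request may take the full timeout to finish."""
    return math.ceil(target_rps * max_response_time_s)


print(required_threads(131, 10))  # 1310
```

Rounding up matters for fractional results: 131 rps with a 0.5 second ceiling needs 66 threads, not 65.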

1310 threads instead of 100,000 threads is much easier to work with, isn't it?

Thursday, July 24, 2014

Why all load testing tools are the same

Over the last few years I've noticed more and more new load testing services and tools, especially cloud-based SaaS solutions, as part of the movement towards "everything as a service".

There are so many to choose from: Load Impact, BLITZ.IO, BlazeMeter, loader.io, Load Storm, Load Focus, SOASTA, LoadUI, Locust and many more. Each has its pros and cons, but I'm getting really mad that they are all just more of the same.

Why do I say that they are all more of the same? You need to go back to the early '90s, when an Israeli company was one of the first to come out with a commercial load testing tool for the web and a few other protocols. The tool was LoadRunner and the company was Mercury (later acquired by HP).
Since the early '90s, load testing technology hasn't changed at all: it was, and still is, simply about sending HTTP requests and measuring responses (I'm focusing on web/HTTP load testing for simplicity).
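Stripped to its core, that technique fits in a few lines. The sketch below uses only Python's standard library; `OkHandler`, `start_local_sut` and `send_and_measure` are hypothetical names, and the local server merely stands in for a real SUT so the example runs anywhere. Real tools add concurrency, pacing, correlation and reporting on top of this loop.

```python
import http.server
import threading
import time
import urllib.request


class OkHandler(http.server.BaseHTTPRequestHandler):
    """Stand-in system under test: answers every GET with a tiny 200 OK page."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def start_local_sut():
    """Serve OkHandler on a free localhost port in a background thread; return (server, url)."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), OkHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_address[1]}/"


def send_and_measure(url, count, timeout_s=10.0):
    """Send `count` sequential GET requests; return a list of (HTTP status, elapsed seconds)."""
    # Bypass any proxy settings from the environment so requests go straight to the URL.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    results = []
    for _ in range(count):
        start = time.perf_counter()
        with opener.open(url, timeout=timeout_s) as response:
            response.read()  # include the body download in the measured time
            status = response.status
        results.append((status, time.perf_counter() - start))
    return results


if __name__ == "__main__":
    server, url = start_local_sut()
    try:
        for status, elapsed in send_and_measure(url, 5):
            print(f"HTTP {status} in {elapsed * 1000:.2f} ms")
    finally:
        server.shutdown()
        server.server_close()
```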

Over time, Mercury built good features to make the work easier, mostly auto-correlation of parameters and predefined macros for known applications and platforms, where developing load testing scripts was complicated by the massive number of HTTP parameters being sent back and forth between the browser and the server.
(For the scope of this post I'm ignoring all the other good things they did with system monitoring, application insight monitoring (aka diagnostics), and bringing several perspectives into a single powerful analysis tool, which was, in my opinion, a technological breakthrough.)
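Correlation means capturing a dynamic value from one response (a session ID, a form token) and sending it back in a later request, instead of replaying the stale value that was recorded. Below is a minimal manual version using left/right text boundaries, the kind of step that auto-correlation automates; the page, the `csrf_token` field and the helper names are all hypothetical.

```python
import re
import urllib.parse

# Hypothetical login page: the server embeds a per-session value that the next
# request must send back, otherwise a recorded script replays a stale value.
LOGIN_PAGE = (
    '<form action="/login" method="post">'
    '<input type="hidden" name="csrf_token" value="a1b2c3d4">'
    "</form>"
)


def extract_between(body, left, right):
    """Return the first substring between the `left` and `right` boundaries, or None."""
    match = re.search(re.escape(left) + r"(.*?)" + re.escape(right), body, re.DOTALL)
    return match.group(1) if match else None


def build_login_form(body, user, password):
    """Correlate: pull the dynamic token out of the previous response and post it back."""
    token = extract_between(body, 'name="csrf_token" value="', '"')
    if token is None:
        raise ValueError("correlation failed: csrf_token not found in response")
    return urllib.parse.urlencode({"user": user, "password": password, "csrf_token": token})


if __name__ == "__main__":
    print(build_login_form(LOGIN_PAGE, "demo", "secret"))
    # user=demo&password=secret&csrf_token=a1b2c3d4
```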

At some point, somewhere around 2010, when AJAX was getting popular, developing load testing scripts got more complex, and one way to deal with it was to change the way load testing had been done until then. Up to that point it was all about sending HTTP requests and measuring responses (with some parsing); the change was to run kind-of-real browsers in memory, without a UI, so the load testing tool now runs and manages UI-less browsers (imagine a Selenium-driven headless browser running many times over). This makes it easy to deal with frequent changes in the application UI and API: changes to AJAX calls simply require no change to the load testing scripts, since the scripts now interact at the UI level. Sounds great, right? Well...

The main downside of this technology is that while it works great for functional testing, where a single user, or maybe a few, run from one host to test the application, running a real browser with MANY windows is extremely memory and CPU intensive. So in practice, when it comes to really loading your web application with more than tens or a few hundred users, this approach is simply too expensive. Memory and CPU are indeed getting cheaper in general, and maybe in a few short years we will get there, but you still need about 100MB (gross) of memory for every virtual user, depending on the complexity of the client side (CSS/JS). So for intranet / internal web systems it might work: a few load generator hosts, each with 8-16GB RAM, can each create load for about 80-160 virtual users, which might be enough to load test such internal systems.

When it comes to load testing big, Internet-facing applications, you would probably want to load test with thousands of users, which, based on the gross numbers above, requires 1,000GB of RAM for 10,000 concurrent users. Based on Amazon pricing, that means 17 c3.8xlarge instances (60GB each, at about $1.7 per hour; I'm rounding up to $2 as there are other expenses like network and disk usage), or roughly $34 USD per hour.
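The memory and instance arithmetic above, as a sketch. The 100MB per virtual user, the 60GB instance size and the $2 per instance-hour are the post's own rough 2014 estimates, not current AWS prices; the helper names are hypothetical.

```python
import math

MB_PER_VU = 100               # gross memory per headless-browser virtual user (post's estimate)
INSTANCE_RAM_GB = 60          # c3.8xlarge memory
INSTANCE_COST_PER_HOUR = 2.0  # ~$1.7 on-demand, rounded up for network and disk


def vus_per_host(ram_gb):
    """Virtual users one load generator can hold, using 1 GB = 1,000 MB as the post does."""
    return ram_gb * 1000 // MB_PER_VU


def browser_fleet(virtual_users):
    """Return (total RAM in GB, instances needed, cost per hour in USD)."""
    total_gb = virtual_users * MB_PER_VU / 1000
    instances = math.ceil(total_gb / INSTANCE_RAM_GB)  # you can't rent 2/3 of an instance
    return total_gb, instances, instances * INSTANCE_COST_PER_HOUR


if __name__ == "__main__":
    print(vus_per_host(8), vus_per_host(16))  # 80 160
    print(browser_fleet(10_000))              # (1000.0, 17, 34.0)
```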

Now, you never run a single load test, wrap things up and never do it again. If you are serious, you will run load tests regularly, at least once before every production release. We are talking about hours of load tests every month.
With one of my clients we run 8 hours of load tests every day and a 24-hour load test every weekend, at a scale similar to the example above. Running them with real / headless browsers would cost more than $9,000 per month (and that's assuming we use Amazon On-Demand instances and shut them down between load tests).
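A rough check of the monthly figure. This sketch assumes "every day" means the five weekdays (the post doesn't say), one 24-hour weekend run per week, about 4.33 weeks per month, and an hourly fleet cost of $32 to $34 from the estimate above; `monthly_cost` is a hypothetical helper.

```python
WEEKS_PER_MONTH = 52 / 12  # about 4.33


def monthly_cost(cost_per_hour, weekday_hours=8, weekdays=5, weekend_hours=24):
    """Return (hours per month, USD per month) for weekday runs plus one weekend run per week."""
    hours_per_week = weekday_hours * weekdays + weekend_hours  # 8*5 + 24 = 64
    hours_per_month = hours_per_week * WEEKS_PER_MONTH
    return hours_per_month, hours_per_month * cost_per_hour


if __name__ == "__main__":
    for cost_per_hour in (32.0, 34.0):
        hours, cost = monthly_cost(cost_per_hour)
        print(f"${cost_per_hour:.0f}/hour x {hours:.0f} hours = ${cost:,.0f}/month")
    # $32/hour x 277 hours = $8,875/month
    # $34/hour x 277 hours = $9,429/month
```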

Even if you can afford to spend such a budget on load testing infrastructure alone, you should also consider that driving headless browsers is still considered inaccurate and inconsistent. The timing mechanism is still immature, virtual browsers may affect each other through spikes in CPU consumption, and in my experience running the same test twice may give you quite different results.

So now I come back to why I say all load testing solutions are more of the same.
All of the services and tools I mentioned above run load tests with the very first technology Mercury came up with in the early '90s: not real browsers, which are still immature and expensive, but scripts that define which HTTP requests to hit the system with and then measure the responses. That's all. They differ only in their scripting languages or the way you control the test: some let you build the script with a UI and modules, some use XML, and some use code in either a proprietary or a widespread scripting engine. But the bottom line is that they all ask you to provide the same data to get your load test running.

Go and try a few of them, and you will see that each solution wraps the same idea in a different UI. They all work the same, and if they work the same, I urge you to use those based on JMeter. Why? Because it is the most popular load testing tool in the world, it is free and open source and, most importantly, it has the biggest set of features and continues to grow with regular releases, backed by an awesome core team of committers who push it forward. If you go with another, proprietary scripting engine, you will soon hit show stoppers that block you from executing a script against your application due to missing functionality, as all the other engines are trying to keep up with JMeter.
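For larger runs JMeter is normally started in non-GUI mode. A minimal Python wrapper sketch, assuming JMeter is installed and on the PATH; `test_plan.jmx` is a hypothetical plan whose Thread Group reads its thread count with `${__P(threads,1)}`, and the helper names are made up for illustration. The flags used are JMeter's standard `-n` (non-GUI), `-t` (test plan), `-l` (results file) and `-J` (define a property).

```python
import shlex
import subprocess


def jmeter_command(test_plan, results_file, properties=None):
    """Build a non-GUI JMeter command line.

    -n runs without the GUI, -t names the test plan, -l names the results file,
    and each -Jname=value defines a JMeter property the plan can read with
    ${__P(name,default)}.
    """
    cmd = ["jmeter", "-n", "-t", test_plan, "-l", results_file]
    for name, value in (properties or {}).items():
        cmd.append(f"-J{name}={value}")
    return cmd


def run_jmeter(test_plan, results_file, properties=None):
    """Run JMeter (assumed installed and on PATH); raises CalledProcessError on failure."""
    return subprocess.run(jmeter_command(test_plan, results_file, properties), check=True)


if __name__ == "__main__":
    print(shlex.join(jmeter_command("test_plan.jmx", "results.jtl", {"threads": 1310})))
    # jmeter -n -t test_plan.jmx -l results.jtl -Jthreads=1310
```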

So why am I mad about all those tools and competitors? They all try to reinvent the same wheel. Not a smarter one, not a better shape, not better material. Just re-creating the same old stuff that was invented about a quarter of a century ago!!

One last point: 5 years ago, the idea of driving real/headless browsers was really promising, but so far no solution does it successfully.
I'll write a technical post on this topic in the future to show the timing issues with this approach. Currently I have in mind showing results from a Selenium browser driven by JMeter with the JMeter Plugins, but any other ideas or pointers are welcome.