Optimization For Parallelization

First, try to say the title of this post five times fast. If nothing else, it makes for a new tongue twister. It may sound silly, but that little exercise draws attention to the fact that for many companies, testing can be challenging and time-consuming, along with being difficult to say quickly.

But testing shouldn’t be difficult to implement, and it definitely shouldn’t be slow.

One way to implement and speed up testing is to run all your tests in parallel, rather than sequentially. As I work to create “testing as a service” features for Rackspace, and hopefully for our test automation community at large, I talk to more and more testers at hackdays, meetups and conferences. I’m struck by how often people craft their automation test suites in ways that cannot be parallelized. Parallelization needs to be one of your design strategies at the outset of test development. Also, I could have (or should have) titled this “Parallelization and Stability” because focusing on the former can actually improve the latter, along with the obvious benefit of making testing fast (and scalable)!

Please note: The testing I focus on at the moment is front-end Selenium work, but these methodologies can be utilized on any kind of integration or higher level testing.


Why is parallelization important in the first place?

Imagine being able to run 10,000 test modules in one minute. That is the power of parallelization.

This isn’t something that only huge companies with deep pockets can do. Cloud computing puts parallelization very much within reach and makes it more cost efficient. If you have access to just a few servers, you can configure continuous integration tools like Jenkins to build your code and run your tests in parallel. With just a little parallelization, you can drastically cut down on the development feedback loop, thereby saving hundreds of hours of wasted time and effort.

For example, we build and deploy the Rackspace Control Panel as soon as a developer checks in code, and we immediately run a suite of integration tests against it. Run sequentially, the tests would take almost eight hours. Run in parallel, they finish in 15 minutes (and will be even faster soon). The benefits are obvious: we don’t waste developer time waiting for builds or test results; we get feedback about code changes sooner; and we can run these tests many times a day, not just once a night.
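To see where a speedup like that comes from, here is a rough back-of-the-envelope sketch. It assumes every job is handed to whichever worker frees up first. The split into 32 modules of 15 minutes each is invented to add up to eight hours; it is not the real shape of the Control Panel suite, and `estimated_wall_clock` is a hypothetical helper.

```python
import heapq


def estimated_wall_clock(durations, workers):
    """Estimate total run time when each job goes to the next free worker."""
    finish_times = [0] * workers  # the time at which each worker becomes free
    heapq.heapify(finish_times)
    for duration in durations:
        earliest = heapq.heappop(finish_times)  # worker that frees up first
        heapq.heappush(finish_times, earliest + duration)
    return max(finish_times)


# Hypothetical suite: 32 modules of 15 minutes each (8 hours in sequence).
module_minutes = [15] * 32

for workers in (1, 10, 32):
    minutes = estimated_wall_clock(module_minutes, workers)
    print(workers, "worker(s):", minutes, "minutes")
```

Under these assumptions, one worker needs 480 minutes, 10 workers need 60, and 32 workers need 15. Real suites have uneven module lengths and startup overhead, so treat the numbers as an estimate only.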


So why doesn’t everybody do this?

I have several theories, but here’s one that you might identify with: you start with a few tests and life is awesome; your test suite is small and you see no need for parallelization; your test suite grows and you unwittingly build it in such a way that makes parallelization impossible; months later, you have a huge test suite that takes too long and is now impossible to pull apart. You’re stuck!

One pitfall can result from writing tests that rely upon each other. To illustrate, if test module A creates something (let’s say a user), test module B will do something with that user. It seems like such a reasonable methodology. The resulting user is shiny and new, with no cruft from previous test runs. It exercises the entire stack and is a very useful set of tests.

But what if something goes sideways during user creation in test module A? Now test module B, which requires that user, will report a failure even though the feature it tests may work fine. This type of “in sequence” testing is very brittle and should be avoided at all costs. It sets up a “boy who cried wolf” scenario, and eventually developers and product managers will stop caring about our test automation because it is flaky and unreliable.
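One way out is to have each test module create and remove its own user, so it never waits on another module. The sketch below uses Python's `unittest`. `FakeUserService` is a made-up in-memory stand-in for the application under test; in a real suite the setup step would more likely insert the user through the database or an API.

```python
import unittest
import uuid


class FakeUserService:
    """Hypothetical in-memory stand-in for the system under test."""

    def __init__(self):
        self._users = {}

    def create_user(self, name):
        user_id = uuid.uuid4().hex
        self._users[user_id] = {"name": name, "active": True}
        return user_id

    def deactivate(self, user_id):
        self._users[user_id]["active"] = False

    def is_active(self, user_id):
        return self._users[user_id]["active"]

    def delete_user(self, user_id):
        self._users.pop(user_id, None)


class DeactivateUserTest(unittest.TestCase):
    def setUp(self):
        # Each test gets its own uniquely named user; nothing is shared
        # with, or expected from, any other test module.
        self.service = FakeUserService()
        self.user_id = self.service.create_user("user-" + uuid.uuid4().hex[:8])

    def tearDown(self):
        # Clean up, so a rerun starts from the same state.
        self.service.delete_user(self.user_id)

    def test_deactivate_marks_user_inactive(self):
        self.service.deactivate(self.user_id)
        self.assertFalse(self.service.is_active(self.user_id))
```

If user creation itself breaks, this test fails in its setup step, which points at the data fixture rather than at the deactivation feature.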

Another perceived pitfall is cost (but I disagree with that perception). People mistakenly believe that they will need to immediately lay out mountains of cash to spin up hundreds of servers to configure and enable a parallel test infrastructure. First off, as mentioned above, if you start out small and build on existing infrastructure, your cost outlay will be gradual and spread over time. And second, in reality there doesn’t need to be an enormous number of servers provisioned for parallel testing. It can be done with a surprisingly small number of machines. Four servers could get you a Jenkins master/slave setup with 10 executors (so 10 jobs running in parallel) plus two Selenium servers, a hub and a node, that could run 10 to 15 browser instances. That gets you 10 parallel browser tests at a time.

I can’t come up with any more pitfalls to parallelization, but if you can think of any others, let me know.

Parallelization Mantras

So you’re ready to give parallelization a go? Here are some mantras you and your team can repeat as you dig in.

  • Do it from the beginning! Unless you design for parallelization from the outset, the temptation to cut corners will be too great, and the weight of refactoring your test suite later might be overwhelming
  • Perform independent and idempotent tests
  • Write small, discrete test modules that can be run on their own
  • Use data sets that can be created, modified and deleted as directly as possible — Use direct database manipulation if possible and API scripting if necessary
  • Remove data creation and cleaning as a source of error in testing
  • Test modules should be designed to run in random order
  • One order of magnitude is huge — Moving from one process (aka job, thread, test runner) to 10 simultaneous processes won’t require too much extra infrastructure, but it can cut down a too-long test suite to something that can run within CI
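
Several of these mantras can be checked mechanically: shuffle the modules, run them on 10 workers, and see whether the results change. The sketch below does that with Python's standard `concurrent.futures`. The names `make_module` and `run_suite` are hypothetical, and the "modules" are toy functions that create, use and discard their own data rather than real Selenium tests.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def make_module(name):
    """Build a hypothetical, self-contained test module."""

    def run():
        data = {"owner": name, "items": []}  # create this module's own data
        data["items"].append("widget")       # exercise the "feature"
        passed = data["items"] == ["widget"]
        data.clear()                         # clean up afterwards
        return name, passed

    return run


def run_suite(modules, workers=10, seed=None):
    """Run modules in a shuffled order on a pool of parallel workers."""
    order = list(modules)
    random.Random(seed).shuffle(order)  # a different seed gives a different order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda module: module(), order))
    return dict(results)


modules = [make_module(f"module_{i}") for i in range(25)]
results = run_suite(modules, workers=10, seed=42)
print(sum(results.values()), "of", len(results), "modules passed")
```

If a suite only passes for one particular seed, some module is leaning on another one, and that is the dependency to hunt down before scaling up the number of workers.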


Parallelization is one thing that is made easier when you build in a cloud environment…like our cloud at Rackspace!

Remember, while the title of this post “Optimization for Parallelization” might be tough to say, it’s probably even harder to say “We’re still waiting on the tests to finish.”