
The Scourge of Slow Tests (and What to Do About It)

One of the best ways to improve your development process is to pay attention to how the process is currently working, and where the pain points are. This is a complex process that involves everything from discussing how developers feel about the process, to understanding when and how communication happens and which channels are most effective, to collecting hard data on where time is being wasted.

On the engineering team at MetaMetrics, where I work, one of the recurring pain points is the performance of our test suites. On at least two of our projects, a full run of the test suite can take upwards of fifteen minutes. This can lead to a number of outcomes, none good. I'll go through a few of them here, and then explain what we're doing about it and how it helps.

Problems

Some of the major problems that I've seen arise from slow test suites are:

  1. Sitting around waiting for the test suite to finish
  2. Not running the test suite often enough
  3. Only running part of the test suite
  4. Not writing tests

This is by no means an exhaustive list, but it does hit some of the biggest problems we've encountered in our projects.

Sitting around waiting for the test suite to finish

When developers have to wait fifteen minutes for a test suite to run, they are not being productive. They're browsing reddit, they're checking their email, possibly they're just watching the little dots go by on the screen. At best (maybe) they're working on some other project. None of these are necessarily bad. Obviously no developer can be productive all the time. The problem is that they are changing gears at a time when they should be staying on track. Most of the time, after you run tests, the next thing you should be doing is a continuation of the thing you were doing before you ran the tests--fixing the thing that caused a test to break, implementing the next chunk of the feature you're working on, etc. Moving on to something else while the test suite runs means that your mind is somewhere else, and it takes time to get back on track when the test suite is finished.

Not running the test suite often enough

Another likely scenario is that you won't run your tests as often as you should. You'll try to get the whole feature in place, with ten tests written for it, and only then run the tests, so that you only have to sit through the suite once. This means that the problems that come up will likely be larger, and it will take more work to pin down just where they happened and why. The more we can localize problems (preferably without having to think about it), the easier it is to fix them and move on.

Only running part of the test suite

Developers who are aware of the above issues will often just run a small part of the test suite as they go along. This speeds up the testing cycle and helps keep developers on track, but some problems can stay hidden if they manifest far enough away from the code change, which makes commits more likely to break the build.

Not writing tests

Finally, the frustration of the process can cause developers not to write tests at all. If it's slowing down development, developers will feel like they are wasting their time. The test suite will start to feel like an obstacle rather than a tool. Tests won't get written. The suite will get out of date. Developers will start to expect certain tests to fail: "Oh yeah, that test always fails. Don't worry about it." Pretty soon the whole suite becomes suspect, and stops providing useful information.

Keeping the tests fast keeps them useful, which helps us all produce quality code.

Solutions

The main solution, of course, is to write faster tests. We've been working on just this thing in our latest project, and we've come up with a few things that help.

  1. Know what is slow
  2. Test what you need to test, and nothing else
  3. Only test your own work
  4. Don't hit the database more than you need to
  5. Isolate slow tests (but make sure they get run)

Some of these are just a matter of paying attention to how you are writing tests, but others require new tools to support them.

We've written a custom test runner called django-hotrunner to solve some of these problems for us. It is a work in progress, but we hope to make it into the best test runner in the django ecosystem.

Know what is slow

Always resist the temptation to guess what is slow. Bottlenecks can often be surprising, and it's better to measure than to guess. When a test suite runs slowly, the first thing to do is figure out which tests are slowing it down. Are there one or two slow tests that need to be rewritten, or are all of the tests running sluggishly, exposing a more systematic problem?

One thing I've always found frustrating is that there is no way, out of the box, to find out how long a particular test takes. You can see the running time of one test by running it on its own, as the running time of the whole test suite is printed out at the end, but that requires that you already know which test is slow. It also only shows the running time down to the millisecond, which is not enough for really understanding our tests.

In order to find the slow tests to begin with, you have to run the suite with high verbosity and try to pick out the names of the slowest tests as they go by. Not a very accurate or rewarding process.

Profiling with python -m cProfile manage.py test is another option, and it is a great tool, but the granularity is perhaps too fine there; the output is overwhelming. I want to know which tests are slow, and then perhaps profile them individually.
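
For reference, this is roughly how that profiling run looks; the -s flag asks cProfile to sort its (very long) report by cumulative time, and the second command narrows the run to a single suspect test case (the test label here is made up):

# Profile the whole test run, sorted by cumulative time (huge report):
python -m cProfile -s cumulative manage.py test

# Once a slow test is identified, profile just that test case:
python -m cProfile -s cumulative manage.py test app.SlowTestCase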

Naturally this was one of the first things I wanted to do with django-hotrunner when I wrote it. Running it with a verbosity of 2 or higher, you can see the running time of each individual test, down to the microsecond. We'll be using this style of output in our demo code below.

Test what you need to test, and nothing else

Unit tests. Functional tests. Acceptance tests. Integration tests. Load tests. Benchmarks. We need them all, but to different degrees.

Most of the tests you write should be fast. Test all the little bits of a given chunk of functionality, then write one or two tests that test the entire thing. We've recently taken to making this separation explicit by moving slow tests into a separate django app. Usually, they're testing interactions between apps anyhow.

Make sure your unit tests are as comprehensive as possible. If your unit tests provide full or near-full coverage, you can write fewer functional and integration tests.

If you are using Django, don't test views with the test client. This will call the view from "outside" django, which means that in addition to testing your view, you are also testing Django's request processing, including all of your activated middlewares every single time you hit a view. Why? This might be justified once in a blue moon, when there are some intricate interactions between middlewares and views. But most of the time, you should just call the view as a function.

Recent versions of Django (starting with 1.3) provide a RequestFactory, which will create request objects for you to pass to your views without building up a bunch of extraneous functionality. With earlier versions of Django, you had to create an imitation request object manually.

I set up a quick comparison using a simple view that renders a template with a couple of variables, hitting it both ways:

from django.core.urlresolvers import reverse
from django.test import TestCase
from django.test.client import RequestFactory

from app import views

class ViewTestPerformanceTestCase(TestCase):
    def test_hit_home_page_as_view(self):
        # Full request/response cycle, middleware and all.
        response = self.client.get(reverse('home'))
        self.assertEqual(response.status_code, 200)

    def test_hit_home_page_as_function(self):
        # Bare request passed straight to the view function.
        request = RequestFactory().get('/')
        response = views.home(request)
        self.assertEqual(response.status_code, 200)

Running these tests (with django-hotrunner's test timing turned on) gives the following results:

(venv)user@host:~/proj/venv/project $ ./manage.py test --verbosity=2
Creating test database for alias 'default' (':memory:')...
Creating tables ...
...
test_hit_home_page_as_function (app.tests.ViewTestPerformanceTestCase) ... (0.000773s) ok
test_hit_home_page_as_view (app.tests.ViewTestPerformanceTestCase) ... (0.103669s) ok

----------------------------------------------------------------------
Ran 2 tests in 0.108s

OK

It took less than 800 microseconds to call the view as a function, and over 100,000 microseconds to call it through the test client. That's over two orders of magnitude. It's the difference between a thirty-second test suite and a one-hour test suite. Not that we would expect to see those sorts of improvements sustained over a real test suite, but it adds up to a big difference, as views are tested extensively in most web apps.

Working with class-based views, you can easily pull parts of complex views into helper methods and test those methods directly, further isolating functional units.
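
As a minimal sketch (the view, method, and template names here are all invented), the interesting logic lives in a plain method and the test never builds a request:

from django.contrib.auth.models import AnonymousUser
from django.test import TestCase
from django.views.generic import TemplateView


class HomeView(TemplateView):
    template_name = 'home.html'

    def page_title(self, user):
        # The interesting logic lives in a plain method.
        if user.is_authenticated():
            return 'Welcome back, %s' % user.first_name
        return 'Welcome'

    def get_context_data(self, **kwargs):
        context = super(HomeView, self).get_context_data(**kwargs)
        context['title'] = self.page_title(self.request.user)
        return context


class PageTitleTestCase(TestCase):
    def test_anonymous_title(self):
        # No request, no middleware, no template rendering.
        self.assertEqual(HomeView().page_title(AnonymousUser()), 'Welcome')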

The other important thing is not to hit external services. You don't want your tests waiting on an unreliable network (or any network at all). That said, you also don't want the service you depend on to disappear and not find out about it until a customer complains. There are a number of ways to monitor external services, but if we're just talking about the test suite, you can write a single integration test that hits the live service. If the service is down, or responding abnormally, you'll be notified by a failing test. Hitting it once is much better than hitting it in every test related to that service.
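
A rough sketch of that split (the service wrapper, URL, and function names here are all invented): the business logic takes the fetcher as an argument so unit tests never touch the network, and exactly one integration test hits the wire.

import urllib2

from django.test import TestCase


def fetch_exchange_rate(currency):
    # Hypothetical thin wrapper around an external rate service.
    response = urllib2.urlopen('http://rates.example.com/%s' % currency)
    return float(response.read())


def price_in(currency, amount, rate_fetcher=fetch_exchange_rate):
    # The business logic accepts the fetcher, so tests can swap in a stub.
    return amount * rate_fetcher(currency)


class PriceUnitTestCase(TestCase):
    def test_price_conversion_without_network(self):
        self.assertEqual(price_in('EUR', 10, rate_fetcher=lambda c: 1.25), 12.5)


class RateServiceIntegrationTestCase(TestCase):
    def test_live_service_responds(self):
        # The one test that hits the real service; keep it with the other
        # slow tests so it is easy to exclude from everyday runs.
        self.assertTrue(fetch_exchange_rate('EUR') > 0)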

Only test your own work

Django's test runner tests every app you have installed. In a normal project, many of those apps weren't even code you wrote. When you create a django project, it comes with several django.contrib apps enabled by default, and most projects will immediately turn on django.contrib.admin, and possibly django.contrib.admindocs (though admindocs is rather rusty at this point). With these turned on, and without writing a single line of test code, or including a single bit of homegrown code, running ./manage.py test takes 80 seconds on my laptop. That's a lot of time. If you run tests once an hour, that's over ten minutes lost from every 8-hour work day. But running the test suite once an hour isn't nearly enough. If you run it every 10 minutes, you're losing more than an hour of every day just to learn that django's contrib apps do what their tests say they're supposed to do.

You should have a way to exclude apps that shouldn't be tested as part of your project. We wrote django-hotrunner in part to support this functionality. It never runs tests from contrib apps, and will also exclude other apps you specify in your project settings.
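
In a project's settings, that looks roughly like this (a sketch: the dotted path to the runner class and the app names are illustrative; check the django-hotrunner README for the exact values):

# settings.py
TEST_RUNNER = 'hotrunner.HotRunner'  # illustrative; see the README

# Apps whose tests should be skipped during everyday development runs.
EXCLUDED_TEST_APPS = ['some_reusable_app', 'another_vendored_app']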

Don't hit the database more than you need to

This comes up quite a bit when testing django apps. Models connect your app to the database. Managers provide ways of interacting with querysets. Views and forms alter data. All these things need to be tested, but every time you hit the database, you incur a cost. In the worst case, you incur a disk read or write, though there are ways around that, such as creating the database on a ramdisk or using an in-memory sqlite3 database. Even so, there is a cost in setting up and rolling back transactions after each test, and in executing potentially complex queries, so the more this can be avoided, the better.
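
For what it's worth, the in-memory setup used in the timings below needs no special configuration: with the sqlite3 backend, Django's test runner creates the test database in memory by default. A sketch of the relevant settings:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': 'dev.db',  # ignored for tests; the runner uses ':memory:'
    }
}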

One problem comes when too much of your model's state depends on the save() method. For instance, given a model that accepts markdown to later be shown to users as HTML, one might wish to pre-render the HTML, and store it on the model in a separate field alongside the markdown. It seems natural to write this field out when the model is saved, like this:

from django.db import models
from markdown import markdown


class CallOut(models.Model):
    title = models.CharField(max_length=255)
    raw_copy = models.TextField()
    html_copy = models.TextField(editable=False)

    def save(self, *args, **kwargs):
        # Rendering happens as a side effect of persisting the data.
        self.html_copy = markdown(self.raw_copy)
        super(CallOut, self).save(*args, **kwargs)

But now the conversion of the text is wrapped up with persisting the data. You cannot see if the conversion works without saving the model to the database.

It's not that the save method is the wrong place for this. After all, if a user adds a CallOut through the admin, the save method is the only hook to make it happen. We need the markdown to be written every time we save the model. However, we need a layer of abstraction, so that we don't save the model every time we write the markdown. So we create a method to do that and only that.

from django.db import models
from markdown import markdown


class CallOut(models.Model):
    title = models.CharField(max_length=255)
    raw_copy = models.TextField()
    html_copy = models.TextField(editable=False)

    def convert_copy_to_html(self):
        # The conversion can now be exercised without touching the database.
        self.html_copy = markdown(self.raw_copy)

    def save(self, *args, **kwargs):
        self.convert_copy_to_html()
        super(CallOut, self).save(*args, **kwargs)

Often, many bits of functionality will get wrapped into the save method like this, and separating them out will not only make the tests faster, it will often make the meaning of the code clearer.

Again, let's compare running times of the two approaches, using the code above:

from django.test import TestCase

# CallOut is the model defined above; import it from whichever app it lives in.
from app.models import CallOut


class ModelSaveTestCase(TestCase):

    def test_convert_and_save_performance(self):
        callout = CallOut(raw_copy='This is *markdown*')
        callout.save()
        self.assertIn('<p>', callout.html_copy)

    def test_convert_only_performance(self):
        callout = CallOut(raw_copy='This is *markdown*')
        callout.convert_copy_to_html()
        self.assertIn('<p>', callout.html_copy)

Using an in-memory SQLite3 database (no disk access), we already see a profound speed-up.

(venv)user@host:~/proj/venv/project $ ./manage.py test --verbosity=2
Creating test database for alias 'default' (':memory:')...
...
test_convert_and_save_performance (longtests.tests.ModelSaveTestCase) ... (0.063626s) ok
test_convert_only_performance (longtests.tests.ModelSaveTestCase) ... (0.002822s) ok

----------------------------------------------------------------------
Ran 2 tests in 0.068s

OK

0.06 seconds might sound fast, but in comparison to 0.003 seconds, it is terrible. A roughly 20x speed-up isn't quite as much as we saw from calling the view as a function, but it is well worth the effort. Over a large test suite, these kinds of choices can make a massive difference in the usability of the test suite.

When working with querysets, however, it's often difficult to avoid persisting data. In these cases, it can be helpful to look for places where you can put a seam between your use of the data and the queryset itself. If your code could operate just as easily on a list (or set) as on a queryset, pass the data in as an argument; the test suite can then hand in a list of unsaved model objects, and the queryset only comes into play during integration tests. Mock it out or monkeypatch it when appropriate.
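
A small sketch of that seam, reusing the CallOut model from above (the function, import path, and test names are invented):

from django.test import TestCase

from app.models import CallOut  # the model defined earlier; path is illustrative


def titles_with_html(callouts):
    # Works on any iterable of CallOut objects, saved or not.
    return [callout.title for callout in callouts if callout.html_copy]


class SeamTestCase(TestCase):
    def test_titles_with_html_skips_unrendered_callouts(self):
        # No database access: these instances are never saved.
        rendered = CallOut(title='ready', raw_copy='*md*', html_copy='<p><em>md</em></p>')
        unrendered = CallOut(title='pending', raw_copy='*md*', html_copy='')
        self.assertEqual(titles_with_html([rendered, unrendered]), ['ready'])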

If this isn't possible, and you have to have some saved data in the database to work with, make sure to use bulk operations to create the data, either working with fixtures or using the new (in Django 1.4) bulk_create manager method, so as to limit yourself to one database hit per table when populating the data. Just remember that bulk_create won't call your save method, so you have to specify all your fields manually.
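
A sketch with the CallOut model from above: one INSERT populates the table, and because save() never runs, html_copy has to be filled in by hand.

from django.test import TestCase

from app.models import CallOut  # the model defined earlier; path is illustrative


class BulkSetupTestCase(TestCase):
    def setUp(self):
        # One database hit creates both rows; save() is never called.
        CallOut.objects.bulk_create([
            CallOut(title='first', raw_copy='*one*', html_copy='<p><em>one</em></p>'),
            CallOut(title='second', raw_copy='*two*', html_copy='<p><em>two</em></p>'),
        ])

    def test_both_callouts_were_created(self):
        self.assertEqual(CallOut.objects.count(), 2)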

Isolate slow tests (but make sure they get run)

Sometimes, you'll just have tests that run slowly no matter what you do to them. It can't always be helped. You need to know that the pieces of your web app work well together, or perhaps your site depends on some complex algorithm that is slow no matter how you break it down.

Those tests need to be kept away from the bulk of your tests.

Problems #2 and #3, above, can always creep up when the test suite takes too long to run, so sometimes the best response is not to fight it, but to work with it. If developers are going to avoid running tests as often as they should because the tests take too long, isolating the tests that take a long time can limit the number of tests that they skip. The impact of skipping those tests can be mitigated by ensuring that the rest of the tests have solid coverage. In some of our projects, we have created an app called longtests which exists solely to hold this code. It includes our selenium tests and our big hairy integration tests.

Now, it is still important to run these tests; they just don't need to be run every time the suite is run. With django-hotrunner, we add this app to the EXCLUDED_TEST_APPS setting in our development environments, but periodically run the tests with TEST_ALL_APPS set to True. We also make sure that this app is not excluded on our Jenkins continuous integration server, so the tests get run every time we commit code, but in a way that doesn't disturb the developer unless they fail.
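
In settings terms, the split looks roughly like this (a sketch of how we use the two django-hotrunner settings named above):

# Development settings: keep the slow app out of everyday runs.
EXCLUDED_TEST_APPS = ['longtests']

# CI (Jenkins) settings: run everything, excluded apps included.
TEST_ALL_APPS = True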

This is a decent solution, but certainly not perfect. You either have to run the test suite on the longtests app directly, or go into the project settings and change a variable to run all the tests. The encumbrance of this process encourages committing code that might break the longtests and then waiting for Jenkins to report it. Obviously, it is better to keep the build passing at all times. One feature we are considering for django-hotrunner is a --all (or -a) option to ./manage.py test that bypasses the EXCLUDED_TEST_APPS setting, so it is easier to get a full run of the test suite when needed.

Where do we go next?

Using these techniques on our latest project, we have gotten up to 60 useful tests so far, and our base test suite (not counting selenium tests) clocks in at around 2 seconds. One of our older apps currently runs 1000 tests in an hour and a half. New projects are always easier to keep clean than old ones, but by using some of these techniques, and by continuing to develop django-hotrunner, we're hoping to bring them closer to parity with one another.

What techniques do you use to keep your tests running smoothly and accurately? What data do you find useful in getting an overview of how your project is running? Are there any features that would make a tool like django-hotrunner a must-have for you on your projects?

I'd love to hear how other djangonauts, pythonists, and programmers in general tame their test suites.

Post-script

As a side note, we will be sprinting on django-hotrunner in the after-hours at the PyCarolinas conference. If you'd like to help us make it awesome, come find us at the event in Chapel Hill (MetaMetrics is a silver sponsor), join us on IRC at #django-hotrunner, or submit feature requests on our issue tracker.
