About fixtures

Posted by Matthias Noback

System and integration tests need database fixtures. These fixtures should be representative and diverse enough to "fake" normal usage of the application, so that the tests using them will catch any issues that might occur once you deploy the application to the production environment. There are many different options for dealing with fixtures; let's explore some of them.

Generate fixtures the natural way

The first option, which I assume not many people are choosing, is to start up the application at the beginning of a test, then navigate to specific pages, submit forms, click buttons, etc. until finally the database has been populated with the right data. At that point, the application is in a useful state and you can continue with the act/when and assert/then phases. (See the recent article "Pickled State" by Robert Martin on the topic of tests as specifications of a finite state machine).
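To give a concrete impression of what this could look like, here is a minimal sketch using Symfony's WebTestCase (the route, form field names and button label are made up for the example):

    <?php

    use Symfony\Bundle\FrameworkBundle\Test\WebTestCase;

    final class PlaceOrderTest extends WebTestCase
    {
        public function testOrderShowsUpInTheOrderOverview(): void
        {
            $client = static::createClient();

            // Build up the state "the natural way": register an account through the UI...
            $crawler = $client->request('GET', '/register');
            $form = $crawler->selectButton('Register')->form([
                'registration[email]' => 'jane@example.com',
                'registration[password]' => 's3cr3t',
            ]);
            $client->submit($form);

            // ...then log in, fill in more forms, etc., until the database contains
            // everything the actual "when" and "then" phases of this test need.
        }
    }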

Populating the database like this isn't really the same as loading database fixtures, but these activities could have the same end result. The difference is that the natural way of getting data into the database - using the user interface of the application - leads to top quality data:

  • You don't need to violate the application's natural boundaries by talking directly to the database. You approach the system as a black box, and don't need to leverage your knowledge of its internals to get data into the database.
  • You don't have to maintain these fixtures separately from the application. They will be recreated every time you run the tests.
  • This means that these "fixtures" never become outdated, incomplete, invalid, inconsistent, etc. They are always correct, since they use the application's natural entry points for entering the data in the first place.

However, as you know, the really big disadvantage is that running those tests will become very slow. Creating an account, logging in, activating some settings, filling in some more forms, etc. every time before you can verify anything: that's going to take a lot of time. So honestly, though it would be great, this is not a realistic scenario in most cases. Instead, you might consider something else:

Generate once, reload for every test case

Instead of navigating the application and populating the database one form at a time for every test case, you could do it once and store some kind of snapshot of the resulting data set. Then, for the next test case, you could quickly load that snapshot and continue your test work from there.
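A minimal sketch of what that could look like, assuming a MySQL database and PHPUnit (the dump command, credentials and file location are only illustrative):

    <?php

    use PHPUnit\Framework\TestCase;

    abstract class DatabaseSnapshotTestCase extends TestCase
    {
        private const SNAPSHOT_FILE = __DIR__ . '/../var/fixture-snapshot.sql';

        public static function setUpBeforeClass(): void
        {
            if (!file_exists(self::SNAPSHOT_FILE)) {
                // First run only: populate the database "the natural way",
                // then store a snapshot of the result.
                static::populateDatabaseThroughTheApplication();
                exec(sprintf(
                    'mysqldump -u test -ptest test_db > %s',
                    escapeshellarg(self::SNAPSHOT_FILE)
                ));
            }
        }

        protected function setUp(): void
        {
            // Every test case starts from the same snapshot.
            exec(sprintf(
                'mysql -u test -ptest test_db < %s',
                escapeshellarg(self::SNAPSHOT_FILE)
            ));
        }

        abstract protected static function populateDatabaseThroughTheApplication(): void;
    }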

This approach has all the advantages of the first option, but it will make your test suite run a lot faster. The risk is that the resulting set of fixtures may not be diverse enough to test all the branches of the code that need to be tested.

With both of these options, you may also end up with a chicken/egg problem. You may need some data to be in the database first, to make it even possible to navigate to the first page where you could start building up the fixtures. Often this problem itself may provide useful feedback about the design of your application:

  • Possibly, you have data in the database that shouldn't be there anyway (e.g. a country codes table that might as well have been a text file, or a list of class constants).
  • Possibly, the data can only end up in the database by manual intervention; something a developer or administrator gets asked to do every now and then. In that case, you could consider implementing a "black box alternative" for it (e.g. a page where you can accomplish the same thing, but with a proper form or button).

If these are not problems you can easily fix, you may consider using several options combined: first, load some "bootstrap" data with custom SQL queries (see below), then navigate your way across the application to bring it into the right state.

But, there are other options, like:

Insert custom data into the database

If you don't want to or can't naturally build up your fixtures (e.g. because there is no straightforward way to get it right), you can in fact do several alternative things:

  1. Use a fixture tool that lets you use actually instantiated entities as a source for fixtures, or
  2. Manually write INSERT queries (possibly with the same net result).

Option 1 has proven useful if you use your database as some anonymous storage thing that's used somewhere behind a repository. If you work with an ORM, that is probably the case. Option 2 is the right choice if your database is this holy thing in the centre of your system, and:

  • The data in this database is often inconsistent or incomplete, and/or
  • Other applications are also reading from or writing to this database.

Manually writing fixtures in that case allows you to also write "corrupt" fixtures on purpose and verify that your application code is able to deal with that.
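A rough sketch of both options (the Customer entity, the table and the column names are made up; option 1 assumes the doctrine/data-fixtures library):

    <?php

    use Doctrine\Common\DataFixtures\FixtureInterface;
    use Doctrine\Common\Persistence\ObjectManager;

    // Option 1: use actually instantiated entities as the source for fixtures
    // (Customer is an imaginary entity class).
    final class CustomerFixture implements FixtureInterface
    {
        public function load(ObjectManager $manager)
        {
            $manager->persist(new Customer('jane@example.com', 'NL'));
            $manager->flush();
        }
    }

    // Option 2: manually written INSERT queries, which also makes it easy to add
    // deliberately incomplete or inconsistent records on purpose.
    $pdo = new \PDO('mysql:host=localhost;dbname=test_db', 'test', 'test');
    $pdo->exec(
        "INSERT INTO customers (id, email, country_code) VALUES
            (1, 'jane@example.com', 'NL'),
            (2, 'no-country@example.com', NULL)" // missing country: the code should cope with it
    );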

There's still one problematic issue, an issue which all of the above solutions have in common: shared data between all or many test cases. One radical approach that in my experience works really well is to:

Insert custom data for each test case

What happens when you load a lot of data into the database (no matter how you do that), and run all the tests from this starting point?

  • All the tests start relying on some data that was not consciously put there.
  • You can't easily find out which data a test relies on, so you can't easily replace or modify the fixtures without breaking existing tests.
  • Even adding new data might have a negative influence on the results of other tests.

In my experience, even the tiniest bit of "inheritance" you use in the process of loading fixtures will always come back to bite you. Just like when you use class inheritance, when you use fixture inheritance, you may find certain things impossible to achieve. That's why, when it comes to fixtures, you should apply something like the "prefer composition over inheritance" rule. But I often take this one step further: no composition, no inheritance (no fear of duplication): just set up the fixtures specifically for one test (class or suite, possibly even method or scenario).

This has several advantages:

  • The fixture data is unique for the test, so you can be very specific, tailoring the fixtures to your needs.
  • You can even document why each part of the data set is there.
  • The set of fixture data is small, leading to fast load times.
  • You can safely modify fixtures, even remove them, without worrying about some remote test breaking.

There is one disadvantage I can think of: it takes more work to prepare those fixtures. However, the time spent writing fixtures is easily won back by the sheer joy and ease of maintaining them. In fact, I find that "fixture maintenance" is hardly a thing.
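As a small sketch of this approach (connection details, table and column names are invented for the example), the fixtures live inside the test class itself and every record has a documented reason to be there:

    <?php

    use PHPUnit\Framework\TestCase;

    final class OverdueInvoiceReportTest extends TestCase
    {
        /** @var \PDO */
        private $pdo;

        protected function setUp(): void
        {
            $this->pdo = new \PDO('mysql:host=localhost;dbname=test_db', 'test', 'test');

            // Fixtures for this test class only; nothing here is shared with other tests.
            $this->pdo->exec('DELETE FROM invoices');
            $this->pdo->exec(
                "INSERT INTO invoices (id, due_date, amount) VALUES
                    (1, '2018-01-01', 100.00), -- overdue: should appear in the report
                    (2, '2030-01-01', 250.00)  -- not due yet: should be ignored"
            );
        }

        public function testOnlyOverdueInvoicesAppearInTheReport(): void
        {
            // ... run the report against the same database and assert that only
            // invoice 1 shows up.
        }
    }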

Meta: do you need fixtures at all?

As a conclusion, you should consider an important "meta" question too:

Do your objects really need to be reconstituted from the database? What if the repository itself could - when used in a test case - simply store and retrieve objects in memory? This often requires a bit of architectural rework using the Dependency Inversion Principle. But afterwards, you probably won't need to test every part of your application with a full-featured database anymore.
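As a minimal sketch of that idea (Invoice and the repository names are illustrative), the application code depends only on an interface, so a test can swap in an in-memory implementation:

    <?php

    interface InvoiceRepository
    {
        public function save(Invoice $invoice): void;

        public function getById(string $id): Invoice;
    }

    // Used in tests instead of the database-backed implementation.
    final class InMemoryInvoiceRepository implements InvoiceRepository
    {
        /** @var array<string, Invoice> */
        private $invoices = [];

        public function save(Invoice $invoice): void
        {
            $this->invoices[$invoice->id()] = $invoice;
        }

        public function getById(string $id): Invoice
        {
            if (!isset($this->invoices[$id])) {
                throw new \RuntimeException(sprintf('Invoice "%s" not found', $id));
            }

            return $this->invoices[$id];
        }
    }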

P.S. Just don't replace MySQL with SQLite for speed. It's still much better to test actual database interactions against the real thing. Testing them with SQLite doesn't prove that they will work with the real database in production. See also my previous article "Mocking at architectural boundaries - Persistence & Time".

PHP fixtures database testing
Comments
Youri Thielen
What about generating test data 'the natural way', not through the UI as suggested in the article, but through application services / commands? This API is a lot more stable and faster than creating the test data through the UI. Depending on the architecture of your application and the type of test, we can use in-memory data or a real DB.
Generating the test data this way is a lot more maintenance-friendly, since no implementation details are leaked (e.g. database tables, domain objects).
Matthias Noback

Thanks for adding this here, it definitely makes sense to do that.

magnetik

Hey, Thanks for your blog :)

I'm a little bit late to the party. One well-crafted fixture for each test is the way to go in our team too.

We are building our fixtures through Doctrine Fixtures, but we are sort of stuck: our models do not have setters, and we need to build some fixtures in "strange" states for testing purposes. How do you handle this?

What book would you recommend to dive into DDD?

Thanks again

Matthias Noback

Sounds good. You could simply perform some SQL queries (or maybe only the updates if you want) to make sure you don't "pollute" your model classes.

I recommend Vaughn Vernon's DDD books (green if you don't have much time, red if you want a deep dive).

magnetik

Thanks a lot :)

Roman O Dukuy

I use the last approach. For this, ContainerAwareLoader and ORMExecutor help me load all fixtures before the tests run and remove the data from the DB after the tests execute.

adri_p

Great article, thanks!

For improved test performance, I like to open a database transaction when starting a test, set up all fixtures, run the test code and then roll back the transaction. Not committing the transaction improves speed, helps with cleanup of data and even allows parallel test execution if the database supports it.
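A minimal sketch of this transaction-per-test idea, assuming PDO and PHPUnit (the connection details are made up):

    <?php

    use PHPUnit\Framework\TestCase;

    abstract class TransactionalTestCase extends TestCase
    {
        /** @var \PDO */
        protected $pdo;

        protected function setUp(): void
        {
            $this->pdo = new \PDO('mysql:host=localhost;dbname=test_db', 'test', 'test');
            $this->pdo->beginTransaction();

            // ... set up the fixtures this test needs, inside the transaction ...
        }

        protected function tearDown(): void
        {
            // Never commit: rolling back removes the fixtures and any data the test created.
            $this->pdo->rollBack();
        }
    }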

Matthias Noback

Yes, that's a useful technique as well. Thanks for sharing here.

Bram

For a long time, I've divided tests into three suites (naming might be completely wrong):

  • Unit: your regular unit tests; dependencies of the unit you're testing are mocked
  • Integration: very specific tests that get repository database implementations from the container and test all their methods (save, findOneBy...)
  • Functional tests: these send a mocked request to an app instance with in-memory repository implementations and make assertions on the response.

These were accompanied by a bunch of traits that added ensureUser...(), ensureOtherEntity... methods to seed the repositories (whether DB or in-memory implementations).

To me, this seemed like the best of both worlds: I had a set of slower but actual integration tests that made sure my database implementations worked, and I had tests that made sure that all parts fit together but still ran super fast because everything was in-memory. Over the last couple of years, though, I have run into a couple of instances where issues weren't caught by these tests because things were tested in memory and not in the actual DB.

NicolasB

Thanks for this very interesting feedback on your own experience. In the end, did you change the way you write your functional tests, since they didn't catch some bugs (if I understood your comment correctly)? And a second question: how have you implemented the in-memory repo?

Thanks for your reply!

Bram

I have fine-tuned the process and stopped writing these "functional" tests. I also no longer use in-memory implementations. Current process is:

- Unit tests for (almost) every unit. Any dependencies are mocked. All outcomes of the unit are tested: expected return values, thrown exceptions...
- Integration tests. I currently have two types of integration tests: tests against actual database technology (e.g. a repository test that makes sure saving and finding entities works as expected) and "api" tests. These send a mock request to the actual application and make assertions against the response. No dependencies are mocked or in memory (with the exception of 3rd-party API endpoints, which are mocked using Guzzle's MockHandler).
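For reference, a minimal sketch of mocking a third-party endpoint with Guzzle's MockHandler (the response body and URL are made up):

    <?php

    use GuzzleHttp\Client;
    use GuzzleHttp\Handler\MockHandler;
    use GuzzleHttp\HandlerStack;
    use GuzzleHttp\Psr7\Response;

    // Queue the responses the "third party" should return during the test.
    $mockHandler = new MockHandler([
        new Response(200, ['Content-Type' => 'application/json'], '{"status":"ok"}'),
    ]);

    $client = new Client(['handler' => HandlerStack::create($mockHandler)]);

    // Inject this client wherever the real API client would be used; no real HTTP call is made.
    $response = $client->get('https://api.example.com/status');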

To answer your in-memory question: all repositories had an interface and two implementations: the actual database implementation and an in-memory implementation (simply storing entities in an array). When running tests, the dependencies were overridden in the container.

Matthias Noback

Interesting; I'd say functional tests should also use all the real stuff, which makes it possible to notice even more integration issues early on.

If there are issues with the DB, couldn't those be verified by adding a test case to the respective integration test?