About fixtures

Posted by Matthias Noback

System and integration tests need database fixtures. These fixtures should be representative and diverse enough to "fake" normal usage of the application, so that the tests using them will catch any issues that might occur once you deploy the application to the production environment. There are many different options for dealing with fixtures; let's explore some of them.

Generate fixtures the natural way

The first option, which I assume not many people are choosing, is to start up the application at the beginning of a test, then navigate to specific pages, submit forms, click buttons, etc. until finally the database has been populated with the right data. At that point, the application is in a useful state and you can continue with the act/when and assert/then phases. (See the recent article "Pickled State" by Robert Martin on the topic of tests as specifications of a finite state machine).
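To give a concrete impression of what this could look like, here is a minimal sketch using Symfony's WebTestCase (the route, form field names and button label are made up for the example):

    <?php

    use Symfony\Bundle\FrameworkBundle\Test\WebTestCase;

    final class PlaceOrderTest extends WebTestCase
    {
        public function testOrderShowsUpInTheOrderOverview(): void
        {
            $client = static::createClient();

            // Build up the state "the natural way": register an account through the UI...
            $crawler = $client->request('GET', '/register');
            $form = $crawler->selectButton('Register')->form([
                'registration[email]' => 'jane@example.com',
                'registration[password]' => 's3cr3t',
            ]);
            $client->submit($form);

            // ...then log in, fill in more forms, etc., until the database contains
            // everything the actual "when" and "then" phases of this test need.
        }
    }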

Populating the database like this isn't really the same as loading database fixtures, but these activities could have the same end result. The difference is that the natural way of getting data into the database - using the user interface of the application - leads to top quality data:

  • You don't need to violate the application's natural boundaries by talking directly to the database. You approach the system as a black box, and don't need to leverage your knowledge of its internals to get data into the database.
  • You don't have to maintain these fixtures separately from the application. They will be recreated every time you run the tests.
  • This means that these "fixtures" never become outdated, incomplete, invalid, inconsistent, etc. They are always correct, since they use the application's natural entry points for entering the data in the first place.

However, as you know, the really big disadvantage is that running those tests will become very slow. Creating an account, logging in, activating some settings, filling in some more forms, etc. every time before you can verify anything: that's going to take a lot of time. So honestly, though it would be great, this is not a realistic scenario in most cases. Instead, you might consider something else:

Generate once, reload for every test case

Instead of navigating the application and populating the database one form at a time for every test case, you could do it once and store some kind of snapshot of the resulting data set. Then, for the next test case, you could quickly load that snapshot and continue your test work from there.
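A minimal sketch of what that could look like, assuming a MySQL database and PHPUnit (the dump command, credentials and file location are only illustrative):

    <?php

    use PHPUnit\Framework\TestCase;

    abstract class DatabaseSnapshotTestCase extends TestCase
    {
        private const SNAPSHOT_FILE = __DIR__ . '/../var/fixture-snapshot.sql';

        public static function setUpBeforeClass(): void
        {
            if (!file_exists(self::SNAPSHOT_FILE)) {
                // First run only: populate the database "the natural way",
                // then store a snapshot of the result.
                static::populateDatabaseThroughTheApplication();
                exec(sprintf(
                    'mysqldump -u test -ptest test_db > %s',
                    escapeshellarg(self::SNAPSHOT_FILE)
                ));
            }
        }

        protected function setUp(): void
        {
            // Every test case starts from the same snapshot.
            exec(sprintf(
                'mysql -u test -ptest test_db < %s',
                escapeshellarg(self::SNAPSHOT_FILE)
            ));
        }

        abstract protected static function populateDatabaseThroughTheApplication(): void;
    }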

This approach has all the advantages of the first option, but it will make your test suite run a lot faster. The risk is that the resulting set of fixtures may not be diverse enough to test all the branches of the code that need to be tested.

With both of these options, you may also end up with a chicken/egg problem. You may need some data to be in the database first, to make it even possible to navigate to the first page where you could start building up the fixtures. Often this problem itself may provide useful feedback about the design of your application:

  • Possibly, you have data in the database that shouldn't be there anyway (e.g. a country codes table that might as well have been a text file, or a list of class constants).
  • Possibly, the data can only end up in the database by manual intervention; something a developer or administrator gets asked to do every now and then. In that case, you could consider implementing a "black box alternative" for it (e.g. a page where you can accomplish the same thing, but with a proper form or button).

If these are not problems you can easily fix, you may consider using several options combined: first, load some "bootstrap" data with custom SQL queries (see below), then navigate your way across the application to bring it into the right state.

But, there are other options, like:

Insert custom data into the database

If you don't want to or can't naturally build up your fixtures (e.g. because there is no straightforward way to get it right), you can in fact do several alternative things:

  1. Use a fixture tool that lets you use actually instantiated entities as a source for fixtures, or
  2. Manually write INSERT queries (possibly with the same net result).

Option 1 has proven useful if you use your database as some anonymous storage thing that's used somewhere behind a repository. If you work with an ORM, that is probably the case. Option 2 is the right choice if your database is this holy thing in the centre of your system, and:

  • The data in this database is often inconsistent or incomplete, and/or
  • Other applications are also reading from or writing to this database.

Manually writing fixtures in that case allows you to also write "corrupt" fixtures on purpose and verify that your application code is able to deal with that.
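A rough sketch of both options (the Customer entity, the table and the column names are made up; option 1 assumes the doctrine/data-fixtures library):

    <?php

    use Doctrine\Common\DataFixtures\FixtureInterface;
    use Doctrine\Common\Persistence\ObjectManager;

    // Option 1: use actually instantiated entities as the source for fixtures
    // (Customer is an imaginary entity class).
    final class CustomerFixture implements FixtureInterface
    {
        public function load(ObjectManager $manager)
        {
            $manager->persist(new Customer('jane@example.com', 'NL'));
            $manager->flush();
        }
    }

    // Option 2: manually written INSERT queries, which also makes it easy to add
    // deliberately incomplete or inconsistent records on purpose.
    $pdo = new \PDO('mysql:host=localhost;dbname=test_db', 'test', 'test');
    $pdo->exec(
        "INSERT INTO customers (id, email, country_code) VALUES
            (1, 'jane@example.com', 'NL'),
            (2, 'no-country@example.com', NULL)" // missing country: the code should cope with it
    );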

There's still one problematic issue, an issue which all of the above solutions have in common: shared data between all or many test cases. One radical approach that in my experience works really well is to:

Insert custom data for each test case

What happens when you load a lot of data into the database (no matter how you do that), and run all the tests from this starting point?

  • All the tests start relying on some data that was not consciously put there.
  • You can't easily find out which data a test relies on, so you can't easily replace or modify the fixtures without breaking existing tests.
  • Even adding new data might have a negative influence on the results of other tests.

In my experience, even the tiniest bit of "inheritance" you use in the process of loading fixtures will always come back to bite you. Just like when you use class inheritance, when you use fixture inheritance, you may find certain things impossible to achieve. That's why, when it comes to fixtures, you should apply something like the "prefer composition over inheritance" rule. But I often take this one step further: no composition, no inheritance (no fear of duplication): just set up the fixtures specifically for one test (class or suite, possibly even method or scenario).

This has several advantages:

  • The fixture data is unique for the test, so you can be very specific, tailoring the fixtures to your needs.
  • You can even document why each part of the data set is there.
  • The set of fixture data is small, leading to fast load times.
  • You can safely modify fixtures, even remove them, without worrying about some remote test breaking.

There is one disadvantage I can think of: it takes more work to prepare those fixtures. However, the time spent writing fixtures is easily won back by the sheer joy and ease of maintaining them. In fact, I find that "fixture maintenance" is hardly a thing.
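As a small sketch of this approach (connection details, table and column names are invented for the example), the fixtures live inside the test class itself and every record has a documented reason to be there:

    <?php

    use PHPUnit\Framework\TestCase;

    final class OverdueInvoiceReportTest extends TestCase
    {
        /** @var \PDO */
        private $pdo;

        protected function setUp(): void
        {
            $this->pdo = new \PDO('mysql:host=localhost;dbname=test_db', 'test', 'test');

            // Fixtures for this test class only; nothing here is shared with other tests.
            $this->pdo->exec('DELETE FROM invoices');
            $this->pdo->exec(
                "INSERT INTO invoices (id, due_date, amount) VALUES
                    (1, '2018-01-01', 100.00), -- overdue: should appear in the report
                    (2, '2030-01-01', 250.00)  -- not due yet: should be ignored"
            );
        }

        public function testOnlyOverdueInvoicesAppearInTheReport(): void
        {
            // ... run the report against the same database and assert that only
            // invoice 1 shows up.
        }
    }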

Meta: do you need fixtures at all?

As a conclusion, you should consider an important "meta" question too:

Do your objects really need to be reconstituted from the database? What if the repository itself could - when used in a test case - simply store and retrieve objects in memory? This often requires a bit of architectural rework using the Dependency Inversion Principle. But afterwards, you probably won't need to test every part of your application with a full-featured database anymore.
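As a minimal sketch of that idea (Invoice and the repository names are illustrative), the application code depends only on an interface, so a test can swap in an in-memory implementation:

    <?php

    interface InvoiceRepository
    {
        public function save(Invoice $invoice): void;

        public function getById(string $id): Invoice;
    }

    // Used in tests instead of the database-backed implementation.
    final class InMemoryInvoiceRepository implements InvoiceRepository
    {
        /** @var array<string, Invoice> */
        private $invoices = [];

        public function save(Invoice $invoice): void
        {
            $this->invoices[$invoice->id()] = $invoice;
        }

        public function getById(string $id): Invoice
        {
            if (!isset($this->invoices[$id])) {
                throw new \RuntimeException(sprintf('Invoice "%s" not found', $id));
            }

            return $this->invoices[$id];
        }
    }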

P.S. Just don't replace MySQL with SQLite for speed. It's still much better to test actual database interactions against the real thing. Testing them with SQLite doesn't prove that they will work with the real database in production. See also my previous article "Mocking at architectural boundaries - Persistence & Time".

PHP fixtures database testing
Comments
Youri Thielen
What about generating test data 'the natural way', not through the UI as suggested in the article, but through application services / commands? This API is a lot more stable and faster than creating the test data through the UI. Depending on the architecture of your application and the type of test, we can use in-memory data or a real DB.
Generating the test data this way is a lot more maintenance-friendly, since no implementation details are leaked (e.g. database tables, domain objects).
Matthias Noback

Thanks for adding this here, it definitely makes sense to do that.

magnetik

Hey, Thanks for your blog :)

I'm a little bit late to the party. One well-crafted fixture for each test is the way to go in our team too.

We are building our fixtures through Doctrine Fixtures, but we are sort of stuck: our models do not have setters, and we need to build some fixtures in "strange" states for testing purposes. How do you handle this?

What book would you recommend to dive into DDD?

Thanks again

Matthias Noback

Sounds good. You could simply perform some SQL queries (or maybe only the updates if you want) to make sure you don't "pollute" your model classes.

I recommend Vaughn Vernon's DDD books (green if you don't have much time, red if you want a deep dive).

magnetik

Thanks a lot :)

Roman O Dukuy

I use the last approach. For this, ContainerAwareLoader and ORMExecutor help me load all fixtures before the tests run and remove the data from the DB after the tests execute.

adri_p

Great article, thanks!

For improved test performance, I like to open a database transaction when starting a test, set up all fixtures, run the test code and then roll back the transaction. Not committing the transaction improves speed, helps with cleanup of data and even allows parallel test execution if the database supports it.
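A minimal sketch of this transaction-per-test idea, assuming PDO and PHPUnit (the connection details are made up):

    <?php

    use PHPUnit\Framework\TestCase;

    abstract class TransactionalTestCase extends TestCase
    {
        /** @var \PDO */
        protected $pdo;

        protected function setUp(): void
        {
            $this->pdo = new \PDO('mysql:host=localhost;dbname=test_db', 'test', 'test');
            $this->pdo->beginTransaction();

            // ... set up the fixtures this test needs, inside the transaction ...
        }

        protected function tearDown(): void
        {
            // Never commit: rolling back removes the fixtures and any data the test created.
            $this->pdo->rollBack();
        }
    }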

Matthias Noback

Yes, that's a useful technique as well. Thanks for sharing here.

Bram

For a long time, I've divided tests into three suites (naming might be completely wrong):

  • Unit: your regular unit tests; dependencies of the unit you're testing are mocked
  • Integration: very specific tests that get repository database implementations from the container and test all their methods (save, findOneBy...)
  • Functional tests: these send a mocked request to an app instance with in-memory repository implementations and make assertions on the response.

These were accompanied by a bunch of traits that added ensureUser...(), ensureOtherEntity... methods to seed the repositories (whether DB or in-memory implementations).

To me, this seemed like the best of both worlds: I had a set of slower but actual integration tests that made sure my database implementations worked, and I had tests that made sure that all parts fit together but still ran super fast because everything was in-memory. Over the last couple of years, though, I have run into a couple of instances where issues weren't caught by these tests because things were tested in memory and not in the actual DB.

NicolasB

Thanks for this very interesting feedback on your own experience. In the end, did you change the way you write your functional tests, since they didn't catch some bugs (if I understood your comment correctly)? And a second question: how have you implemented the in-memory repo?

Thanks for your reply!

Bram

I have fine-tuned the process and stopped writing these "functional" tests. I also no longer use in-memory implementations. Current process is:

- Unit tests for (almost) every unit. Any dependencies are mocked. All outcomes of the unit are tested: expected return values, thrown exceptions...
- Integration tests. I currently have two types of integration tests: tests against actual database technology (e.g. a repository test that makes sure saving and finding entities works as expected) and "api" tests. These send a mock request to the actual application and make assertions against the response. No dependencies are mocked or in memory (with the exception of 3rd-party API endpoints, which are mocked using Guzzle's MockHandler).
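For reference, a minimal sketch of mocking a third-party endpoint with Guzzle's MockHandler (the response body and URL are made up):

    <?php

    use GuzzleHttp\Client;
    use GuzzleHttp\Handler\MockHandler;
    use GuzzleHttp\HandlerStack;
    use GuzzleHttp\Psr7\Response;

    // Queue the responses the "third party" should return during the test.
    $mockHandler = new MockHandler([
        new Response(200, ['Content-Type' => 'application/json'], '{"status":"ok"}'),
    ]);

    $client = new Client(['handler' => HandlerStack::create($mockHandler)]);

    // Inject this client wherever the real API client would be used; no real HTTP call is made.
    $response = $client->get('https://api.example.com/status');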

To answer your in-memory question: all repositories had an interface and two implementations: the actual database implementation and an in-memory implementation (simply storing entities in an array). When running tests, the dependencies were overridden in the container.

Matthias Noback

Interesting; I'd say functional tests should also use all the real stuff, which makes it possible to notice even more integration issues early on.

If there are issues with the DB, couldn't those be verified by adding a test case to the respective integration test?