Test-driving repository classes - Part 1: Queries

There's something I've only been doing since a year or so, and I've come to like it a lot. My previous testing experiences were mostly at the level of unit tests or functional/system tests. What was left out of the equation was integration tests. The perfect example of which is a test that proves that your custom repository class works well with the type of database that you use for the project, and possibly the ORM or database abstraction library you use to talk with that database. A test for a repository can't be a unit test; that wouldn't make sense. You'd leave a lot of assumptions untested. So, no mocking is allowed.

But how do you test everything that is going on in a repository? Well, I found out a nice way of doing so, one that even allows you to use some kind of test-driven approach. In this article I'll cover one of the two main use cases for repositories: querying the database, and returning some objects for it. The other use case - storing and loading objects - will be discussed in another article.

What's a query?

In essence, a query defines criteria for selecting certain records from the database. Comparing it to the real-world equivalent: you'd have some kind of room full of stuff and you let a robot go in with a list of selection criteria. It examines things one by one to see if it matches those criteria, and if it does, it takes the thing with it, and brings it to you. How can you verify that the robot works well? Sure, you put some things in that room that match the criteria, and see if it fetches those things for you. However, you should also try dropping some things in the room that wouldn't match the criteria you gave to the robot, and verify that the robot doesn't take those things too.

For repository classes it should work the same way. Testing if some query in the repository class works, means that you should load some fixtures for records that would match the criteria. But you shouldn't forget to add some records that wouldn't match the criteria too. Otherwise, you wouldn't notice the difference between your elaborate SELECT query and a SELECT * WHERE 1 query. So basically you'd have to come up with examples, as well as counter-examples. It turns out that these counter-examples can be used to test-drive the query itself.

As an example, consider you need a query that will return a list of products, but only those which are active. You'd start with the simplest situation; a single product that should be returned:

INSERT INTO products (product_id) VALUES (1);

You then write the query for this:

$result = $this->connection
    ->createQuery()
    ->from('products')
    ->select('*')
    ->execute()
    ->fetchAll();

In your test you can then verify that this query indeed loads this one product. At this point I often add another record to the database, to prove that the query is capable of returning more than one object (i.e. there's no "limit" in place or anything).

INSERT INTO products (product_id) VALUES (1);
INSERT INTO products (product_id) VALUES (2);

Implement the first requirement

The query we wrote doesn't yet take the "is active" flag for products into consideration. So now, before you dive in and modify the query, you need to show that the code as it is doesn't yet implement all the requirements. So you add a counter-example; an inactive product:

- Active products
INSERT INTO products (product_id, is_active) VALUES (1, 1);
INSERT INTO products (product_id, is_active) VALUES (2, 1);

- Inactive product - should be ignored
INSERT INTO products (product_id, is_active) VALUES (3, 0);

A failing test

If you run the test again, it will fail, because it returns an extra product that wasn't expected to be there. This is the traditional "red" phase of TDD.

You then modify the query, adding a "where" clause that will exclude inactive products:

$result = $this->connection
    // ...    
    ->andWhere('active = 1')
    // ...
    ->fetchAll();

Run the test again, and the light will be green again.

Green; implement another requirement

Now you can continue working on the next requirement. In this example, we need the product to be in a group of products that's marked as "stock" products. With every extra requirement, we run into another variation we'd need to check. Consider a product that is active, but is not in the right kind of product group; we have to add a counter-example for that:

- Active products (1, 2)
INSERT INTO products (product_id, is_active) VALUES (1, 1);
INSERT INTO products (product_id, is_active) VALUES (2, 1);

- Inactive product (3) - should be ignored
INSERT INTO products (product_id, is_active) VALUES (3, 0);

- Active product, but in a non-stock product group (100) - should be ignored
INSERT INTO product_groups (product_group_id, has_stock_products) VALUES (100, 0);
INSERT INTO products (product_id, is_active, product_group_id) VALUES (4, 1, 100);

Running the tests will make the new product pop up of course, something we don't want, so we need to modify the query:

$result = $this->connection
    // ...    
    ->andWhere('active = 1')
    ->innerJoin(
        'products', 
        'product_groups',
        'product_groups.product_group_id = products.group_id'
    )
    ->andWhere('product_groups.has_stock_products = 1')
    // ...
    ->fetchAll();

This would seem to make it work, but when we re-run the tests now, the result is empty. Products 1 and 2 aren't in a product group yet, so the "inner join" will filter them out. So we have to modify the fixtures to fix this. The same goes for product 3 actually; we should put it in a stock product group, to verify that the "active" flag is still taken into account:

- Active products (1, 2), in a stock-product group (101)
INSERT INTO products (product_id, is_active, product_group_id) VALUES (1, 1, 101);
INSERT INTO products (product_id, is_active, product_group_id) VALUES (2, 1, 101);
INSERT INTO product_groups (product_group_id, stock_products) VALUES (101, 1);

- Inactive product (3), in a stock-product group (101) - should be ignored
INSERT INTO products (product_id, is_active, product_group_id) VALUES (3, 0, 101);

- Active product (4), but in a non-stock product group (100) - should be ignored
INSERT INTO product_groups (product_group_id, has_stock_products) VALUES (100, 0);
INSERT INTO products (product_id, is_active, product_group_id) VALUES (4, 1, 100);

And so on, and so on. For every new requirement, first add a counter-example to the fixtures, then see how it pops up in the query result. Then modify the query to ensure that it doesn't. This is the Red - Green - (Refactor) TDD cycle for queries in repository classes. By the way, I find it helpful to add important IDs or other characteristics as a comment to the fixtures, so it's easy for the reader to figure out what's so special about a certain record.

Helpful?

I find this approach very helpful. It helps you take small steps when working on repositories, where you may often feel insecure about getting the queries right (and understanding your ORM well). Test-driving your queries can prevent you from making stupid mistakes when loading records for specific owners (tenants, users). With this approach you can actually prove that you're loading the right things by adding some records owned by other tenants and users in your fixtures.

Just like with unit tests, it also helps make the code satisfy new requirements, whenever they become relevant. You have a good starting point for making amendments, a clear view on what's there, and a safety net in case you're worried that modifying the query in some way will accidentally result in unwanted records being loaded.

Potential problems

So far I've found that implementing more requirements can become a bit more tedious. You'd have to add more and more counter-examples. In particular if you also want to test how the repository deals with different data and data types, and you want to verify that it hydrates objects correctly.

Still, being able to safely write queries is something that I've wanted for a long time, and now that I can do it, I'm no longer worried as much if I see the correct data on my screen. More than once I've made the mistake of only testing repositories by clicking through the pages in the browser. Seeing anything at all on the screen was sufficient "proof" to me that the query I wrote was correct. Of course, in practice, it often turned out there was an awful mistake hidden in the query, and the data on the screen wasn't the right data at all. With this test-driven approach however, the test has already proven that the repository loads the correct records, nothing more, nothing less.

Vlad Petrov

01/08/2019

Why not like this? $product = new Product(); $em->persist($product); $productFromDb = $repo->find($productName); $this->assertEquals($product, $productFromDb);
When we expose our state(in sql queries), we broke encapsulation.

Reply to this comment by email

Matthias Noback

10/08/2019

Check out https://matthiasnoback.nl/2... for the part about saving and retrieving entities.

Instead of loading fixtures using SQL queries, you can also create new products. The approach described in this article works really well if you don't have entities for everything you can create in your database...

Sebastiaan Stok

17/11/2018

Bonus tip: Run your tests in a transaction so you don't have to set-up and tear down between tests. And it makes parallel testing possible (unless you use locking).

For Symfony the AliceFixtures bundle already provides this.

Nicolas Philippe

26/09/2018

Hello Matthias,
first of all, thanks for your really interesting blog.
I'm wondering if you fill and empty your test database before and after each test ?
how do you handle the fixtures ?

thank you

28/09/2018

We start with an empty database for every test case. We load just a few basic SQL fixtures as "background" fixtures (which are needed to make the application run), plus any number of SQL statements that are needed for just this test case. We usually don't share fixtures between test classes, but we sometimes share them between test cases/methods.

Pablo

27/03/2019

We always load all SQL fixtures and run tests. But, our test cases or test classes do not depend on each other. Do you think it's good approach?

It can be okay, but gradually you may start depending on certain parts of your fixtures in a way that's not explicit, and when someone changes those parts to accommodate for another test, your test may start failing.

Matthias Noback - Blog About software development