In part 1 of this short series (it's going to end with this article) we covered how you can test-drive the queries in a repository class. Returning query results is only part of the job of a repository though. The other part is to store objects (entities), retrieve them using something like a save() and a getById() method, and possibly delete them. Some people will implement these two jobs in one repository class, some like to use two or even many repositories for this. When you have a separate write and read model (CQRS), the read model repositories will have the querying functionality (e.g. find me all the active products), the write model repositories will have the store/retrieve/delete functionality.

In particular if you write your own mapping code (like I've been doing a lot recently), you need to write some extra tests to verify that the persistence-related activities of your repository function correctly.

Writing tests for store and retrieve actions

When you're testing a class, you're actually specifying its behavior. But you're doing that from an outsider's perspective. The test case uses an instance of your class (the subject-under-test). It calls some methods on it and checks if the resulting behavior is as expected.

How would you specify the behavior of a save() method? You could say about the repository: "It can save an entity". How would you verify that a given repository class implements this specification correctly? If the repository would use Doctrine ORM, you could set up a mock for the EntityManager or a similar class/interface, and verify that it passes the object to its persist() method. However, as explained earlier, within repository classes mock's aren't allowed. The other option would be to make a call to that save() method and afterwards look inside the database to verify that the expected records have been inserted. However, this would tie the test to the implementation of the repository.

It seems there's no easy way in which you can find out if a call to save() has worked. But let's think about that save() method. Why is it there? Because we want to be able to later retrieve the entity that it saves. So, what if we test our save() method by also introducing its counterpart, getById()? That way, we can indirectly find out if save() has worked: getById() is expected to return an object that's equal to the object you've just persisted.

In other words, a black box test for save() can be written if you combine that test with getById():

public function it_can_save_and_retrieve_an_entity(): void
{
    // Create a basic version of the entity and store it
    $originalEntity = ...;
    $this->repository->save($originalEntity);

    // Now load it from the database
    $entityFromDatabase = $this->repository->getById($originalEntity->entityId());

    // Compare it to the entity you created for this test
    self::assertEquals($originalEntity, $entityFromDatabase);
}

State changes, child entities, etc.

Usually an entity doesn't get persisted once. You'll be modifying it, persisting it again, adding child entities to it, removing them again, etc. So, for every situation like this, I like to write another test method, showing that all this really works, e.g.

public function it_can_save_child_entities(): void
{
    // Create a basic version of the entity and store it
    $originalEntity = ...;
    // Add some child entity
    $originalEntity->addChildEntity(...);
    $this->repository->save($originalEntity);

    // Now load it from the database
    $entityFromDatabase = $this->repository->getById($originalEntity->entityId());

    // Compare it to the entity as we've set it up for this test
    self::assertEquals($originalEntity, $entityFromDatabase);
}

Sometimes it makes sense to add intermediate save() and getById() calls, e.g.

public function it_can_save_child_entities(): void
{
    // Create a basic version of the entity and store it
    $originalEntity = ...;
    $this->repository->save($originalEntity);
    // Load and save again, now with an added child entity
    $originalEntity = $this->repository->getById($originalEntity->entityId());
    $originalEntity->addChildEntity(...);
    $this->repository->save($originalEntity);

    // Now load it from the database
    $entityFromDatabase = $this->repository->getById($originalEntity->entityId());

    // Compare it to the entity as we've set it up for this test
    self::assertEquals($originalEntity, $entityFromDatabase);
}

Deleting entities

Finally, a repository may offer a delete() method. This one needs testing too. Deleting is always scary, in particular if you somehow forget to add proper WHERE clauses to your DELETE statements (who didn't, at least once?).

So we should verify that everything related to a single entity has been deleted, but nothing else. How can you do this? Well, if you want black box testing again, you could save two entities, delete one, and check that the other one still exists:

public function it_can_delete_an_entity(): void
{
    // Create the entity
    $originalEntity = ...;
    $originalEntity->addChildEntity(...);
    $this->repository->save($originalEntity);

    // Create another entity
    $anotherEntity = ...;
    $anotherEntity->addChildEntity(...);
    $this->repository->save($anotherEntity);

    // Now delete that other entity
    $this->repository->delete($anotherEntity);

    // Verify that the first entity still exists
    $entityFromDatabase = $this->repository->getById($originalEntity->entityId());
    self::assertEquals($originalEntity, $entityFromDatabase);
}

Or, if you like, you could let go of the black box aspect and populate the database with some entity and child entity records that you want to prove will still exist after you delete a single entity.

Ports & adapters

If you write purely black box tests for your write model entity/aggregate repository (that is, for save(), getById() and delete()), the test cases themselves won't mention anything about the underlying storage mechanism of the repository. You won't find any SQL queries in your test. This means that you could rewrite the repository to use a completely different storage mechanism, and your test wouldn't need to be modified.

This amounts to the same thing as applying a Ports and Adapters architectural style, where you separate the port (i.e. "Persistence") from its specific adapter (i.e. a repository implementation that uses SQL). This is very useful, because it allows you to write fast acceptance tests for your application against code in the Application layer, and use stand-ins or "fakes" for your repositories. It also helps you decouple domain logic from infrastructure concerns, allowing you to replace that infrastructure code and migrate to other libraries or frameworks whenever you want to.

Conclusion

By writing all these repository tests you specify what the repository should be capable of, from the perspective of its users. Specifying and verifying different uses case proves that the repository is indeed capable of storing, retrieving and deleting its entities correctly. Most effective for this purpose is black box testing, where you make sure the repository test is completely decoupled from the repository's underlying storage mechanism. If you can accomplish this, you can rewrite your repository using a different storage mechanism, and prove that everything still works afterwards.

PHP design testing database Comments

Test-driving repository classes - Part 1: Queries

Posted on by Matthias Noback

There's something I've only been doing since a year or so, and I've come to like it a lot. My previous testing experiences were mostly at the level of unit tests or functional/system tests. What was left out of the equation was integration tests. The perfect example of which is a test that proves that your custom repository class works well with the type of database that you use for the project, and possibly the ORM or database abstraction library you use to talk with that database. A test for a repository can't be a unit test; that wouldn't make sense. You'd leave a lot of assumptions untested. So, no mocking is allowed.

But how do you test everything that is going on in a repository? Well, I found out a nice way of doing so, one that even allows you to use some kind of test-driven approach. In this article I'll cover one of the two main use cases for repositories: querying the database, and returning some objects for it. The other use case - storing and loading objects - will be discussed in another article.

What's a query?

In essence, a query defines criteria for selecting certain records from the database. Comparing it to the real-world equivalent: you'd have some kind of room full of stuff and you let a robot go in with a list of selection criteria. It examines things one by one to see if it matches those criteria, and if it does, it takes the thing with it, and brings it to you. How can you verify that the robot works well? Sure, you put some things in that room that match the criteria, and see if it fetches those things for you. However, you should also try dropping some things in the room that wouldn't match the criteria you gave to the robot, and verify that the robot doesn't take those things too.

For repository classes it should work the same way. Testing if some query in the repository class works, means that you should load some fixtures for records that would match the criteria. But you shouldn't forget to add some records that wouldn't match the criteria too. Otherwise, you wouldn't notice the difference between your elaborate SELECT query and a SELECT * WHERE 1 query. So basically you'd have to come up with examples, as well as counter-examples. It turns out that these counter-examples can be used to test-drive the query itself.

As an example, consider you need a query that will return a list of products, but only those which are active. You'd start with the simplest situation; a single product that should be returned:

INSERT INTO products (product_id) VALUES (1);

You then write the query for this:

$result = $this->connection
    ->createQuery()
    ->from('products')
    ->select('*')
    ->execute()
    ->fetchAll();

In your test you can then verify that this query indeed loads this one product. At this point I often add another record to the database, to prove that the query is capable of returning more than one object (i.e. there's no "limit" in place or anything).

INSERT INTO products (product_id) VALUES (1);
INSERT INTO products (product_id) VALUES (2);

Implement the first requirement

The query we wrote doesn't yet take the "is active" flag for products into consideration. So now, before you dive in and modify the query, you need to show that the code as it is doesn't yet implement all the requirements. So you add a counter-example; an inactive product:

- Active products
INSERT INTO products (product_id, is_active) VALUES (1, 1);
INSERT INTO products (product_id, is_active) VALUES (2, 1);

- Inactive product - should be ignored
INSERT INTO products (product_id, is_active) VALUES (3, 0);

A failing test

If you run the test again, it will fail, because it returns an extra product that wasn't expected to be there. This is the traditional "red" phase of TDD.

You then modify the query, adding a "where" clause that will exclude inactive products:

$result = $this->connection
    // ...    
    ->andWhere('active = 1')
    // ...
    ->fetchAll();

Run the test again, and the light will be green again.

Green; implement another requirement

Now you can continue working on the next requirement. In this example, we need the product to be in a group of products that's marked as "stock" products. With every extra requirement, we run into another variation we'd need to check. Consider a product that is active, but is not in the right kind of product group; we have to add a counter-example for that:

- Active products (1, 2)
INSERT INTO products (product_id, is_active) VALUES (1, 1);
INSERT INTO products (product_id, is_active) VALUES (2, 1);

- Inactive product (3) - should be ignored
INSERT INTO products (product_id, is_active) VALUES (3, 0);

- Active product, but in a non-stock product group (100) - should be ignored
INSERT INTO product_groups (product_group_id, has_stock_products) VALUES (100, 0);
INSERT INTO products (product_id, is_active, product_group_id) VALUES (4, 1, 100);

Running the tests will make the new product pop up of course, something we don't want, so we need to modify the query:

$result = $this->connection
    // ...    
    ->andWhere('active = 1')
    ->innerJoin(
        'products', 
        'product_groups',
        'product_groups.product_group_id = products.group_id'
    )
    ->andWhere('product_groups.has_stock_products = 1')
    // ...
    ->fetchAll();

This would seem to make it work, but when we re-run the tests now, the result is empty. Products 1 and 2 aren't in a product group yet, so the "inner join" will filter them out. So we have to modify the fixtures to fix this. The same goes for product 3 actually; we should put it in a stock product group, to verify that the "active" flag is still taken into account:

- Active products (1, 2), in a stock-product group (101)
INSERT INTO products (product_id, is_active, product_group_id) VALUES (1, 1, 101);
INSERT INTO products (product_id, is_active, product_group_id) VALUES (2, 1, 101);
INSERT INTO product_groups (product_group_id, stock_products) VALUES (101, 1);

- Inactive product (3), in a stock-product group (101) - should be ignored
INSERT INTO products (product_id, is_active, product_group_id) VALUES (3, 0, 101);

- Active product (4), but in a non-stock product group (100) - should be ignored
INSERT INTO product_groups (product_group_id, has_stock_products) VALUES (100, 0);
INSERT INTO products (product_id, is_active, product_group_id) VALUES (4, 1, 100);

And so on, and so on. For every new requirement, first add a counter-example to the fixtures, then see how it pops up in the query result. Then modify the query to ensure that it doesn't. This is the Red - Green - (Refactor) TDD cycle for queries in repository classes. By the way, I find it helpful to add important IDs or other characteristics as a comment to the fixtures, so it's easy for the reader to figure out what's so special about a certain record.

Helpful?

I find this approach very helpful. It helps you take small steps when working on repositories, where you may often feel insecure about getting the queries right (and understanding your ORM well). Test-driving your queries can prevent you from making stupid mistakes when loading records for specific owners (tenants, users). With this approach you can actually prove that you're loading the right things by adding some records owned by other tenants and users in your fixtures.

Just like with unit tests, it also helps make the code satisfy new requirements, whenever they become relevant. You have a good starting point for making amendments, a clear view on what's there, and a safety net in case you're worried that modifying the query in some way will accidentally result in unwanted records being loaded.

Potential problems

So far I've found that implementing more requirements can become a bit more tedious. You'd have to add more and more counter-examples. In particular if you also want to test how the repository deals with different data and data types, and you want to verify that it hydrates objects correctly.

Still, being able to safely write queries is something that I've wanted for a long time, and now that I can do it, I'm no longer worried as much if I see the correct data on my screen. More than once I've made the mistake of only testing repositories by clicking through the pages in the browser. Seeing anything at all on the screen was sufficient "proof" to me that the query I wrote was correct. Of course, in practice, it often turned out there was an awful mistake hidden in the query, and the data on the screen wasn't the right data at all. With this test-driven approach however, the test has already proven that the repository loads the correct records, nothing more, nothing less.

PHP design fixtures testing database Comments

Assertions and assertion libraries

Posted on by Matthias Noback

When you're looking at a function (an actual function or a method), you can usually identify several blocks of code in there. There are pre-conditions, there's the function body, and there may be post-conditions. The pre-conditions are there to verify that the function can safely proceed to do its real job. Post-conditions may be there to verify that you're going to give something back to the caller that will make sense to them.

In a quasi-mathematical way: most pre-conditions verify that the input is within the supported domain of the function. Post-conditions verify that the output is within the range of the function. A mathematical function (as far as I know) only verifies its pre-conditions, because bad input could've been provided for which the function just doesn't work (e.g. 1/x doesn't work for input x = 0). Once the input has been validated, the function will yield an answer which doesn't have to be verified. Every answer will be valid no matter what.

It works the same way for function pre-conditions and post-conditions in code; you'll usually only find pre-conditions in code, no post-conditions. Quite often however you may not even find pre-conditions, but "medio-conditions"; that's when input validation happens everywhere inside the function.

This is not a desirable situation: for the function body to be as clear as possible, we'd want to push all pre-condition checks to the top of the function. Then we'll end up with a function body where "nothing can go wrong".

Sometimes the programming language itself can help with these pre-conditions: for instance, the language may support strict typing, which prevents certain types of invalid input to be provided. Some languages offer more advanced ways of defining pre-conditions, like pattern matching.

PHP doesn't have a lot of options, and before PHP 7 we didn't even have a way to define parameter types using primitives like int, string, etc. So many of us have been doing manual assertions at the top of functions, to verify some basic aspects of the provided arguments, like this:

if (!is_string($name)) {
    throw new InvalidArgumentException('$name should be a string');
}

This leads to lots of code duplication, across projects even, so it's a great opportunity for code reuse. Benjamin Eberlei created a popular library for it, and Bernhard Schussek created a variation on it. Both have become quite commonly used in projects. They offer useful shortcuts like Assert::string($value), Assert::greaterThan($value), which will check the value and throw an InvalidArgumentException if an expectation is not met. You can provide custom exception messages as well:

Assertion::string($name, '$name should be a string');

The funny thing is, PHP already has a built-in assertion tool. It's not as convenient as the assertion functions that these libraries provide. You'd have to write all the checks yourself:

assert(is_string($name), '$name should be a string');

On the other hand, it has one interesting feature that exposes the core idea of assertions: the fact that you can turn them off (e.g. in a production environment), without touching the code. Even though you can't easily turn off an assertion library once you start using it, I still think it's a very interesting test to see if you're using such a library in the correct way: just entertain the thought that you would turn the assertions off, would the system still function correctly?

I think this deserves a bit of an explanation. We should first consider the question why we need assertions in the first place. The answer is that some callers may provide bad data as input arguments to our function, so we need to protect it against this bad data. We throw an exception because things aren't going to work out well if we'd just carry on. The culprit however, isn't the innocent user of our program, it's the caller of the function. So we'd never want an InvalidArgumentException to bubble up to the user.

So the first rule of using assertions is: don't use assertions to validate user input, use it to validate function arguments. This means that, given that the user uses the application in a way that is valid and supported by our user interface (e.g. they are not trying to "hack" our system by tampering with POST request data), they should never receive a useless "500 Internal server error" response because some assertion failed. The other way around: if you find an assertion exception in your logs, assuming that all your users are innocent, you know that something is wrong about your user interface, since it apparently allows the user to accidentally provide the wrong data.

// $page is taken from the request's query parameters
$page = ...;

Assertion::greaterThan(0, $page, 'The page query parameter should be larger than 0');

User input will indeed be a reason for functions to fail. But so are external failures in outgoing calls. If a function reaches out to the database to fetch an entity by its ID, then the entity may not exist (anymore) and the call will fail. Before you make that call to the database, you don't know yet if the function will fail or not. This is why language designers usually make a difference between LogicExceptions and RuntimeExceptions. They all extend from the generic Exception class, but their intent is different. A RuntimeException is a failure caused by external, unpredictable things: the database, the filesystem, the network, etc. A LogicException is a programming mistake. It shouldn't have happened, but somehow, somewhere, a programmer didn't use a function well. Can you guess what the parent class of InvalidArgumentException is? It's LogicException, and rightfully so. Whenever an assertion triggers an exception, you know that you have made a mistake.

This brings us to the second rule of using assertions: don't use assertions to validate return values from other functions.

$id = 123;
$entity = $repository->findById($id);

// Don't use an assertion here
Assertion::isInstanceOf($entity, Entity::class);

// Instead, throw a RuntimeException, or a domain-specific one
if ($entity === null) {
    throw new RuntimeException('Entity with ID ' . $id . ' not found');
}

Another example of making an assertion about a return value:

$dateTime = DateTimeImmutable::createFromFormat('d/m/Y', $dateString);

Assertion::isInstanceOf(DateTimeImmutable::class, $datetime);

The real problem here is that DateTimeImmutable::createFromFormat() has a design issue: it returns either false or a DateTimeImmutable instance. This isn't good form. If it's impossible to construct an object from the provided $dateString argument, this function should throw an exception. Once it does, we don't need to make an assertion about its return value. The solution in code would be introduce a wrapper with a more appropriate API, e.g.

final class DateTime
{
    public static createFromFormat(
        string $format, 
        string $dateString
    ): DateTimeImmutable {
        $dateTime = DateTimeImmutable::createFromFormat($format, $dateString);

        if (!$dateTime instanceof DateTimeImmutable) {
            throw new InvalidArgumentException(
                'The provided date string is in the wrong format' 
            );
        }

        return $dateTime;
    }
}

The above example also demonstrates a more general rule for assertions: don't use assertions as a replacement for exceptions. If you think about it, you can replace every if branch which throws an exception with an assertion. This may seem like a useful trick, because it saves you from writing a unit test for that branch:

/*
 * There's a separate branch in the code that throws this exception, 
 * so theoretically it should be covered with an extra unit test.
 */
if ($entity === null) {
    throw new RuntimeException('Entity with ID ' . $id . ' not found');
}

/*
 * There's no longer a separate branch, so the unit test for the happy
 * path of this function will also cover this line, even though it 
 * won't trigger the exception.
 */
Assertion::isInstanceOf($entity, Entity::class);

There's more to talk about with regard to unit testing, and the big question to me is: should we write unit tests to verify that our assertions work?

Assertions should be used as sanity checks. In that sense, they are more like a trace: evidence that someone called a function with an incompatible piece of data. In that sense, you usually don't need to write specific unit test cases that catch the exceptions produced by these assertions.

Why? Let's get back to the beginning of this post: many things that we use assertions for, could also be verified at the level of the programming language itself. You may know this from experience if you've worked with PHP 5, have added lots of assertions like Assertion::string() and the likes, until PHP 7 came along and you could remove all those assertions. It's just that PHP is still quite limited with respect to what function pre-conditions can be checked by the language.

The same goes for the type system. For instance, if your language supports union types, like something is either This or That, you don't have to write an assertion for that anymore. With pattern matching, things become even more advanced, and you could omit assertions like "there should be at least one element in the list".

Now let's combine this with the idea that it should be possible to switch off assertions and still have a working program (except that it may be harder to debug the weird issues that would be caught by assertions otherwise). Should or shouldn't we write unit tests for assertions? I find that not every assertion is as important, and so not every assertion requires an extra test,

Rules of thumb for me are: If a better type system would be able to fix it, then don't test it. For example:

// Verify that all elements of a list are of a certain type
Assertion::allIsInstanceOf($list, Element::class);

// And all the primitive type assertions for PHP 5 applications
Assertion::string($value);
Assertion::boolean($value);
// etc.

On the other hand, If you're asserting that an input value is within the allowed domain, test it.

For example:

// Verify that the value is within a certain range:
Assertion::greaterThan($value, 0);
Assertion::lessThan($value, 10);
// etc.

// Verify that a string matches a certain pattern:
Assertion::regex($value, '/\d+/');
Assertion::alnum($value);
// etc.

// Verify a number of elements:
Assertion::count($value, 2);

This explains why I find myself testing mostly assertions from the constructors of value objects, since value objects are much like native language types, but they usually limit the domain of the input arguments.

Conclusion

Assertions are sanity checks. When they would be left out, you should still have a correctly function application. They should never become user-facing errors.

Useful rules of thumb for working with assertions and assertion libraries are:

  • Use them to validate function arguments, but only at the beginning of a function.
  • Instead of making assertions about another function's return value, throw an exception inside the that other function.
  • Don't use assertions as replacement for exceptions.
  • If a better type system would fix be able to it, use an assertion, but don't unit test for its exception.
  • If an assertion validates the domain of a value, write a unit test that shows that it works.
PHP design assertions Comments