Assertions and assertion libraries

Posted by Matthias Noback

When you're looking at a function (an actual function or a method), you can usually identify several blocks of code in there. There are pre-conditions, there's the function body, and there may be post-conditions. The pre-conditions are there to verify that the function can safely proceed to do its real job. Post-conditions may be there to verify that you're going to give something back to the caller that will make sense to them.

In a quasi-mathematical way: most pre-conditions verify that the input is within the supported domain of the function. Post-conditions verify that the output is within the range of the function. A mathematical function (as far as I know) only verifies its pre-conditions, because bad input could've been provided for which the function just doesn't work (e.g. 1/x doesn't work for input x = 0). Once the input has been validated, the function will yield an answer which doesn't have to be verified. Every answer will be valid no matter what.

The same holds for pre-conditions and post-conditions in code: you'll usually find only pre-conditions, and no post-conditions. Quite often, however, you won't even find proper pre-conditions, but "medio-conditions": input validation that is scattered throughout the function.

This is not a desirable situation: for the function body to be as clear as possible, we'd want to push all pre-condition checks to the top of the function. Then we'll end up with a function body where "nothing can go wrong".
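
As a minimal sketch of that shape, consider the hypothetical function below (the name calculateDiscount(), its parameters and its limits are made up for illustration). All checks sit at the top, and the body underneath no longer has to defend itself:

```php
<?php
// Hypothetical example: the function name and the limits are illustrative only.
function calculateDiscount(int $priceInCents, int $percentage): int
{
    // Pre-conditions: reject input outside the supported domain.
    if ($priceInCents < 0) {
        throw new InvalidArgumentException('$priceInCents should not be negative');
    }
    if ($percentage < 0 || $percentage > 100) {
        throw new InvalidArgumentException('$percentage should be between 0 and 100');
    }

    // Function body: with the checks out of the way, nothing can go wrong here.
    return intdiv($priceInCents * $percentage, 100);
}

echo calculateDiscount(1000, 25), PHP_EOL; // 250
```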

Sometimes the programming language itself can help with these pre-conditions: for instance, the language may support strict typing, which prevents certain kinds of invalid input from being provided. Some languages offer more advanced ways of defining pre-conditions, like pattern matching.

PHP doesn't have a lot of options, and before PHP 7 we didn't even have a way to define parameter types using primitives like int, string, etc. So many of us have been doing manual assertions at the top of functions, to verify some basic aspects of the provided arguments, like this:

if (!is_string($name)) {
    throw new InvalidArgumentException('$name should be a string');
}

This leads to lots of code duplication, across projects even, so it's a great opportunity for code reuse. Benjamin Eberlei created a popular library for it, and Bernhard Schussek created a variation on it. Both have become quite commonly used in projects. They offer useful shortcuts like Assert::string($value) and Assert::greaterThan($value, $limit), which check the value and throw an InvalidArgumentException if an expectation is not met. You can provide custom exception messages as well:

Assertion::string($name, '$name should be a string');
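
To give an idea of what such a shortcut does, here is a rough sketch of a miniature helper. MiniAssert is a hypothetical class written for this post; it is not the implementation of either library, it only mimics the "check, then throw" behaviour described above:

```php
<?php
// Hypothetical miniature helper, NOT the code of either assertion library.
final class MiniAssert
{
    public static function string($value, string $message = ''): void
    {
        if (!is_string($value)) {
            throw new InvalidArgumentException(
                $message !== '' ? $message : 'Expected a string'
            );
        }
    }

    public static function greaterThan($value, $limit, string $message = ''): void
    {
        if (!($value > $limit)) {
            throw new InvalidArgumentException(
                $message !== '' ? $message : 'Expected a value greater than the limit'
            );
        }
    }
}

MiniAssert::string('Matthias', '$name should be a string'); // passes silently
MiniAssert::greaterThan(5, 0);                              // passes silently
```

The real libraries offer many more checks and richer default messages, so a helper like this is only useful for understanding the idea.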

The funny thing is, PHP already has a built-in assertion tool. It's not as convenient as the assertion functions that these libraries provide. You'd have to write all the checks yourself:

assert(is_string($name), '$name should be a string');

On the other hand, it has one interesting feature that exposes the core idea of assertions: you can turn them off (e.g. in a production environment) without touching the code. You can't easily turn off an assertion library once you start using it, but I still think it's a very interesting test to see if you're using such a library in the correct way: just entertain the thought that you would turn the assertions off. Would the system still function correctly?
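
For the built-in assert(), the switch is the zend.assertions ini setting. The sketch below reports which mode the current process runs in; describeAssertMode() and greet() are hypothetical names used only for this illustration. For valid input, greet() gives the same result in every mode, which is exactly the property we're after:

```php
<?php
// Describes how this PHP process treats assert(), based on zend.assertions.
function describeAssertMode(): string
{
    switch ((int) ini_get('zend.assertions')) {
        case 1:
            return 'assert() is compiled and executed';
        case 0:
            return 'assert() is compiled but skipped at runtime';
        default:
            return 'assert() is not compiled at all';
    }
}

// Hypothetical function: the assertion is a sanity check, not part of the logic.
function greet($name): string
{
    assert(is_string($name), '$name should be a string');

    return 'Hello, ' . $name;
}

echo describeAssertMode(), PHP_EOL;
echo greet('Matthias'), PHP_EOL;
```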

I think this deserves a bit of an explanation. We should first consider why we need assertions in the first place. The answer is that some callers may provide bad data as input arguments to our function, so we need to protect it against this bad data. We throw an exception because things aren't going to work out well if we just carry on. The culprit, however, isn't the innocent user of our program, it's the caller of the function. So we'd never want an InvalidArgumentException to bubble up to the user.

So the first rule of using assertions is: don't use assertions to validate user input, use them to validate function arguments. This means that, given that the user uses the application in a way that is valid and supported by our user interface (e.g. they are not trying to "hack" our system by tampering with POST request data), they should never receive a useless "500 Internal server error" response because some assertion failed. The other way around: if you find an assertion exception in your logs, assuming that all your users are innocent, you know that something is wrong about your user interface, since it apparently allows the user to accidentally provide the wrong data.

// $page is taken from the request's query parameters
$page = ...;

// Don't use an assertion here: $page is user input
Assertion::greaterThan($page, 0, 'The page query parameter should be larger than 0');
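
One possible alternative, sketched under the assumption that the raw value arrives as a string from the query string: validate it as user input first, and give the user a proper error response when it's wrong. The parsePageParameter() helper and its "null means invalid" convention are hypothetical:

```php
<?php
// Hypothetical helper: returns the page number, or null if the user input is invalid.
function parsePageParameter(?string $rawPage): ?int
{
    if ($rawPage === null) {
        return 1; // no parameter provided: fall back to the first page
    }

    // FILTER_VALIDATE_INT returns the integer, or false if validation fails.
    $page = filter_var($rawPage, FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);

    return $page === false ? null : $page;
}

var_dump(parsePageParameter('3'));   // int(3)
var_dump(parsePageParameter('0'));   // NULL
var_dump(parsePageParameter('abc')); // NULL
```

The calling code can turn a null result into a "400 Bad request" response or a friendly message. Any assertion further down the call chain can then only fail because of a programming mistake.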

User input will indeed be a reason for functions to fail. But so are external failures in outgoing calls. If a function reaches out to the database to fetch an entity by its ID, then the entity may not exist (anymore) and the call will fail. Before you make that call to the database, you don't know yet if the function will fail or not. This is why language designers usually distinguish between LogicExceptions and RuntimeExceptions. Both extend the generic Exception class, but their intent is different. A RuntimeException is a failure caused by external, unpredictable things: the database, the filesystem, the network, etc. A LogicException is a programming mistake. It shouldn't have happened, but somehow, somewhere, a programmer didn't use a function well. Can you guess what the parent class of InvalidArgumentException is? It's LogicException, and rightfully so. Whenever an assertion triggers an exception, you know that you have made a mistake.
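
This distinction can be checked in code. The sketch below uses a hypothetical helper, isProgrammingMistake(), which could for instance help decide how loudly to log a failure. It relies only on PHP's built-in exception hierarchy:

```php
<?php
// Hypothetical helper: a LogicException points at a programming mistake,
// anything else is treated as an external or otherwise unexpected failure.
function isProgrammingMistake(Throwable $exception): bool
{
    return $exception instanceof LogicException;
}

var_dump(isProgrammingMistake(new InvalidArgumentException('bad argument'))); // bool(true)
var_dump(isProgrammingMistake(new RuntimeException('database is down')));     // bool(false)
```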

This brings us to the second rule of using assertions: don't use assertions to validate return values from other functions.

$id = 123;
$entity = $repository->findById($id);

// Don't use an assertion here
Assertion::isInstanceOf($entity, Entity::class);

// Instead, throw a RuntimeException, or a domain-specific one
if ($entity === null) {
    throw new RuntimeException('Entity with ID ' . $id . ' not found');
}
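
A domain-specific variant could look like the sketch below. Both class names are hypothetical, and the in-memory repository only exists to make the example runnable. The point is that the repository itself throws, so callers never have to inspect a return value:

```php
<?php
// Hypothetical domain-specific exception; name and factory method are illustrative.
final class CouldNotFindEntity extends RuntimeException
{
    public static function withId(int $id): self
    {
        return new self('Entity with ID ' . $id . ' not found');
    }
}

// Minimal in-memory stand-in for a real repository.
final class InMemoryEntityRepository
{
    private $entities = [];

    public function save(int $id, object $entity): void
    {
        $this->entities[$id] = $entity;
    }

    public function getById(int $id): object
    {
        if (!isset($this->entities[$id])) {
            throw CouldNotFindEntity::withId($id);
        }

        return $this->entities[$id];
    }
}

$repository = new InMemoryEntityRepository();
$repository->save(123, new stdClass());
$entity = $repository->getById(123); // always an object, never null
```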

Another example of making an assertion about a return value:

$dateTime = DateTimeImmutable::createFromFormat('d/m/Y', $dateString);

Assertion::isInstanceOf($dateTime, DateTimeImmutable::class);

The real problem here is that DateTimeImmutable::createFromFormat() has a design issue: it returns either false or a DateTimeImmutable instance. This isn't good form. If it's impossible to construct an object from the provided $dateString argument, this function should throw an exception. Once it does, we don't need to make an assertion about its return value. The solution in code would be to introduce a wrapper with a more appropriate API, e.g.

// Assumed to live in an application namespace: the global name DateTime is taken by PHP
final class DateTime
{
    public static function createFromFormat(
        string $format, 
        string $dateString
    ): DateTimeImmutable {
        $dateTime = DateTimeImmutable::createFromFormat($format, $dateString);

        if (!$dateTime instanceof DateTimeImmutable) {
            throw new InvalidArgumentException(
                'The provided date string is in the wrong format' 
            );
        }

        return $dateTime;
    }
}

The above example also demonstrates a more general rule for assertions: don't use assertions as a replacement for exceptions. If you think about it, you can replace every if branch which throws an exception with an assertion. This may seem like a useful trick, because it saves you from writing a unit test for that branch:

/*
 * There's a separate branch in the code that throws this exception, 
 * so theoretically it should be covered with an extra unit test.
 */
if ($entity === null) {
    throw new RuntimeException('Entity with ID ' . $id . ' not found');
}

/*
 * There's no longer a separate branch, so the unit test for the happy
 * path of this function will also cover this line, even though it 
 * won't trigger the exception.
 */
Assertion::isInstanceOf($entity, Entity::class);

There's more to talk about with regard to unit testing, and the big question to me is: should we write unit tests to verify that our assertions work?

Assertions should be used as sanity checks. In that sense, they are more like a trace: evidence that someone called a function with an incompatible piece of data. That's why you usually don't need to write specific unit test cases that catch the exceptions produced by these assertions.

Why? Let's get back to the beginning of this post: many things that we use assertions for could also be verified at the level of the programming language itself. You may know this from experience if you worked with PHP 5 and added lots of assertions like Assertion::string() and the like, until PHP 7 came along and you could remove all of them. It's just that PHP is still quite limited with respect to which function pre-conditions can be checked by the language.

The same goes for the type system. For instance, if your language supports union types, like something is either This or That, you don't have to write an assertion for that anymore. With pattern matching, things become even more advanced, and you could omit assertions like "there should be at least one element in the list".
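
A small sketch of the union type case, assuming PHP 8.0 or later (earlier versions don't support this syntax). formatId() is a hypothetical function. The "either int or string" check that would otherwise be an assertion is now done by the language:

```php
<?php
// Requires PHP 8.0 or later. formatId() is a hypothetical example function.
function formatId(int|string $id): string
{
    // No is_int()/is_string() assertion needed: other types cause a TypeError.
    return 'ID-' . $id;
}

echo formatId(42), PHP_EOL;    // ID-42
echo formatId('abc'), PHP_EOL; // ID-abc
```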

Now let's combine this with the idea that it should be possible to switch off assertions and still have a working program (except that it may be harder to debug the weird issues that would be caught by assertions otherwise). Should or shouldn't we write unit tests for assertions? I find that not every assertion is as important, and so not every assertion requires an extra test.

My rules of thumb are as follows. If a better type system would be able to fix it, then don't test it. For example:

// Verify that all elements of a list are of a certain type
Assertion::allIsInstanceOf($list, Element::class);

// And all the primitive type assertions for PHP 5 applications
Assertion::string($value);
Assertion::boolean($value);
// etc.

On the other hand, if you're asserting that an input value is within the allowed domain, test it.

For example:

// Verify that the value is within a certain range:
Assertion::greaterThan($value, 0);
Assertion::lessThan($value, 10);
// etc.

// Verify that a string matches a certain pattern:
Assertion::regex($value, '/\d+/');
Assertion::alnum($value);
// etc.

// Verify a number of elements:
Assertion::count($value, 2);

This explains why I find myself testing mostly assertions from the constructors of value objects, since value objects are much like native language types, but they usually limit the domain of the input arguments.
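
As an illustration, here's a sketch of such a value object together with a framework-free check of its domain rule. The Rating class and its range of 1 to 5 are hypothetical. In a real project the check would be a regular unit test, and the constructor could use an assertion library instead of the manual if statement:

```php
<?php
// Hypothetical value object: the name and the allowed range are illustrative.
final class Rating
{
    private $value;

    public function __construct(int $value)
    {
        // Domain check: the type system can't express "between 1 and 5".
        if ($value < 1 || $value > 5) {
            throw new InvalidArgumentException('A rating should be between 1 and 5');
        }

        $this->value = $value;
    }

    public function asInt(): int
    {
        return $this->value;
    }
}

// Framework-free test helper: does constructing a Rating with this value fail?
function constructingRatingFails(int $value): bool
{
    try {
        new Rating($value);
    } catch (InvalidArgumentException $exception) {
        return true;
    }

    return false;
}

var_dump(constructingRatingFails(0)); // bool(true)
var_dump(constructingRatingFails(3)); // bool(false)
```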

Conclusion

Assertions are sanity checks. If they were left out, you should still have a correctly functioning application. They should never become user-facing errors.

Useful rules of thumb for working with assertions and assertion libraries are:

  • Use them to validate function arguments, but only at the beginning of a function.
  • Instead of making assertions about another function's return value, throw an exception inside that other function.
  • Don't use assertions as a replacement for exceptions.
  • If a better type system would be able to fix it, use an assertion, but don't unit test for its exception.
  • If an assertion validates the domain of a value, write a unit test that shows that it works.