Assertions and assertion libraries

Posted on by Matthias Noback

When you're looking at a function (an actual function or a method), you can usually identify several blocks of code in there. There are pre-conditions, there's the function body, and there may be post-conditions. The pre-conditions are there to verify that the function can safely proceed to do its real job. Post-conditions may be there to verify that you're going to give something back to the caller that will make sense to them.

In a quasi-mathematical way: most pre-conditions verify that the input is within the supported domain of the function. Post-conditions verify that the output is within the range of the function. A mathematical function (as far as I know) only verifies its pre-conditions, because bad input could've been provided for which the function just doesn't work (e.g. 1/x doesn't work for input x = 0). Once the input has been validated, the function will yield an answer which doesn't have to be verified. Every answer will be valid no matter what.

It works the same way for function pre-conditions and post-conditions in code; you'll usually only find pre-conditions in code, no post-conditions. Quite often however you may not even find pre-conditions, but "medio-conditions"; that's when input validation happens everywhere inside the function.

This is not a desirable situation: for the function body to be as clear as possible, we'd want to push all pre-condition checks to the top of the function. Then we'll end up with a function body where "nothing can go wrong".
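As a minimal sketch of what that looks like in PHP (the function name and its rules are hypothetical, chosen only for illustration), all checks sit at the top and the body that follows can assume valid input:

```php
<?php

// Hypothetical example: the function name and its rules are assumptions.
function averageWordLength(array $words): float
{
    // Pre-conditions: verify the input before doing any real work.
    if ($words === []) {
        throw new InvalidArgumentException('$words should not be empty');
    }
    foreach ($words as $word) {
        if (!is_string($word) || $word === '') {
            throw new InvalidArgumentException('Every word should be a non-empty string');
        }
    }

    // Function body: "nothing can go wrong" from here on.
    $totalLength = array_sum(array_map('strlen', $words));

    return $totalLength / count($words);
}
```

With the checks grouped like this, the division at the end needs no guard against an empty list.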

Sometimes the programming language itself can help with these pre-conditions: for instance, the language may support strict typing, which prevents certain types of invalid input from being provided. Some languages offer more advanced ways of defining pre-conditions, like pattern matching.

PHP doesn't have a lot of options, and before PHP 7 we didn't even have a way to define parameter types using primitives like int, string, etc. So many of us have been doing manual assertions at the top of functions, to verify some basic aspects of the provided arguments, like this:

if (!is_string($name)) {
    throw new InvalidArgumentException('$name should be a string');
}

This leads to lots of code duplication, across projects even, so it's a great opportunity for code reuse. Benjamin Eberlei created a popular library for it, and Bernhard Schussek created a variation on it. Both have become quite commonly used in projects. They offer useful shortcuts like Assert::string($value) and Assert::greaterThan($value, $limit), which will check the value and throw an InvalidArgumentException if an expectation is not met. You can provide custom exception messages as well:

Assertion::string($name, '$name should be a string');

The funny thing is, PHP already has a built-in assertion tool. It's not as convenient as the assertion functions that these libraries provide. You'd have to write all the checks yourself:

assert(is_string($name), '$name should be a string');

On the other hand, it has one interesting feature that exposes the core idea of assertions: you can turn them off (e.g. in a production environment) without touching the code. Even though you can't easily turn off an assertion library once you start using it, I still think it's a very interesting test to see if you're using such a library in the correct way. Just entertain the thought: if you turned the assertions off, would the system still function correctly?
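Here is a small sketch of that thought experiment, assuming PHP 7 or later, where the zend.assertions ini setting controls whether assert() is evaluated. The greet() function is hypothetical:

```php
<?php

// Hypothetical function, for illustration only.
function greet($name): string
{
    // Evaluated only when the zend.assertions ini setting is 1.
    // With zend.assertions = -1 (a common production setting) PHP
    // does not even compile this line.
    assert(is_string($name), '$name should be a string');

    return 'Hello, ' . $name;
}

// A correct caller gets the same result with assertions on or off.
echo greet('Matthias'), PHP_EOL; // Hello, Matthias
```

A caller that passes a string never notices whether the assertion ran. Only a caller that was already making a mistake does.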

I think this deserves a bit of an explanation. We should first consider the question why we need assertions in the first place. The answer is that some callers may provide bad data as input arguments to our function, so we need to protect it against this bad data. We throw an exception because things aren't going to work out well if we just carried on. The culprit, however, isn't the innocent user of our program; it's the caller of the function. So we'd never want an InvalidArgumentException to bubble up to the user.

So the first rule of using assertions is: don't use assertions to validate user input, use them to validate function arguments. This means that, given that the user uses the application in a way that is valid and supported by our user interface (e.g. they are not trying to "hack" our system by tampering with POST request data), they should never receive a useless "500 Internal server error" response because some assertion failed. The other way around: if you find an assertion exception in your logs, assuming that all your users are innocent, you know that something is wrong with your user interface, since it apparently allows the user to accidentally provide the wrong data.

// Don't do this: $page is taken from the request's query parameters
$page = ...;

Assertion::greaterThan($page, 0, 'The page query parameter should be larger than 0');
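One way to keep the two concerns apart, as a rough sketch: validate the query parameter where the request is handled, and leave the argument check inside the function as a guard against programming mistakes. Both function names are hypothetical, and a plain if stands in for the assertion library so the snippet runs on its own:

```php
<?php

// Hypothetical function: this check guards against programming mistakes.
function fetchPage(int $page): string
{
    if ($page < 1) {
        throw new InvalidArgumentException('$page should be larger than 0');
    }

    return 'Results for page ' . $page;
}

// Hypothetical request handling: user input is validated separately,
// and a bad value produces a user-facing response, not an exception.
function handleRequest(array $queryParameters): string
{
    $page = filter_var(
        $queryParameters['page'] ?? '1',
        FILTER_VALIDATE_INT,
        ['options' => ['min_range' => 1]]
    );

    if ($page === false) {
        return '400 Bad Request: the page query parameter should be larger than 0';
    }

    return fetchPage($page);
}
```

If the check inside fetchPage() ever fails now, it points at a bug in the calling code, not at something the user typed.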

User input will indeed be a reason for functions to fail. But so are external failures in outgoing calls. If a function reaches out to the database to fetch an entity by its ID, then the entity may not exist (anymore) and the call will fail. Before you make that call to the database, you don't know yet if the function will fail or not. This is why language designers usually distinguish between LogicExceptions and RuntimeExceptions. Both extend the generic Exception class, but their intent is different. A RuntimeException is a failure caused by external, unpredictable things: the database, the filesystem, the network, etc. A LogicException is a programming mistake. It shouldn't have happened, but somehow, somewhere, a programmer didn't use a function well. Can you guess what the parent class of InvalidArgumentException is? It's LogicException, and rightfully so. Whenever an assertion triggers an exception, you know that you have made a mistake.
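You can check that hierarchy yourself with nothing but PHP's built-in exception classes:

```php
<?php

// InvalidArgumentException is a LogicException, not a RuntimeException.
var_dump(is_subclass_of(InvalidArgumentException::class, LogicException::class));   // bool(true)
var_dump(is_subclass_of(InvalidArgumentException::class, RuntimeException::class)); // bool(false)

// Both branches still share the generic Exception base class.
var_dump(is_subclass_of(LogicException::class, Exception::class));   // bool(true)
var_dump(is_subclass_of(RuntimeException::class, Exception::class)); // bool(true)
```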

This brings us to the second rule of using assertions: don't use assertions to validate return values from other functions.

$id = 123;
$entity = $repository->findById($id);

// Don't use an assertion here
Assertion::isInstanceOf($entity, Entity::class);

// Instead, throw a RuntimeException, or a domain-specific one
if ($entity === null) {
    throw new RuntimeException('Entity with ID ' . $id . ' not found');
}
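A domain-specific exception could look something like the sketch below. The class name and the named constructor are hypothetical, not part of any library:

```php
<?php

// Hypothetical domain-specific exception. It extends RuntimeException
// because a missing entity is an external condition, not a programming mistake.
final class EntityNotFound extends RuntimeException
{
    public static function withId(int $id): self
    {
        return new self('Entity with ID ' . $id . ' not found');
    }
}

// Usage: throw EntityNotFound::withId($id);
```

Calling code can then catch EntityNotFound specifically, for example to turn it into a "not found" response.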

Another example of making an assertion about a return value:

$dateTime = DateTimeImmutable::createFromFormat('d/m/Y', $dateString);

Assertion::isInstanceOf($dateTime, DateTimeImmutable::class);

The real problem here is that DateTimeImmutable::createFromFormat() has a design issue: it returns either false or a DateTimeImmutable instance. This isn't good form. If it's impossible to construct an object from the provided $dateString argument, this function should throw an exception. Once it does, we don't need to make an assertion about its return value. The solution in code would be to introduce a wrapper with a more appropriate API, e.g.

// Assumed to live in its own namespace, since PHP already defines a global DateTime class
final class DateTime
{
    public static function createFromFormat(
        string $format,
        string $dateString
    ): DateTimeImmutable {
        $dateTime = DateTimeImmutable::createFromFormat($format, $dateString);

        if (!$dateTime instanceof DateTimeImmutable) {
            throw new InvalidArgumentException(
                'The provided date string is in the wrong format' 
            );
        }

        return $dateTime;
    }
}

The above example also demonstrates a more general rule for assertions: don't use assertions as a replacement for exceptions. If you think about it, you can replace every if branch which throws an exception with an assertion. This may seem like a useful trick, because it saves you from writing a unit test for that branch:

/*
 * There's a separate branch in the code that throws this exception, 
 * so theoretically it should be covered with an extra unit test.
 */
if ($entity === null) {
    throw new RuntimeException('Entity with ID ' . $id . ' not found');
}

/*
 * There's no longer a separate branch, so the unit test for the happy
 * path of this function will also cover this line, even though it 
 * won't trigger the exception.
 */
Assertion::isInstanceOf($entity, Entity::class);

There's more to talk about with regard to unit testing, and the big question to me is: should we write unit tests to verify that our assertions work?

Assertions should be used as sanity checks. In that sense, they are more like a trace: evidence that someone called a function with an incompatible piece of data. That's why you usually don't need to write specific unit test cases that catch the exceptions produced by these assertions.

Why? Let's get back to the beginning of this post: many things that we use assertions for could also be verified at the level of the programming language itself. You may know this from experience if you worked with PHP 5 and added lots of assertions like Assertion::string() and the like, until PHP 7 came along and you could remove all those assertions. It's just that PHP is still quite limited with respect to what function pre-conditions can be checked by the language.

The same goes for the type system. For instance, if your language supports union types, like something is either This or That, you don't have to write an assertion for that anymore. With pattern matching, things become even more advanced, and you could omit assertions like "there should be at least one element in the list".
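PHP itself gained union types in version 8.0, so a sketch of the idea can stay in PHP. The function below is hypothetical and needs PHP 8.0 or later. The signature replaces what would otherwise be an "is this an int or a float?" assertion:

```php
<?php

// Hypothetical function; the int|float union type requires PHP 8.0+.
function formatAmount(int|float $amount): string
{
    // No assertion needed: PHP rejects anything that isn't an int or a float.
    return number_format($amount, 2, '.', '');
}

echo formatAmount(5), PHP_EOL;   // 5.00
echo formatAmount(2.5), PHP_EOL; // 2.50

try {
    formatAmount('abc');
} catch (TypeError $error) {
    echo 'Rejected by the type system', PHP_EOL;
}
```

Note that without declare(strict_types=1) PHP still coerces numeric strings like '5', so the check is only as strict as the typing mode allows.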

Now let's combine this with the idea that it should be possible to switch off assertions and still have a working program (except that it may be harder to debug the weird issues that would be caught by assertions otherwise). Should or shouldn't we write unit tests for assertions? I find that not every assertion is equally important, and so not every assertion requires an extra test.

My rules of thumb are: if a better type system would be able to fix it, then don't test it. For example:

// Verify that all elements of a list are of a certain type
Assertion::allIsInstanceOf($list, Element::class);

// And all the primitive type assertions for PHP 5 applications
Assertion::string($value);
Assertion::boolean($value);
// etc.

On the other hand, if you're asserting that an input value is within the allowed domain, test it.

For example:

// Verify that the value is within a certain range:
Assertion::greaterThan($value, 0);
Assertion::lessThan($value, 10);
// etc.

// Verify that a string matches a certain pattern:
Assertion::regex($value, '/\d+/');
Assertion::alnum($value);
// etc.

// Verify a number of elements:
Assertion::count($value, 2);

This explains why I find myself testing mostly assertions from the constructors of value objects, since value objects are much like native language types, but they usually limit the domain of the input arguments.
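As a rough sketch of such a value object and the kind of test I mean: the Rating class and its range are hypothetical, a plain if stands in for the assertion library so the snippet runs without dependencies, and typed properties require PHP 7.4 or later.

```php
<?php

// Hypothetical value object; the name and the allowed range are assumptions.
final class Rating
{
    private int $value;

    public function __construct(int $value)
    {
        // Domain check: the type system can't express "between 1 and 10".
        if ($value < 1 || $value > 10) {
            throw new InvalidArgumentException('A rating should be between 1 and 10');
        }

        $this->value = $value;
    }

    public function value(): int
    {
        return $this->value;
    }
}

// A minimal test for the domain check, written without a test framework.
try {
    new Rating(11);
    echo 'No exception thrown', PHP_EOL;
} catch (InvalidArgumentException $exception) {
    echo 'Rejected: ', $exception->getMessage(), PHP_EOL;
}
```

The int type declaration needs no test of its own; the range check does, because it's the only thing that keeps an 11 out.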

Conclusion

Assertions are sanity checks. If they were left out, you should still have a correctly functioning application. They should never become user-facing errors.

Useful rules of thumb for working with assertions and assertion libraries are:

  • Use them to validate function arguments, but only at the beginning of a function.
  • Instead of making assertions about another function's return value, throw an exception inside that other function.
  • Don't use assertions as a replacement for exceptions.
  • If a better type system would be able to fix it, use an assertion, but don't unit test for its exception.
  • If an assertion validates the domain of a value, write a unit test that shows that it works.
Comments
Alies Lapatsin
Assertions are very special: they can be disabled in production. Because of this functionality, they look like PHPDoc on steroids. So, what is your opinion about assert() in production: should it be enabled or not?

Thanks for the post.

A little hint: There is a typo in "Verify that the value is within a cetain range".

A little question: What do you think about such constructs?

try {
    Assertion::minLength($name, 2);
    $this->name = $name;
} catch (AssertionFailedException $e) {
    throw InvalidCustomerNameException::reason($e->getMessage());
}

final class InvalidCustomerNameException extends \InvalidArgumentException
{
    public static function reason(string $msg): InvalidCustomerNameException
    {
        return new self('Invalid customer name because ' . $msg);
    }
}

This would produce a message like "Invalid customer name because Value "%s" is too short, it should have at least %d characters, but only has %d characters."

Matthias Noback

Thanks, I'll fix the typo. I'm not a big fan of what you describe there, since you could basically just write your own if there and throw the exception yourself. I usually use the assertion exception message to add a bit more detail. Also, if you want to use your own exception types and maybe even use them to provide a user-facing error message, you could take a look at https://matthiasnoback.nl/2...

still_dreaming_1

I found this article to be good food for thought that helped me clarify my own. I mostly agree with your general lines of reasoning, but I disagree with a few things that I would like to hear your and other people's thoughts on.

The point about having an if instead of an assertion in order to throw an exception that would require a unit test seems like strange logic to me. That is like saying everything that needs to be tested should always be inside an if, otherwise it is part of the happy path and does not need to be explicitly tested. Am I missing something?

I also don't like the dismissal of the idea that using an assertion cuts down on the amount of code. I do agree there needs to be a distinction between a RuntimeException and a LogicException, but if an assertion is just a shorter, more convenient, less error-prone way of throwing a LogicException, shouldn't we use something similar to check for conditions that would throw a RuntimeException? This would cut down on duplication, make the code more readable and lead to fewer bugs, right? Is it just a naming thing? Does the name Assertion imply it is for LogicExceptions only?

I wish most languages allowed specifying many more things in the method signature, like the range of allowed integers. I also wish most languages had a completely different type for an empty string and a non-empty string, since for most cases that a non-empty string is valid, an empty string is not.

Matthias Noback

Thanks for joining the discussion!

The point about cutting down the amount of test code was related to the fact that I've seen people do this :) Of course the reasoning doesn't go the other way around; it's not about testing all the if branches, but all the logical branches. It's just that code coverage usually doesn't make a distinction, and people just want to reach 100% "line coverage". In that case, assertions can be used as a shortcut for increasing code coverage.

In fact, I've seen people use shortcuts for throwing runtime exceptions too; usually in the form of private "guard" methods. Not really a fan of those, personally, but it can be done. Still, you need to add a test case for them, even though the lines themselves have been covered.

Languages could indeed have more built-in ways to tighten the contract of a method. With some newer releases of PHP itself we can expect to let go of certain assertion types as well.