Objects should be constructed in one go

Posted by Matthias Noback

Consider the following rule:

When you create an object, it should be complete, consistent and valid in one go.

It is derived from the more general principle that it should not be possible for an object to exist in an inconsistent state. I think this is a very important rule, one that will gradually lead everyone out of the swamps of those dreaded "anemic" domain models. However, the question still remains: what does all of this mean?

Well, for example, we should not be able to construct a Geolocation object with only a latitude:

final class Geolocation
{
    private $latitude;
    private $longitude;

    public function __construct()
    {
    }

    public function setLatitude(float $latitude): void
    {
        $this->latitude = $latitude;
    }

    public function setLongitude(float $longitude): void
    {
        $this->longitude = $longitude;
    }
}

$location = new Geolocation();
// $location is in an invalid state!

$location->setLatitude(-20.0);
// $location is still in an invalid state!

It shouldn't be possible to leave it in this state. It shouldn't even be possible to construct it with no data in the first place, because having a specific value for latitude and longitude is one of the core aspects of a geolocation. These values belong together, and a geolocation "can't live" without them. Basically, the whole concept of a geolocation would become meaningless if this were possible.

An object usually requires some data to fulfill a meaningful role. But it also imposes certain limitations on what kind of data it accepts, and on which specific subset of all possible values in the universe is allowed. This is where, as part of the object design phase, you'll start looking for domain invariants. What do we know from the relevant domain that would help us define a meaningful model for the concept of a geolocation? Well, one of these things is that latitude and longitude should be within a certain range of values, i.e. -90 to 90 inclusive and -180 to 180 inclusive, respectively. It would definitely not make sense to allow any other value to be used. It would render all modelled behavior regarding geolocations useless.

Taking all of this into consideration, you may end up with a class that forms a sound model of the geolocation concept:

final class Geolocation
{
    private $latitude;
    private $longitude;

    public function __construct(
        float $latitude,
        float $longitude
    ) {
        Assertion::between($latitude, -90, 90);
        $this->latitude = $latitude;

        Assertion::between($longitude, -180, 180);
        $this->longitude = $longitude;
    }
}

$location = new Geolocation(-20.0, 100.0);

This effectively protects the geolocation's domain invariants, making it impossible to construct an invalid, incomplete or useless Geolocation object. Whenever you encounter such an object in your application, you can be sure that it's safe to use. No need to use some sort of validator to validate it first! This is why that rule about not allowing objects to exist in an inconsistent state is wonderful. My not-to-be-nuanced advice is to apply it everywhere.
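To see the guard in action without pulling in an assertion library, here is a minimal sketch that swaps the Assertion calls for plain exceptions. The getters latitude() and longitude() and the exception messages are my own additions for illustration; they are not part of the class shown above.

```php
<?php

final class Geolocation
{
    private $latitude;
    private $longitude;

    public function __construct(float $latitude, float $longitude)
    {
        // Plain exceptions stand in for the Assertion::between() calls.
        if ($latitude < -90.0 || $latitude > 90.0) {
            throw new InvalidArgumentException(
                'Latitude should be between -90 and 90'
            );
        }
        if ($longitude < -180.0 || $longitude > 180.0) {
            throw new InvalidArgumentException(
                'Longitude should be between -180 and 180'
            );
        }

        $this->latitude = $latitude;
        $this->longitude = $longitude;
    }

    // Hypothetical getters, added only so the values can be inspected.
    public function latitude(): float
    {
        return $this->latitude;
    }

    public function longitude(): float
    {
        return $this->longitude;
    }
}

// Both values are within range, so we get a complete object.
$location = new Geolocation(-20.0, 100.0);

try {
    // The latitude is out of range, so the constructor throws.
    new Geolocation(-120.0, 100.0);
} catch (InvalidArgumentException $exception) {
    // No Geolocation instance was created for these values.
}
```

Because the constructor is the only way in, any code that receives a Geolocation can rely on both ranges having been checked.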

An aggregate with child entities

The rule isn't without issues though. For example, I've been struggling to apply it to an aggregate with child entities, in particular when I was working on modelling a so-called "purchase order". It's a document you send to a supplier to ask for some goods (these "goods" are specific quantities of a certain product). The domain expert talks about this as "a header with lines", or "a document with lines". I decided to call the aggregate root "Purchase Order" (a class named PurchaseOrder) and to call the child entities representing the ordered goods "Lines" (in fact, every line is an instance of Line).

An important domain invariant to consider is: "every purchase order has at least one line". After all, it just doesn't make sense for an order to have no lines. When trying to apply this design rule, my first instinct was to provide the list of lines as a constructor argument. A simplified implementation (note that I don't use proper value objects in these examples!) would look like this:

final class PurchaseOrder
{
    private $lines;

    /**
     * @param Line[] $lines
     */
    public function __construct(array $lines)
    {
        // "at least one line" means the count must be greater than 0
        Assertion::greaterThan(count($lines), 0,
            'A purchase order should have at least one line');

        $this->lines = $lines;
    }
}

final class Line
{
    private $lineNumber;
    private $productId;
    private $quantity;

    public function __construct(
        int $lineNumber, 
        int $productId, 
        int $quantity
    ) {
        $this->lineNumber = $lineNumber;
        $this->productId = $productId;
        $this->quantity = $quantity;
    }
}

// inside the application service:
$purchaseOrder = new PurchaseOrder(
    [
        new Line(...), 
        new Line(...)
    ]
); 

As you can see, this design makes the construction of the Line child entities a responsibility of the application service which creates the PurchaseOrder aggregate. One of the issues with that is that lines need to have an identity which is relative to the aggregate. So, when constructing these Line entities, the application service should provide each of them with an ID too:

// inside the application service:
$lines = [];

foreach (... as $lineNumber => ...) {
    $lines[] = new Line($lineNumber, ...);
}

$purchaseOrder = new PurchaseOrder($lines); 

It would be cool if we didn't have to determine the "next identity" of a child entity outside of the aggregate. And the aggregate could happily do this work for us anyway. However, that would require a loop inside the constructor of PurchaseOrder, in which we call setLineNumber() on each Line object:

public function __construct(array $lines)
{
    foreach (array_values($lines) as $index => $line) {
        // line numbers will be 1, 2, 3, ...
        $line->setLineNumber($index + 1);
    }

    $this->lines = $lines;
}

That's not a nice solution, because now a Line can exist in an incomplete, and therefore invalid, state: without a line number.

So instead, we should let the PurchaseOrder create those Line entities itself. We'd only need to provide the raw data (product ID, quantity) as constructor arguments, e.g.

public function __construct(array $linesData)
{
    foreach (array_values($linesData) as $index => [$productId, $quantity]) {
        $this->lines[] = new Line(
            $index + 1,
            $productId,
            $quantity
        );
    }
}

However, I'm not really happy with $linesData being just some anonymous data structure. We could introduce something like a proper type for that - LineWithoutLineNumber, but that would be even more silly.

Instead, we should use a "DDD trick", that is to leave the creation of the child entity to the PurchaseOrder. We can do this using something that resembles a factory method. The difference is that this method doesn't (have to) return the created object (a Line instance), and that it also makes a state change to the PurchaseOrder. For example:

final class PurchaseOrder
{
    private $lines = [];

    public function __construct()
    {
        // no lines!
    }

    public function addLine(int $productId, int $quantity): void
    {
        $this->lines[] = new Line(
            count($this->lines) + 1,
            $productId,
            $quantity
        );
    }
}

// in the application service:
$purchaseOrder = new PurchaseOrder();
$purchaseOrder->addLine(...);
$purchaseOrder->addLine(...);
$purchaseOrder->addLine(...);

This looks great. But in the process of removing the $lines parameter from the constructor, we've unfortunately also lost our ability to protect the one domain invariant we cared about. Using this design, I can't think of a reasonable way to verify that there is at least one line. I mean, if we did that right after creating the PurchaseOrder, it would be too soon. And adding this assertion to addLine() wouldn't make sense at all. The only moment at which we can reasonably verify it is just before we persist the PurchaseOrder. In fact, we could move the validation to the repository. However, having a validate() method on PurchaseOrder wouldn't look great, and delegating the responsibility to protect domain invariants to the repository doesn't at all feel like a safe option. We'd basically be back at: throw a bunch of data at this object and validate it afterwards...

I'm tempted to go back to the first solution. But then we'd be running around in circles. Remember, if you get stuck, take a step back. We set out trying to accomplish everything in one go so we could protect this one very important domain invariant. We wouldn't even allow a nice and convenient method such as addLine() to exist, just because that allows the PurchaseOrder to exist in an incomplete, hence inconsistent state.

Being in this situation reminds me of a tweet by Alberto Brandolini:

Every modeling tool will have blind spots. Whenever the discussion around a model turns sterile, choose a different tool to challenge it.

Alberto Brandolini

We're discussing things endlessly, and we get stuck because of an object-oriented programming design principle we apply without thinking. We should try to look at this PurchaseOrder thing from a different perspective. Because, if we think about it from a business perspective: a purchase order without any lines is actually fine, until it gets sent to the supplier.

A paper metaphor

As a "modelling tool" I find it very useful to imagine what dealing with a "paper purchase order" would look like. It's not at all far-fetched in this case, because even the domain experts speak of "documents". So consider how we would deal with a paper purchase order document. We'd take an empty sheet, notice some dotted lines to fill in the basics (supplier name, address, etc.). And then we see some blank space where we can write lines for every product we want to order. We could leave this paper "incomplete" for a quick bathroom stop. We can take it with us and discuss something about it with a co-worker, after which we make some corrections. Or we can even throw it away and start all over. But at some point, we're going to actually send it over to the supplier. And before we do, we give it one more look to verify that everything is there.

Translating the metaphor back to the code, we might realize that there are really two different "phases" in the life-cycle of the PurchaseOrder, which come with their own invariants. When the order is in its "draft" phase, we need to supply basic information, but we're allowed to add (and maybe remove) lines at will. Once we "finalize" the purchase order, we're claiming that it's ready to be sent, and at that point we could protect some other invariants.

We would only need to add a method to PurchaseOrder that would "finalize" it. Trying to be DDD-compliant, we look for a word that our domain experts use. This word turns out to be "place" - we're gradually filling in all the details and then we place the purchase order. So, in code it could look something like this:

final class PurchaseOrder
{
    private $lines = [];
    private $isPlaced = false;

    public function __construct(...)
    {
        // ...
    }

    public function addLine(int $productId, int $quantity): void
    {
        if ($this->isPlaced) {
            throw new \LogicException(
                'You cannot add a line to an order that was already placed'
            );
        }

        $this->lines[] = new Line(
            count($this->lines) + 1,
            $productId,
            $quantity
        );
    }

    public function place(): void
    {
        // "at least one line" means the count must be greater than 0
        Assertion::greaterThan(count($this->lines), 0,
            'A purchase order should have at least one line');

        $this->isPlaced = true;
    }
}

// in the application service:
$purchaseOrder = new PurchaseOrder(...);

$purchaseOrder->addLine(...);
$purchaseOrder->addLine(...);
$purchaseOrder->addLine(...);

$purchaseOrder->place();

Note that, besides adding a place() method, we also modified addLine() to prevent new lines from being added after the order was placed. In the paper metaphor this wouldn't be allowed either, since the document has been sent to the supplier, so it will be very confusing if lines get added in our local version of the purchase order.

Also note that the place() method brings the aggregate root into a certain state, after which not everything is possible anymore. This might remind you of the concept of a state machine. I actually find that entities are often much like state machines. Given a certain state, operations on it are limited. And state transitions are limited too. For example, before placing the order, it would be possible to cancel it without any consequences, but after placing it, the system needs to take all kinds of compensating actions (send a message to the supplier that the order has been cancelled, etc.).
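To make the state machine idea a bit more tangible, here is a rough sketch with an explicit state field and a cancel() method. Everything beyond addLine() and place() is an assumption of mine: the state names, the cancel() method, and the pendingActions list that merely records that the supplier still has to be notified. Lines are kept as plain arrays and plain LogicExceptions replace the Assertion calls, just to keep the sketch self-contained.

```php
<?php

final class PurchaseOrder
{
    private const DRAFT = 'draft';
    private const PLACED = 'placed';
    private const CANCELLED = 'cancelled';

    private $state = self::DRAFT;
    private $lines = [];
    // Hypothetical: compensating actions the application still has to perform.
    private $pendingActions = [];

    public function addLine(int $productId, int $quantity): void
    {
        if ($this->state !== self::DRAFT) {
            throw new LogicException('Lines can only be added to a draft order');
        }

        // A plain array stands in for a Line entity in this sketch.
        $this->lines[] = [count($this->lines) + 1, $productId, $quantity];
    }

    public function place(): void
    {
        if ($this->state !== self::DRAFT) {
            throw new LogicException('Only a draft order can be placed');
        }
        if (count($this->lines) < 1) {
            throw new LogicException(
                'A purchase order should have at least one line'
            );
        }

        $this->state = self::PLACED;
    }

    public function cancel(): void
    {
        if ($this->state === self::CANCELLED) {
            throw new LogicException('This order was already cancelled');
        }
        if ($this->state === self::PLACED) {
            // The supplier has already received the order.
            $this->pendingActions[] = 'notify supplier of cancellation';
        }

        $this->state = self::CANCELLED;
    }

    public function state(): string
    {
        return $this->state;
    }

    public function pendingActions(): array
    {
        return $this->pendingActions;
    }
}

// Cancelling a draft has no consequences.
$draft = new PurchaseOrder();
$draft->cancel();

// Cancelling a placed order records a compensating action.
$placed = new PurchaseOrder();
$placed->addLine(1001, 5);
$placed->place();
$placed->cancel();
```

Every public method first checks the current state, so the allowed transitions (draft to placed, draft to cancelled, placed to cancelled) are spelled out in one place instead of being implied by a boolean flag.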

Conclusion

I find that leaving the exact type and construction details of nested objects to their parent object leads to a more "supple" design. In a sense, this is "old OOP knowledge" - we hide the implementation details of how exactly the PurchaseOrder deals with the lines (e.g. does it use a plain old array, or a collection object, do we need a Line class at all, etc.). We thereby allow refactoring of the PurchaseOrder aggregate without having to update all its clients across the code base.

This is part of what's meant by the traditional DDD advice to make the aggregate root the only entry point to interaction with any aggregate part:

Choose one ENTITY to be the root of each AGGREGATE, and control all access to the objects inside the boundary through the root.

Eric Evans, "Domain-Driven Design", Part II, Chapter Six: "The Life Cycle of a Domain Object"

No client should be able to make use of or create new Line objects directly. Unless, I have to add, there is a very specific use case for that. For instance, you should not expose all those internals for the sake of unit testing (see also a previous article - "Testing actual behavior").

So even though we ended up with a better design, we had to reconsider options for protecting the "an order should have at least one line" domain invariant. We discussed reaching for other modelling tools, like working out a "paper metaphor". Of course there are other modelling tools, but this one was effective for making us realize there are actually two distinct phases in the life-cycle of the purchase order.

The general advice is: if you find yourself stuck with a modelling question, look for ways to change your perspective. Even look for (unconsciously) applied (programming) rules that keep you from reaching the best solution. Alberto adds another useful suggestion to that:

...and you’re maybe looking for "the best" solution. When all you need is "a solution".


About fixtures

Posted by Matthias Noback

System and integration tests need database fixtures. These fixtures should be representative and diverse enough to "fake" normal usage of the application, so that the tests using them will catch any issues that might occur once you deploy the application to the production environment. There are many different options for dealing with fixtures; let's explore some of them.

Generate fixtures the natural way

The first option, which I assume not many people are choosing, is to start up the application at the beginning of a test, then navigate to specific pages, submit forms, click buttons, etc. until finally the database has been populated with the right data. At that point, the application is in a useful state and you can continue with the act/when and assert/then phases. (See the recent article "Pickled State" by Robert Martin on the topic of tests as specifications of a finite state machine).

Populating the database like this isn't really the same as loading database fixtures, but these activities could have the same end result. The difference is that the natural way of getting data into the database - using the user interface of the application - leads to top quality data:

  • You don't need to violate the application's natural boundaries by talking directly to the database. You approach the system as a black box, and don't need to leverage your knowledge of its internals to get data into the database.
  • You don't have to maintain these fixtures separately from the application. They will be recreated every time you run the tests.
  • This means that these "fixtures" never become outdated, incomplete, invalid, inconsistent, etc. They are always correct, since they use the application's natural entry points for entering the data in the first place.

However, as you know, the really big disadvantage is that running those tests will become very slow. Creating an account, logging in, activating some settings, filling in some more forms, etc. every time before you can verify anything: that's going to take a lot of time. So honestly, though it would be great, this is not a realistic scenario in most cases. Instead, you might consider something else:

Generate once, reload for every test case

Instead of navigating the application and populating the database one form at a time, for every test case again, you could do it once, and store some kind of snapshot of the resulting data set. Then for the next test case you could quickly load that snapshot and continue with your test work from there.

This approach has all the advantages of the first option, but it will make your test suite run a lot faster. The risk is that the resulting set of fixtures may not be diverse enough to test all the branches of the code that needs to be tested.

With both of these options, you may also end up with a chicken/egg problem. You may need some data to be in the database first, to make it even possible to navigate to the first page where you could start building up the fixtures. Often this problem itself may provide useful feedback about the design of your application:

  • Possibly, you have data in the database that shouldn't be there anyway (e.g. a country codes table that might as well have been a text file, or a list of class constants).
  • Possibly, the data can only end up in the database by manual intervention; something a developer or administrator gets asked to do every now and then. In that case, you could consider implementing a "black box alternative" for it (e.g. a page where you can accomplish the same thing, but with a proper form or button).

If these are not problems you can easily fix, you may consider using several options combined: first, load in some "bootstrap" data with custom SQL queries (see below), then navigate your way across the application to bring it into the right state.

But, there are other options, like:

Insert custom data into the database

If you don't want to or can't naturally build up your fixtures (e.g. because there is no straightforward way to get it right), you can in fact do several alternative things:

  1. Use a fixture tool that lets you use actually instantiated entities as a source for fixtures, or
  2. Manually write INSERT queries (possibly with the same net result).

Option 1 has proven useful if you use your database as some anonymous storage thing that's used somewhere behind a repository. If you work with an ORM, that is probably the case. Option 2 is the right choice if your database is this holy thing in the centre of your system, and:

  • The data in this database is often inconsistent or incomplete, and/or
  • Other applications are also reading from or writing to this database.

Manually writing fixtures in that case allows you to also write "corrupt" fixtures on purpose and verify that your application code is able to deal with that.

There's still one problematic issue, an issue which all of the above solutions have in common: shared data between all or many test cases. One radical approach that in my experience works really well, is to:

Insert custom data for each test case

What happens when you load a lot of data into the database (no matter how you do that), and run all the tests from this starting point?

  • All the tests start relying on some data that was not consciously put there.
  • You can't easily find out which data a test relies on, so you can't easily replace or modify the fixtures, without breaking existing tests.
  • Even adding new data might have a negative influence on the results of other tests.

In my experience, even the tiniest bit of "inheritance" you use in the process of loading fixtures will always come back to bite you in the tail. Just like when you use class inheritance, when you use fixture inheritance, you may find certain things impossible to achieve. That's why, when it comes to fixtures, you should apply something like the "prefer composition over inheritance" rule. But I often take this one step further: no composition, no inheritance (no fear of duplication): just set up the fixtures specifically for one test (class or suite, possibly even method or scenario).

This has several advantages:

  • The fixture data is unique for the test, so you can be very specific, tailoring the fixtures to your needs.
  • You can even document why each part of the data set is there.
  • The set of fixture data is small, leading to fast load times.
  • You can safely modify fixtures, even remove them, without worrying about some remote test breaking.

There is one disadvantage I can think of: it takes more work to prepare those fixtures. However, the time spent writing fixtures is easily won back by the sheer joy and ease of maintaining them. In fact, I find that "fixture maintenance" is hardly a thing.

Meta: do you need fixtures at all?

As a conclusion, you should consider an important "meta" question too:

Do your objects really need to be reconstituted from the database? What if the repository itself would - when used in a test case - be able to just store and retrieve objects in-memory? This often requires a bit of architectural rework using the Dependency inversion principle. But afterwards, you probably won't need to test every part of your application with a full-featured database anymore.
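As an illustration of that rework, here is a minimal sketch of a repository interface with an in-memory implementation that a test could use instead of a database-backed one. All names (Account, AccountRepository, InMemoryAccountRepository) are made up for this example; a real application would have its own entities and a second implementation of the interface that talks to the actual database.

```php
<?php

// A hypothetical entity, kept as small as possible.
final class Account
{
    private $id;
    private $emailAddress;

    public function __construct(int $id, string $emailAddress)
    {
        $this->id = $id;
        $this->emailAddress = $emailAddress;
    }

    public function id(): int
    {
        return $this->id;
    }

    public function emailAddress(): string
    {
        return $this->emailAddress;
    }
}

// Application code depends on this abstraction, not on the database.
interface AccountRepository
{
    public function save(Account $account): void;

    public function getById(int $id): Account;
}

// Test implementation: objects are stored in and retrieved from an array.
final class InMemoryAccountRepository implements AccountRepository
{
    private $accounts = [];

    public function save(Account $account): void
    {
        $this->accounts[$account->id()] = $account;
    }

    public function getById(int $id): Account
    {
        if (!isset($this->accounts[$id])) {
            throw new RuntimeException('Account not found');
        }

        return $this->accounts[$id];
    }
}

// In a test: the "fixture" is just an object saved to the repository.
$repository = new InMemoryAccountRepository();
$repository->save(new Account(1, 'info@example.com'));

$account = $repository->getById(1);
```

With this in place, only the tests for the database-backed implementation of AccountRepository would still need a real database and real fixtures.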

P.S. Just don't replace MySQL with SQLite for speed. It's still much better to test actual database interactions against the real thing. Testing it with SQLite doesn't prove that it's going to work with the real database in production. See also my previous article "Mocking at architectural boundaries - Persistence & Time".


Blogging every week

Posted by Matthias Noback

A very important "trick" in finding the flow in life is: do what you like most. Of course, you have to do things you don't like (and then you need different life hacks), but when you can do something you like, you'll find that you'll be more successful at it.

When it comes to blogging, I find that it helps to follow my instincts, to write about whatever I like to write about at the moment. I can think of a list of things that need blogging about, but I end up not writing about them because they don't light the fire inside me (anymore).

Start writing very soon

So, when I decided in January of this year to publish a blog post every week, this was the rule to be applied: whenever I felt the urge to write an article about something I thought was very interesting at the moment, I'd just do it. It turns out that this sometimes requires me to write on the train, in the plane, at night. But I'd still do it (or at least, I'd do it within a certain amount of time, like a maximum of one day after I had the idea). Otherwise, the idea would fade away, just like its importance.

Starting to write should be very easy

Besides starting to write about some idea early after its conception, another important trick is to make that "starting to write" step as easy as possible. It should be very easy to:

  • Start working on a new blog post
  • Modify an existing one
  • Add code samples
  • Publish it

(One thing that could improve in my own workflow is: it should be very easy to add images...)

For me, static site generator Sculpin, combined with an automated Docker setup, helps a lot with this rule. For an idea about the setup, check out Lucas van Lierop's open source blog.

Imagine your audience, but never let the imaginary audience judge you

When writing, it helps to imagine who will be reading it. I tend to mostly write with my direct co-workers as the audience in mind. I make sure to add links to reference material, in case the reader needs a refresher on a topic, or is just not familiar with it. It's always good to take a meta-perspective while writing, considering the places in the text where you may lose readers.

Although considering the audience while writing is a good idea, you should never give them a voice while writing. You shouldn't allow them to interrupt your good work and point out that

  • somebody else has already written about it (and better),
  • this is not very original,
  • this is only interesting for 2 people,
  • this contains so little information, it doesn't deserve to be called a blog post,
  • and so on...

Leave the judging to the real people who'll eventually read your article. And even then, don't let yourself be disappointed by their comments. By the way, even when lots of people are reading your posts, they usually don't comment at all, in which case: don't worry about that either!

Besides judging, people may also provide you with good feedback. Since it's your blog, you're still allowed to ignore it, but you could of course decide to do something with it too.

Don't publish the first article immediately

In my experience, if you write one article and publish it immediately, you'll have another mountain to climb before you publish the second article. In other words: that second article won't come and you'll feel very bad about it too. So, my trick is to write two or more articles before starting to publish one. To see this queue of articles makes me feel like I'm in control and I don't have to worry about the next deadline anymore; I'll make it anyway, even if I don't write for two weeks in a row. This is also a great way to deal with holidays; I just write a few posts in advance, and publish them even when I'm not working.

Finding good topics

For me, a great source of topics is the questions asked during workshop sessions, or conference/meetup talks. These questions are a clear sign that not everybody "just knows". For example, the discussion about "where to generate an ID" comes up often during workshops. In such a case, I provide a brief summary to the participants, but point to the article for more details. A similar source of interesting topics is conversations with colleagues.

Another source is programming work itself. Whenever I feel like I found a good solution for something, or even found a solution "template" or "pattern", I like to write about it to gather feedback. The "ORMless" article is a nice example of that.

Finally, sometimes I feel like blogging when I notice a discussion on Twitter about a topic I have an opinion on/experience with/etc.

Conclusion

With these rules, tips and tricks, it turned out that my goal to publish one blog post every week was quite achievable. It took me about 3 hours every week to keep up with the rhythm. I suspect that actually having such a rhythm is part of why it worked. I also realize that it's still a lot of time to invest, and that 3 hours a week is optimistic if you're starting out as a blogger. Which is why I'm not saying: "you can do it too". But I still believe that these suggestions might come in handy. Please let me know how it went...
