Principles of Package Design, 2nd edition

Posted on by Matthias Noback

All of a sudden it became book-writing season. It began in August when I started revising my second book, "Principles of Package Design". Apress had contacted me about adopting it, and they didn't want to change a lot about it. However, the book was from 2015 and although I had aimed for it to be "timeless", some parts needed an update. Furthermore, I had happily pressed the "Release" button back then, but it's the same as with software development: the code you wrote last year, you wouldn't approve of today.

Upgrades

Because Apress has their own pipeline for manuscript to book conversion, I had to take the original Leanpub-flavored Markdown manuscript, export it to HTML, then copy it into Mac Pages, and finally export it as a Word document. That was already a lot of work. Then I started reading the book and collected all the issues, creating post-its along the way. Having every little issue on display was a nice trick. It made progress visible, and made it feel like a project I could finish.

Re-reading my own book was a very interesting experience. I noticed how often I'd been lazy and skipped a proper argument for a piece of advice. I also noticed how some advice wasn't very clear and could easily be misinterpreted.

In that regard, it was very, very helpful to have Ross Tuck on board as a technical reviewer. He pointed out several issues where the reader, given this or that background, could have difficulty understanding a section, or take unintended advice from it. Ross put in a lot of time and effort, so from this place, thanks again!

Besides revising, I've also added several new sections, most notably about the following topics:

  • The reuse of code from the Domain layer, with a discussion about Domain-Driven Design.
  • Why "final" classes should be preferred, and how composing objects should be the preferred way of changing the behavior of existing objects.
  • When to introduce an interface for a (packaged) class.

Because there are many people who have read the first edition, who I don't want to "force" to buy the second edition as well, I've already published several articles that cover more or less the same ground:

Without further ado

So, here we are. The release of the second edition of Principles of Package Design! The book is available on Apress.com and Amazon, but it's also a "regular" book, so your book store should be able to order it as well.

Buy the e-book or soft cover via Apress

Buy the e-book or soft cover via Amazon

Do you want to review the book?

I'm very curious about your opinion. If you've read the book (first or second edition), please let me know what you think of it. It would be great if you could submit a customer review on Amazon.

If you'd be interested in writing a review on your website, blog, etc., send me an email at info@matthiasnoback.nl, so I can send you a review copy.

Also, if you see this book in a store somewhere, it'd be very cool if you could send me a picture!

In part 1 of this short series (it's going to end with this article) we covered how you can test-drive the queries in a repository class. Returning query results is only part of the job of a repository though. The other part is to store objects (entities) and retrieve them, using something like a save() and a getById() method, and possibly to delete them. Some people implement these two jobs in one repository class; others like to use two or even more repositories for this. When you have a separate write and read model (CQRS), the read model repositories will have the querying functionality (e.g. find me all the active products), and the write model repositories will have the store/retrieve/delete functionality.

In particular if you write your own mapping code (like I've been doing a lot recently), you need to write some extra tests to verify that the persistence-related activities of your repository function correctly.

Writing tests for store and retrieve actions

When you're testing a class, you're actually specifying its behavior. But you're doing that from an outsider's perspective. The test case uses an instance of your class (the subject-under-test). It calls some methods on it and checks if the resulting behavior is as expected.

How would you specify the behavior of a save() method? You could say about the repository: "It can save an entity". How would you verify that a given repository class implements this specification correctly? If the repository used Doctrine ORM, you could set up a mock for the EntityManager or a similar class/interface, and verify that the repository passes the object to its persist() method. However, as explained earlier, mocks aren't allowed in tests for repository classes. The other option would be to make a call to that save() method and afterwards look inside the database to verify that the expected records have been inserted. However, this would tie the test to the implementation of the repository.

It seems there's no easy way in which you can find out if a call to save() has worked. But let's think about that save() method. Why is it there? Because we want to be able to later retrieve the entity that it saves. So, what if we test our save() method by also introducing its counterpart, getById()? That way, we can indirectly find out if save() has worked: getById() is expected to return an object that's equal to the object you've just persisted.

In other words, a black box test for save() can be written if you combine that test with getById():

public function it_can_save_and_retrieve_an_entity(): void
{
    // Create a basic version of the entity and store it
    $originalEntity = ...;
    $this->repository->save($originalEntity);

    // Now load it from the database
    $entityFromDatabase = $this->repository->getById($originalEntity->entityId());

    // Compare it to the entity you created for this test
    self::assertEquals($originalEntity, $entityFromDatabase);
}
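The comparison in the last line of this test relies on equality of state, not on getting the very same instance back. Here's a minimal sketch of that distinction in plain PHP, using a hypothetical Product entity that isn't part of the original example. PHPUnit's assertEquals() likewise compares the contents of objects rather than their identity:

```php
<?php

// Hypothetical entity, for illustration only
final class Product
{
    private int $productId;
    private string $name;

    public function __construct(int $productId, string $name)
    {
        $this->productId = $productId;
        $this->name = $name;
    }
}

// The object as it was created in the test
$originalEntity = new Product(1, 'Keyboard');

// Stand-in for what getById() would reconstruct from the stored records
$entityFromDatabase = new Product(1, 'Keyboard');

// Same class and same property values: equal, although not identical
var_dump($originalEntity == $entityFromDatabase);  // bool(true)
var_dump($originalEntity === $entityFromDatabase); // bool(false)

// A mapping bug, e.g. a column that isn't hydrated, breaks the equality
$wronglyHydrated = new Product(1, '');
var_dump($originalEntity == $wronglyHydrated);     // bool(false)
```

This is why the test can catch mapping mistakes: any property that doesn't survive the round trip makes the two objects unequal.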

State changes, child entities, etc.

Usually an entity doesn't get persisted once. You'll be modifying it, persisting it again, adding child entities to it, removing them again, etc. So, for every situation like this, I like to write another test method, showing that all this really works, e.g.

public function it_can_save_child_entities(): void
{
    // Create a basic version of the entity and store it
    $originalEntity = ...;
    // Add some child entity
    $originalEntity->addChildEntity(...);
    $this->repository->save($originalEntity);

    // Now load it from the database
    $entityFromDatabase = $this->repository->getById($originalEntity->entityId());

    // Compare it to the entity as we've set it up for this test
    self::assertEquals($originalEntity, $entityFromDatabase);
}

Sometimes it makes sense to add intermediate save() and getById() calls, e.g.

public function it_can_save_child_entities(): void
{
    // Create a basic version of the entity and store it
    $originalEntity = ...;
    $this->repository->save($originalEntity);
    // Load and save again, now with an added child entity
    $originalEntity = $this->repository->getById($originalEntity->entityId());
    $originalEntity->addChildEntity(...);
    $this->repository->save($originalEntity);

    // Now load it from the database
    $entityFromDatabase = $this->repository->getById($originalEntity->entityId());

    // Compare it to the entity as we've set it up for this test
    self::assertEquals($originalEntity, $entityFromDatabase);
}

Deleting entities

Finally, a repository may offer a delete() method. This one needs testing too. Deleting is always scary, in particular if you somehow forget to add proper WHERE clauses to your DELETE statements (who didn't, at least once?).

So we should verify that everything related to a single entity has been deleted, but nothing else. How can you do this? Well, if you want black box testing again, you could save two entities, delete one, and check that the other one still exists:

public function it_can_delete_an_entity(): void
{
    // Create the entity
    $originalEntity = ...;
    $originalEntity->addChildEntity(...);
    $this->repository->save($originalEntity);

    // Create another entity
    $anotherEntity = ...;
    $anotherEntity->addChildEntity(...);
    $this->repository->save($anotherEntity);

    // Now delete that other entity
    $this->repository->delete($anotherEntity);

    // Verify that the first entity still exists
    $entityFromDatabase = $this->repository->getById($originalEntity->entityId());
    self::assertEquals($originalEntity, $entityFromDatabase);

    // Verify that the second entity we just removed, can't be found
    $this->expectException(EntityNotFound::class);
    $this->repository->getById($anotherEntity->entityId());
}

Or, if you like, you could let go of the black box aspect and populate the database with some entity and child entity records that you want to prove will still exist after you delete a single entity.
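A rough sketch of what that non-black-box variation could look like, assuming an in-memory SQLite database via the pdo_sqlite extension. The table names, column names and the deleteEntity() function are made up for this illustration; they stand in for whatever your repository's delete() method does:

```php
<?php

// Assumes the pdo_sqlite extension; schema and names are hypothetical
$connection = new PDO('sqlite::memory:');
$connection->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$connection->exec('CREATE TABLE entities (entity_id INTEGER PRIMARY KEY)');
$connection->exec(
    'CREATE TABLE child_entities (child_entity_id INTEGER PRIMARY KEY, entity_id INTEGER NOT NULL)'
);

// Fixtures: entity 1 should survive, entity 2 will be deleted
$connection->exec('INSERT INTO entities (entity_id) VALUES (1), (2)');
$connection->exec(
    'INSERT INTO child_entities (child_entity_id, entity_id) VALUES (10, 1), (20, 2), (21, 2)'
);

// Stand-in for the statements a delete() method could execute
function deleteEntity(PDO $connection, int $entityId): void
{
    $deleteChildren = $connection->prepare('DELETE FROM child_entities WHERE entity_id = ?');
    $deleteChildren->execute([$entityId]);

    $deleteEntity = $connection->prepare('DELETE FROM entities WHERE entity_id = ?');
    $deleteEntity->execute([$entityId]);
}

deleteEntity($connection, 2);

// Look inside the database: only the records of entity 1 should be left
$remainingEntityIds = $connection
    ->query('SELECT entity_id FROM entities ORDER BY entity_id')
    ->fetchAll(PDO::FETCH_COLUMN);
$remainingChildIds = $connection
    ->query('SELECT child_entity_id FROM child_entities ORDER BY child_entity_id')
    ->fetchAll(PDO::FETCH_COLUMN);
```

If one of the WHERE clauses were missing, the remaining IDs would come back empty and the test would fail. The price is that the test now knows about tables and columns.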

Ports & adapters

If you write purely black box tests for your write model entity/aggregate repository (that is, for save(), getById() and delete()), the test cases themselves won't mention anything about the underlying storage mechanism of the repository. You won't find any SQL queries in your test. This means that you could rewrite the repository to use a completely different storage mechanism, and your test wouldn't need to be modified.

This amounts to the same thing as applying a Ports and Adapters architectural style, where you separate the port (i.e. "Persistence") from its specific adapter (i.e. a repository implementation that uses SQL). This is very useful, because it allows you to write fast acceptance tests for your application against code in the Application layer, and use stand-ins or "fakes" for your repositories. It also helps you decouple domain logic from infrastructure concerns, allowing you to replace that infrastructure code and migrate to other libraries or frameworks whenever you want to.
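To make the port/adapter split a bit more concrete, here's a minimal sketch of a fake repository that keeps entities in memory. All names (Entity, EntityRepository, InMemoryEntityRepository) are hypothetical, and serializing the entity is just one possible way to make the fake return a separate, equal copy, like a database-backed adapter would:

```php
<?php

// All names below are hypothetical; they only illustrate the port/adapter split

final class EntityNotFound extends RuntimeException
{
}

final class Entity
{
    private string $entityId;
    private array $childEntities = [];

    public function __construct(string $entityId)
    {
        $this->entityId = $entityId;
    }

    public function entityId(): string
    {
        return $this->entityId;
    }

    public function addChildEntity(string $child): void
    {
        $this->childEntities[] = $child;
    }
}

// The port: what the application needs from persistence
interface EntityRepository
{
    public function save(Entity $entity): void;

    /** @throws EntityNotFound */
    public function getById(string $entityId): Entity;

    public function delete(Entity $entity): void;
}

// A fake adapter: keeps serialized copies in memory instead of using SQL
final class InMemoryEntityRepository implements EntityRepository
{
    /** @var array<string, string> */
    private array $storedEntities = [];

    public function save(Entity $entity): void
    {
        // Serialize, so later changes to $entity aren't "saved" by accident
        $this->storedEntities[$entity->entityId()] = serialize($entity);
    }

    public function getById(string $entityId): Entity
    {
        if (!isset($this->storedEntities[$entityId])) {
            throw new EntityNotFound('Could not find entity ' . $entityId);
        }

        return unserialize($this->storedEntities[$entityId]);
    }

    public function delete(Entity $entity): void
    {
        unset($this->storedEntities[$entity->entityId()]);
    }
}

$repository = new InMemoryEntityRepository();
$entity = new Entity('e1');
$entity->addChildEntity('child');
$repository->save($entity);
```

Because the black box tests only talk to save(), getById() and delete(), the same test methods could in principle be run against this fake as well as against the SQL-based implementation.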

Conclusion

By writing all these repository tests you specify what the repository should be capable of, from the perspective of its users. Specifying and verifying different use cases proves that the repository is indeed capable of storing, retrieving and deleting its entities correctly. Most effective for this purpose is black box testing, where you make sure the repository test is completely decoupled from the repository's underlying storage mechanism. If you can accomplish this, you can rewrite your repository using a different storage mechanism, and prove that everything still works afterwards.

Test-driving repository classes - Part 1: Queries

Posted on by Matthias Noback

There's something I've only been doing for a year or so, and I've come to like it a lot. My previous testing experiences were mostly at the level of unit tests or functional/system tests. What was left out of the equation was integration tests. The perfect example of one is a test that proves that your custom repository class works well with the type of database that you use for the project, and possibly the ORM or database abstraction library you use to talk with that database. A test for a repository can't be a unit test; that wouldn't make sense. You'd leave a lot of assumptions untested. So, no mocking is allowed.

But how do you test everything that is going on in a repository? Well, I found a nice way of doing so, one that even allows you to use a test-driven approach. In this article I'll cover one of the two main use cases for repositories: querying the database, and returning some objects from it. The other use case - storing and loading objects - will be discussed in another article.

What's a query?

In essence, a query defines criteria for selecting certain records from the database. Comparing it to the real-world equivalent: you'd have some kind of room full of stuff and you let a robot go in with a list of selection criteria. It examines things one by one to see if each thing matches those criteria, and if it does, the robot takes the thing with it, and brings it to you. How can you verify that the robot works well? Sure, you put some things in that room that match the criteria, and see if it fetches those things for you. However, you should also try dropping some things in the room that wouldn't match the criteria you gave to the robot, and verify that the robot doesn't take those things too.

For repository classes it should work the same way. Testing if some query in the repository class works, means that you should load some fixtures for records that would match the criteria. But you shouldn't forget to add some records that wouldn't match the criteria too. Otherwise, you wouldn't notice the difference between your elaborate SELECT query and a SELECT * WHERE 1 query. So basically you'd have to come up with examples, as well as counter-examples. It turns out that these counter-examples can be used to test-drive the query itself.
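The following sketch shows why the counter-examples matter. It assumes an in-memory SQLite database (pdo_sqlite extension) and a products table with an is_active column, similar to the example that follows; the fetchProductIds() helper is made up for this illustration:

```php
<?php

// Assumes the pdo_sqlite extension; table and helper are for illustration only
$connection = new PDO('sqlite::memory:');
$connection->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$connection->exec(
    'CREATE TABLE products (product_id INTEGER PRIMARY KEY, is_active INTEGER NOT NULL)'
);

function fetchProductIds(PDO $connection, string $sql): array
{
    return array_map('intval', $connection->query($sql)->fetchAll(PDO::FETCH_COLUMN));
}

$elaborateQuery = 'SELECT product_id FROM products WHERE is_active = 1 ORDER BY product_id';
$selectEverything = 'SELECT product_id FROM products WHERE 1 ORDER BY product_id';

// Only examples: both queries return the same result, so a test can't tell them apart
$connection->exec('INSERT INTO products (product_id, is_active) VALUES (1, 1), (2, 1)');
$sameWithoutCounterExample =
    fetchProductIds($connection, $elaborateQuery) === fetchProductIds($connection, $selectEverything);

// Add a counter-example: now only the elaborate query returns the expected products
$connection->exec('INSERT INTO products (product_id, is_active) VALUES (3, 0)');
$elaborateResult = fetchProductIds($connection, $elaborateQuery);
$everythingResult = fetchProductIds($connection, $selectEverything);
```

With only products 1 and 2 in the table, a test expecting those two products would pass for either query. Only after adding product 3 does the "select everything" query return something unexpected.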

As an example, consider you need a query that will return a list of products, but only those which are active. You'd start with the simplest situation; a single product that should be returned:

INSERT INTO products (product_id) VALUES (1);

You then write the query for this:

$result = $this->connection
    ->createQuery()
    ->from('products')
    ->select('*')
    ->execute()
    ->fetchAll();

In your test you can then verify that this query indeed loads this one product. At this point I often add another record to the database, to prove that the query is capable of returning more than one object (i.e. there's no "limit" in place or anything).

INSERT INTO products (product_id) VALUES (1);
INSERT INTO products (product_id) VALUES (2);

Implement the first requirement

The query we wrote doesn't yet take the "is active" flag for products into consideration. So now, before you dive in and modify the query, you need to show that the code as it is doesn't yet implement all the requirements. So you add a counter-example; an inactive product:

-- Active products
INSERT INTO products (product_id, is_active) VALUES (1, 1);
INSERT INTO products (product_id, is_active) VALUES (2, 1);

-- Inactive product - should be ignored
INSERT INTO products (product_id, is_active) VALUES (3, 0);

A failing test

If you run the test again, it will fail, because it returns an extra product that wasn't expected to be there. This is the traditional "red" phase of TDD.

You then modify the query, adding a "where" clause that will exclude inactive products:

$result = $this->connection
    // ...
    ->andWhere('is_active = 1')
    // ...
    ->fetchAll();

Run the test again, and the light will be green again.

Green; implement another requirement

Now you can continue working on the next requirement. In this example, we need the product to be in a group of products that's marked as "stock" products. With every extra requirement, we run into another variation we'd need to check. Consider a product that is active, but is not in the right kind of product group; we have to add a counter-example for that:

-- Active products (1, 2)
INSERT INTO products (product_id, is_active) VALUES (1, 1);
INSERT INTO products (product_id, is_active) VALUES (2, 1);

-- Inactive product (3) - should be ignored
INSERT INTO products (product_id, is_active) VALUES (3, 0);

-- Active product (4), but in a non-stock product group (100) - should be ignored
INSERT INTO product_groups (product_group_id, has_stock_products) VALUES (100, 0);
INSERT INTO products (product_id, is_active, product_group_id) VALUES (4, 1, 100);

Running the tests will make the new product pop up of course, something we don't want, so we need to modify the query:

$result = $this->connection
    // ...
    ->andWhere('is_active = 1')
    ->innerJoin(
        'products',
        'product_groups',
        // Join on the same column name that the fixtures use
        'product_groups.product_group_id = products.product_group_id'
    )
    ->andWhere('product_groups.has_stock_products = 1')
    // ...
    ->fetchAll();

This would seem to make it work, but when we re-run the tests now, the result is empty. Products 1 and 2 aren't in a product group yet, so the "inner join" will filter them out. So we have to modify the fixtures to fix this. The same goes for product 3 actually; we should put it in a stock product group, to verify that the "active" flag is still taken into account:

-- Stock-product group (101)
INSERT INTO product_groups (product_group_id, has_stock_products) VALUES (101, 1);

-- Active products (1, 2), in a stock-product group (101)
INSERT INTO products (product_id, is_active, product_group_id) VALUES (1, 1, 101);
INSERT INTO products (product_id, is_active, product_group_id) VALUES (2, 1, 101);

-- Inactive product (3), in a stock-product group (101) - should be ignored
INSERT INTO products (product_id, is_active, product_group_id) VALUES (3, 0, 101);

-- Active product (4), but in a non-stock product group (100) - should be ignored
INSERT INTO product_groups (product_group_id, has_stock_products) VALUES (100, 0);
INSERT INTO products (product_id, is_active, product_group_id) VALUES (4, 1, 100);

And so on, and so on. For every new requirement, first add a counter-example to the fixtures, then see how it pops up in the query result. Then modify the query to ensure that it doesn't. This is the Red - Green - (Refactor) TDD cycle for queries in repository classes. By the way, I find it helpful to add important IDs or other characteristics as a comment to the fixtures, so it's easy for the reader to figure out what's so special about a certain record.
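Putting the pieces together, the end result of this cycle could look roughly like the sketch below. It uses plain PDO with an in-memory SQLite database (assuming the pdo_sqlite extension) instead of the query builder from the examples, and the table definitions are my own guess based on the fixtures:

```php
<?php

// Assumes the pdo_sqlite extension; the schema is inferred from the fixtures
$connection = new PDO('sqlite::memory:');
$connection->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$connection->exec(
    'CREATE TABLE product_groups (
        product_group_id INTEGER PRIMARY KEY,
        has_stock_products INTEGER NOT NULL
    )'
);
$connection->exec(
    'CREATE TABLE products (
        product_id INTEGER PRIMARY KEY,
        is_active INTEGER NOT NULL,
        product_group_id INTEGER
    )'
);

// Product groups: 100 is a non-stock group, 101 is a stock group
$connection->exec(
    'INSERT INTO product_groups (product_group_id, has_stock_products) VALUES (100, 0), (101, 1)'
);

// Examples (1, 2), an inactive product (3), and a product in a non-stock group (4)
$connection->exec(
    'INSERT INTO products (product_id, is_active, product_group_id) VALUES
        (1, 1, 101),
        (2, 1, 101),
        (3, 0, 101),
        (4, 1, 100)'
);

// The query under test: active products in a stock-product group
$activeStockProductIds = $connection
    ->query(
        'SELECT products.product_id
         FROM products
         INNER JOIN product_groups
             ON product_groups.product_group_id = products.product_group_id
         WHERE products.is_active = 1
             AND product_groups.has_stock_products = 1
         ORDER BY products.product_id'
    )
    ->fetchAll(PDO::FETCH_COLUMN);
```

In a real test case the fixtures would be loaded in the setup phase and the final assertion would state that only products 1 and 2 are returned.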

Helpful?

I find this approach very helpful. It helps you take small steps when working on repositories, where you may often feel insecure about getting the queries right (and understanding your ORM well). Test-driving your queries can prevent you from making stupid mistakes when loading records for specific owners (tenants, users). With this approach you can actually prove that you're loading the right things by adding some records owned by other tenants and users in your fixtures.

Just like with unit tests, it also helps make the code satisfy new requirements, whenever they become relevant. You have a good starting point for making amendments, a clear view on what's there, and a safety net in case you're worried that modifying the query in some way will accidentally result in unwanted records being loaded.

Potential problems

So far I've found that implementing more requirements can become a bit tedious: you have to add more and more counter-examples, in particular if you also want to test how the repository deals with different data and data types, and you want to verify that it hydrates objects correctly.

Still, being able to safely write queries is something that I've wanted for a long time, and now that I can do it, I'm no longer worried as much about whether I'm seeing the correct data on my screen. More than once I've made the mistake of only testing repositories by clicking through the pages in the browser. Seeing anything at all on the screen was sufficient "proof" to me that the query I wrote was correct. Of course, in practice, it often turned out there was an awful mistake hidden in the query, and the data on the screen wasn't the right data at all. With this test-driven approach however, the test has already proven that the repository loads the correct records, nothing more, nothing less.
