CQRS - not a complicated thing
CQRS has some reputation issues. Mainly, people will feel that it's too complicated to apply in their current projects. It will often be considered over-engineering. I think CQRS is simply misunderstood, which is the reason many people will not choose it as a design technique. One of the common misconceptions is that CQRS always goes together with event sourcing, which is indeed more costly and risky to implement.
CQRS alone simply means that you're making a distinction between a model that is used for changing state, and a model that is used for querying state. In fact, there's often one model that accepts "write" operations (called "write model" or "command model") and multiple models that can be used to "read" information from (called "read models", or "query models").
Most projects out there don't use CQRS, since they combine the write and read operations in one model. Whether you use the data mapper or active record pattern, you'll often have one object ("entity") for each domain concept. This object can be created, it has methods that allow for state modifications, and it has methods that will give you information about the object's state.
All the legacy projects I've encountered so far use this style of storing and retrieving state. It comes with a certain programming style that is not quite beneficial for the maintainability of the application. When writing a new feature for such an application you will start with getting all the ingredients in place. You need some information, and you need some dependencies. If you're unlucky, you still fetch your dependencies from some global static place like good old Zend_Registry or sfContext. Equally bad, you'll fetch your information from the central database. This database contains tables with dozens of columns. It's the single source of truth. "Where is this piece of information? Ah, in table XYZ". So now you use the ORM to fetch a record and automatically turn it into a useful entity for you.
Except... the entity you get isn't useful at all, since it doesn't give you the information you need; it doesn't answer your specific question. It gives you either not enough or way too much information. Sometimes your entity comes with a nifty feature to load more entities (e.g. XYZ->getABCs()), which may help you collect some more information. But that will issue another database query, and again, will load not enough or way more than you need. And so on, and so on.
Reduce the level of coupling
You should realize that by fetching all this information, you're introducing coupling issues to your code. By loading all these classes, by using all these methods, by relying on all these fields, you're increasing the contact surface of your code with the rest of the application. It's really the same issue as with fetching dependencies, instead of having them injected. You're reaching out, and you start relying on parts of the application you shouldn't even be worrying about. These coupling issues will fly back to you in a couple of years, when you want to replace or upgrade dependencies, and have to make changes everywhere. Start out with dependency injection and this will be much easier for you.
The same goes for your model. If multiple parts of the application start relying on these entities, it will be more and more difficult to change them, to let the model evolve. This is reminiscent of the "Stable dependencies principle" (one of the "Package design principles"): if a lot of packages depend on a package, this package becomes hard to change, because a change will break all those dependents. If multiple clients depend on a number of entities, these will be very hard to change too, because you will break all the clients.
Being hard to change is not good for your model. The model should be adaptable in the first place, since it represents a core concept of the domain you're working in. And this domain is by nature something that changes, and evolves. You want to be able to improve the quality and usefulness of your domain model whenever necessary. Hence, it's a good idea not to let all those clients depend on one and the same model.
The solution is to introduce multiple models, for each of those clients. Exactly as CQRS proposes; one write model, which you can use to change the state of your domain objects. Then multiple read models, one for each client. That way, there will be less coupling, and it will be easy to evolve the model in any direction. You can safely add, remove, or transform any piece of information in any one of those read models.
A recipe for read models
To break with a habit of retrieving all the information from one model, you can follow these steps. You'll end up with a nice set of read models, one for every question a client may have.
- Whenever you feel the need to fetch some information, or get an answer to a specific question, and you're tempted to go to the repository and load one or more entities (and related entities), stop and think:
Rephrase the answer you're looking for as an object. For example, let's say the question is: "Which products did the customer order so far? I need the ID and name of each product, and the date on which the customer ordered it." A class for such an answer would look like:
    final class ProductTheCustomerOrdered
    {
        private $productId;
        private $productName;
        private $dateOfOrder;

        public function __construct(int $productId, string $productName, string $dateOfOrder)
        {
            ...
        }
    }
Define an interface with a single method, which phrases your question, and shows what type of answer you're looking for:
    interface OrderHistory
    {
        /**
         * @return ProductTheCustomerOrdered[]
         */
        public function productsTheCustomerOrdered(CustomerId $customerId);
    }
Rely on this interface in your code (through constructor injection of course).
    final class SomeService
    {
        private $orderHistory;

        public function __construct(OrderHistory $orderHistory)
        {
            $this->orderHistory = $orderHistory;
        }

        public function someMethod(CustomerId $customerId)
        {
            $products = $this->orderHistory->productsTheCustomerOrdered($customerId);
            // do something with $products
        }
    }
Provide an implementation for the interface, which gets the right data from the database, then returns those objects.
    final class OrderHistorySql implements OrderHistory
    {
        public function productsTheCustomerOrdered(CustomerId $customerId)
        {
            // perform some smart query, fetching only what you need
            $records = ...;

            return array_map(function($record) {
                return new ProductTheCustomerOrdered(...);
            }, $records);
        }
    }
It would be smart to provide an integration test for the implementation class, proving that your assumptions about any third-party code involved in fetching the data and creating the objects are correct.
The ProductTheCustomerOrdered object itself can become an attractor of behavior; you can add useful methods to it that help you gain more insight into the data, or that protect you from using the data in the wrong way (basic object encapsulation, in other words).
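To make the recipe concrete, here is a minimal end-to-end sketch of it. Several things in it are my own assumptions and not part of the steps above: the shape of the CustomerId class, the table and column names (products, orders, ordered_on), the wasOrderedBefore() method (an example of behavior the read model might attract), and the use of an in-memory SQLite database through PDO to play the role of an integration test. Treat it as one possible shape, not as the definitive implementation.

```php
<?php
declare(strict_types=1);

// Hypothetical value object; the article uses CustomerId without defining it.
final class CustomerId
{
    private $id;

    public function __construct(int $id) { $this->id = $id; }

    public function asInt(): int { return $this->id; }
}

final class ProductTheCustomerOrdered
{
    private $productId;
    private $productName;
    private $dateOfOrder;

    public function __construct(int $productId, string $productName, string $dateOfOrder)
    {
        $this->productId = $productId;
        $this->productName = $productName;
        $this->dateOfOrder = $dateOfOrder;
    }

    public function productName(): string { return $this->productName; }

    // Hypothetical behavior attracted by the read model.
    // 'Y-m-d' strings sort chronologically, so a string comparison is enough.
    public function wasOrderedBefore(string $date): bool
    {
        return strcmp($this->dateOfOrder, $date) < 0;
    }
}

interface OrderHistory
{
    /** @return ProductTheCustomerOrdered[] */
    public function productsTheCustomerOrdered(CustomerId $customerId): array;
}

final class OrderHistorySql implements OrderHistory
{
    private $pdo;

    public function __construct(PDO $pdo) { $this->pdo = $pdo; }

    public function productsTheCustomerOrdered(CustomerId $customerId): array
    {
        // Fetch only the three columns the question asks for (assumed schema).
        $statement = $this->pdo->prepare(
            'SELECT p.id, p.name, o.ordered_on
               FROM orders o
               JOIN products p ON p.id = o.product_id
              WHERE o.customer_id = :customerId
              ORDER BY o.ordered_on'
        );
        $statement->execute(['customerId' => $customerId->asInt()]);

        return array_map(function (array $record) {
            return new ProductTheCustomerOrdered(
                (int) $record['id'],
                $record['name'],
                $record['ordered_on']
            );
        }, $statement->fetchAll(PDO::FETCH_ASSOC));
    }
}

// Integration-style check against an in-memory SQLite database.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)');
$pdo->exec('CREATE TABLE orders (customer_id INTEGER, product_id INTEGER, ordered_on TEXT)');
$pdo->exec("INSERT INTO products VALUES (1, 'Keyboard'), (2, 'Mouse')");
$pdo->exec("INSERT INTO orders VALUES (42, 1, '2017-01-05'), (42, 2, '2017-03-10'), (7, 1, '2017-02-01')");

$products = (new OrderHistorySql($pdo))->productsTheCustomerOrdered(new CustomerId(42));
```

Client code only ever sees the OrderHistory interface, so the SQL, the schema, and the database driver can all change without touching SomeService.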
Dealing with inconsistent data
A common issue with those large legacy databases is that data is simply inconsistent. A particular field in the database may contain a date, or null, or 1970-01-01 00:00:00. And on top of that, the date has a time part, which isn't really needed. You can smooth out all of these little inconsistencies, all this craziness, inside the implementation class. As long as, in our example, productsTheCustomerOrdered() returns nice, well-behaving ProductTheCustomerOrdered objects, all is good.
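As a rough illustration of that smoothing-out, the implementation class could pass every raw date through a small helper like the one below. The function name normalizeLegacyDate() and the assumed column format (Y-m-d H:i:s) are mine; your legacy data will have its own quirks.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: turns a raw legacy column value into a plain
 * 'Y-m-d' date string, or null when the value really means "no date".
 */
function normalizeLegacyDate(?string $raw): ?string
{
    // Both NULL and the Unix-epoch placeholder are treated as "unknown".
    if ($raw === null || $raw === '1970-01-01 00:00:00') {
        return null;
    }

    // Assumption: the legacy column stores values as 'Y-m-d H:i:s'.
    $date = DateTimeImmutable::createFromFormat('Y-m-d H:i:s', $raw);
    if ($date === false) {
        throw new UnexpectedValueException("Unexpected date value: $raw");
    }

    // Drop the time part, which the client code does not need.
    return $date->format('Y-m-d');
}
```

What to do with a null result is a design choice for the read model: you could skip the record, or make the date optional in ProductTheCustomerOrdered. Either way, the client code never sees the 1970 placeholder.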
The read model and its repository form an Anti-Corruption Layer between the new client code and the old legacy data. The read model protects the client code from having to deal with all the quirks of the legacy data and code. For more information on this DDD concept, take a look at Eric Evans's article "Getting started with DDD when surrounded by legacy systems" (PDF).
Conclusion
Considering how simple the steps described above really are, CQRS (without event sourcing) is certainly within reach - in any project, no matter how crappy or old the legacy is. You might still consider using event sourcing as well, for one or more write models. That would allow you to further optimize the performance of your read models. Instead of assembling the answer using some smart (and complicated, maybe slow) query, you can build up your read model over time and let it already have the answer.
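To give an idea of what "building up the read model over time" could look like, here is a small sketch. Everything in it is hypothetical: the ProductWasOrdered event, the OrderHistoryProjector class, and the in-memory array that stands in for a dedicated read-model table. It assumes the write model records an event every time a customer orders a product.

```php
<?php
declare(strict_types=1);

// Hypothetical event, recorded by an event-sourced write model.
final class ProductWasOrdered
{
    public $customerId;
    public $productId;
    public $productName;
    public $dateOfOrder;

    public function __construct(int $customerId, int $productId, string $productName, string $dateOfOrder)
    {
        $this->customerId = $customerId;
        $this->productId = $productId;
        $this->productName = $productName;
        $this->dateOfOrder = $dateOfOrder;
    }
}

// Hypothetical projector: updates the read model each time an event occurs.
final class OrderHistoryProjector
{
    // Stand-in for a dedicated read-model table, keyed by customer ID.
    private $rowsByCustomer = [];

    public function whenProductWasOrdered(ProductWasOrdered $event): void
    {
        $this->rowsByCustomer[$event->customerId][] = [
            'productId' => $event->productId,
            'productName' => $event->productName,
            'dateOfOrder' => $event->dateOfOrder,
        ];
    }

    // Answering the question is now a plain lookup, no joins required.
    public function productsTheCustomerOrdered(int $customerId): array
    {
        return $this->rowsByCustomer[$customerId] ?? [];
    }
}

$projector = new OrderHistoryProjector();
$projector->whenProductWasOrdered(new ProductWasOrdered(42, 1, 'Keyboard', '2017-01-05'));
$projector->whenProductWasOrdered(new ProductWasOrdered(7, 1, 'Keyboard', '2017-02-01'));
$projector->whenProductWasOrdered(new ProductWasOrdered(42, 2, 'Mouse', '2017-03-10'));

$rows = $projector->productsTheCustomerOrdered(42);
```

In a real project the rows would live in their own table and be mapped to ProductTheCustomerOrdered objects, behind the same OrderHistory interface the client code already depends on.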
Thanks for this nice article Matthias.
I have a quick question: Once I have separated the read-model from the write-model, how do I deal with read-only queries that calculate their values with logic that is heavily domain-related? They cannot be implemented in the read-model, because they need the domain logic for calculation, but they cannot be part of the write-model either, because they don't change or persist any state, but act merely as service functions for the front-end.
What would you suggest? I have also phrased this question as a SO question in more detail:
https://stackoverflow.com/questions/60334923/how-to-expose-read-only-calculations-from-business-domain-model-on-back-end-to-f
but haven't gotten an answer, yet.
How do you deal with read model variants?
Let's imagine OrderHistory, FeaturedProducts, PopularProducts, LastOpenedProducts, WhateverProductList. The second may need an image preview and a short abstract in addition. The third may need a detailed description and the last X comments; the variations are infinite. As a result, every view will require its own single-method repository (not sure about "single") and a model class. Repositories will duplicate table and field names, at minimum. Models will duplicate most getters and simple logic. Obviously, read models and repos should not extend from a base class, as a view should be easily disposable along with its repo and model.
I have no immediate answer for you. I agree about the disposability rule. However, if you see lots of (technical) duplication, then you might want to throw some traits in the mix.
Matthias, thank you for this article. What is your choice for instantiating this kind of read model? Do you build them manually from raw database data (using PDO for example) or use some lightweight ORM?
Raw database data may be the best way to go. An ORM can sometimes be useful, but all too soon will do more than you need.
Good read! I do CQRS in some of my projects and I really like it. I use Command Bus for "C" and queries with filters for "Q". You shared your take on queries. Did you or do you plan to write some article to show how you approach commands and modifying the state of your data? Thanks!
Thanks. I have some older articles on these (check out the archive for "Command bus"). Maybe these are helpful.
Good article. I do feel the jury is still out on the long-term effects of this. Like proliferation of read models making the schema harder to change, or overstepping (bypassing) boundaries because every new read model is a new blank slate and potentially an invitation to query data you should not access directly. But then again, those problems are not bigger than the alternative of convoluted read/write behaviour or Law of Demeter violations that the ORM invites...
I do like the angle of "CQRS is not as hard as you might think"!
I've thought about this a bit. I'm working hard at bringing all database access for a particular type of entity into one location, pulling in queries from dozens of locations.
On the surface, this article would imply multiple classes. However, a closer reading shows Matthias says each read method is defined in its own *interface*. That means a single class can implement several interfaces. That lets you put related code together. Users declare a dependency on a single interface.
The only problem with this way of implementing Matthias' suggestion, in PHP at least, is the fact that a user of an interface is not strictly restricted to the method declared by that interface. So if you use this approach, you have to be careful to not use more methods than the interface declares.
You're right @redbeardcreator, it's certainly possible to let one class implement multiple interfaces. If it helps keep some logic in the same place, that could be okay. In practice I often find that simply having all the implementation classes in one directory is enough to keep things manageable.
You’re not restricted from accessing implementation details beyond the interface in most languages that support the instanceof operator and typecast - if you have to do that, or the equivalent in PHP, yeah, that most likely points at bad or missing design. For PHP, the Storm IDE, and plenty of offline QA tools, will catch that sort of mischief.
Indeed, I wouldn't worry about abuse in this case. You can do so many bad things anyway ;)