When and where to determine the ID of an entity

Posted by Matthias Noback

This is a question that always pops up during my workshops: when and where to determine the ID of an entity? There are different answers, no best answer. Well, there are two best answers, but they apply to two different situations.

Auto-incrementing IDs, by the database

Traditionally, all you need for an entity to have an ID is to designate one integer column in the database as the primary key, and mark it as "auto-incrementing". So, once a new entity gets persisted as a record in the database (using your favorite ORM), it will get an ID. That is, the entity has no identity until it has been persisted. Even though this happens everywhere, and almost always, it's a bit weird, because:

  • The application logic now relies on some external system to determine the identity of an entity we create.
  • The entity will have no identity at first, meaning it can be created with an invalid (incomplete) state. This is not a desirable quality for an entity (or for almost any object, I'd say).

It's pretty annoying to have an entity without an identity, because you can't use its identity yet; you first have to wait. When I'm working with entities, I always like to generate domain events (plain objects), which contain relevant values or value objects. For example, when I create a Meetup entity (in fact, when I "schedule it"), I want to record an event about that. But at that point, I have no ID yet, so I actually can't even record that event.

namespace Domain;

final class Meetup
{
    public static function schedule(Name $name, /* ... */): Meetup
    {
        $meetup = new self();

        $meetup->recordThat(
            new MeetupScheduled(
                // we don't know the ID yet!
            )
        );

        // ...

        return $meetup;
    }
}

The only thing I can do is record the event later, outside the Meetup entity, when we finally have the ID. But that would be a bit sad, since we'd have to move the construction of the event object out of the entity, breaking up the entity's originally excellent encapsulation.

Determining uniqueness

The only way to solve this problem is to determine a new identity upfront. That way, the entity can be complete from the start, and it can record any event it likes, which likely includes the entity's ID itself. One way to do this is to use a UUID generator to generate a universally unique identifier. A first step would be to generate the ID inside the entity's named constructor:

namespace Domain;

use Infrastructure\Uuid;

final class Meetup
{
    private $meetupId;

    // ...

    public static function schedule(Name $name, /* ... */): Meetup
    {
        $meetup = new self();

        $meetup->meetupId = MeetupId::fromString(Uuid::uuid4()->toString());

        // ...

        return $meetup;
    }
}

However, generating a UUID is a process that relies on the current date/time and some freshly generated random data. This process is something that - according to the rules explained in "Layers, ports & adapters - Part 2: Layers" - doesn't belong inside the domain layer. It's infrastructure. So the ID generation process itself should happen outside of the entity.

Besides, even though it's technically possible to generate a UUID inside an entity, it's something that conceptually isn't right. The idea behind an ID is that it's unique for the kind of thing it identifies. The entity is only aware of itself, and can never reach across its own object boundaries to find out if an ID it has generated is actually unique. That's why, at least conceptually, generating an identity should not happen inside the entity, only outside of it.

So a better approach would be to generate the ID before creating the new entity, and to pass it in as a constructor argument:

namespace Domain;

final class Meetup
{
    private $meetupId;

    // ...

    public static function schedule(MeetupId $meetupId, Name $name, /* ... */): Meetup
    {
        $meetup = new self();

        $meetup->meetupId = $meetupId;

        // ...

        return $meetup;
    }
}

$meetupId = MeetupId::fromString(Uuid::uuid4()->toString());
$meetup = Meetup::schedule($meetupId, ...);
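
With the ID passed in, the entity can record its event inside schedule() again. Here is a minimal single-file sketch of that. The fields of MeetupScheduled, the releaseEvents() helper, the plain string name (instead of the Name value object) and the example UUID are all assumptions made for illustration, and namespaces are left out for brevity:

```php
<?php
declare(strict_types=1);

// Simplified ID value object; validation is omitted in this sketch.
final class MeetupId
{
    private $id;

    private function __construct(string $id)
    {
        $this->id = $id;
    }

    public static function fromString(string $id): MeetupId
    {
        return new self($id);
    }

    public function asString(): string
    {
        return $this->id;
    }
}

// Hypothetical domain event: a plain object carrying the ID and the name.
final class MeetupScheduled
{
    private $meetupId;
    private $name;

    public function __construct(MeetupId $meetupId, string $name)
    {
        $this->meetupId = $meetupId;
        $this->name = $name;
    }

    public function meetupId(): MeetupId
    {
        return $this->meetupId;
    }
}

final class Meetup
{
    private $meetupId;
    private $events = [];

    private function __construct()
    {
    }

    public static function schedule(MeetupId $meetupId, string $name): Meetup
    {
        $meetup = new self();
        $meetup->meetupId = $meetupId;

        // The ID is known up front, so the event can be recorded right here.
        $meetup->recordThat(new MeetupScheduled($meetupId, $name));

        return $meetup;
    }

    private function recordThat($event): void
    {
        $this->events[] = $event;
    }

    // Hypothetical helper: hands out the recorded events and clears the list.
    public function releaseEvents(): array
    {
        $events = $this->events;
        $this->events = [];

        return $events;
    }
}

$meetup = Meetup::schedule(
    MeetupId::fromString('3f2a6c1e-8b7d-4e5f-9a1b-2c3d4e5f6a7b'),
    'Hypothetical meetup'
);
$events = $meetup->releaseEvents();
```

The construction of the event stays inside the entity, so its encapsulation is left intact.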

Generate the ID in the application service

When you have a separate application layer (with application services, e.g. command handlers), you can generate the ID there. For example:

namespace Application;

use Domain\Meetup;
use Domain\MeetupId;
use Domain\MeetupRepository;
use Infrastructure\Uuid;

final class ScheduleMeetupHandler
{
    private $meetupRepository;

    public function __construct(MeetupRepository $meetupRepository)
    {
        $this->meetupRepository = $meetupRepository;
    }

    public function handle(ScheduleMeetup $command): Meetup
    {
        $meetupId = MeetupId::fromString(Uuid::uuid4()->toString());
        $meetup = Meetup::schedule(
            $meetupId,
            // ...
        );

        $this->meetupRepository->add($meetup);

        return $meetup;
    }
}

Let the repository generate the next identity

However, at this point we still have the issue of generating the UUID being an infrastructure concern. It should move out of the application layer too. This is where you can use a handy suggestion I learned from Vaughn Vernon's book "Implementing Domain-Driven Design": let the repository "hand out" a new identity whenever you need it.

namespace Domain;

interface MeetupRepository
{
    public function add(Meetup $meetup): void;

    public function nextIdentity(): MeetupId;
}

We already have an implementation of this interface in the infrastructure layer, making the actual database calls, etc., so we can conveniently implement the ID generation in that class:

namespace Infrastructure\Persistence;

use Domain\MeetupId;
use Domain\MeetupRepository;
use Infrastructure\Uuid;

final class MeetupSqlRepository implements MeetupRepository
{
    // ...

    public function nextIdentity(): MeetupId
    {
        return MeetupId::fromString((string)Uuid::uuid4());
    }
}

The code in the application service will look like this:

$meetupId = $this->meetupRepository->nextIdentity();

$meetup = Meetup::schedule(
    $meetupId,
    // ...
);
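
To see the whole flow in one place, here is a single-file sketch of the handler working against the repository interface. The InMemoryMeetupRepository with its size() method, the 'meetup-N' ID format, the simplified MeetupId without UUID validation and the name field on the command are assumptions for illustration only; namespaces are left out for brevity:

```php
<?php
declare(strict_types=1);

// Simplified ID value object (no UUID validation in this sketch).
final class MeetupId
{
    private $id;

    private function __construct(string $id)
    {
        $this->id = $id;
    }

    public static function fromString(string $id): MeetupId
    {
        return new self($id);
    }

    public function asString(): string
    {
        return $this->id;
    }
}

final class Meetup
{
    private $meetupId;
    private $name;

    private function __construct()
    {
    }

    public static function schedule(MeetupId $meetupId, string $name): Meetup
    {
        $meetup = new self();
        $meetup->meetupId = $meetupId;
        $meetup->name = $name;

        return $meetup;
    }

    public function meetupId(): MeetupId
    {
        return $this->meetupId;
    }
}

interface MeetupRepository
{
    public function add(Meetup $meetup): void;

    public function nextIdentity(): MeetupId;
}

// Hypothetical test double: hands out predictable IDs, stores in an array.
final class InMemoryMeetupRepository implements MeetupRepository
{
    private $meetups = [];
    private $lastId = 0;

    public function add(Meetup $meetup): void
    {
        $this->meetups[$meetup->meetupId()->asString()] = $meetup;
    }

    public function nextIdentity(): MeetupId
    {
        return MeetupId::fromString('meetup-' . ++$this->lastId);
    }

    public function size(): int
    {
        return count($this->meetups);
    }
}

final class ScheduleMeetup
{
    public $name;
}

final class ScheduleMeetupHandler
{
    private $meetupRepository;

    public function __construct(MeetupRepository $meetupRepository)
    {
        $this->meetupRepository = $meetupRepository;
    }

    public function handle(ScheduleMeetup $command): Meetup
    {
        // The handler no longer knows how the ID is generated.
        $meetupId = $this->meetupRepository->nextIdentity();
        $meetup = Meetup::schedule($meetupId, $command->name);
        $this->meetupRepository->add($meetup);

        return $meetup;
    }
}

$repository = new InMemoryMeetupRepository();
$handler = new ScheduleMeetupHandler($repository);

$command = new ScheduleMeetup();
$command->name = 'Hypothetical meetup';
$meetup = $handler->handle($command);
```

Because the handler only depends on the interface, a test can swap in a repository like this one and assert against a known ID.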

The advantages of letting the repository generate the next identity are:

  • There's a natural, conceptual relation: repositories manage the entities and their identities.
  • You can easily change the way an ID is being generated because the process is now properly encapsulated. No scattered calls to Uuid::uuid4(), but only calls to Repository::nextIdentity().
  • You can in fact still use an incremental ID if you like. You can use the database after all, if it natively supports sequences. Or you can implement your own sequence, for example using Redis:
final class MeetupSqlRepository implements MeetupRepository
{
    // ...

    public function nextIdentity(): MeetupId
    {
        return MeetupId::fromInteger(
            $this->redis->incr('meetupId')
        );
    }
}

The only issue with using sequences is that there's no guarantee that every number in the sequence will actually be used (after all, we may request the next identity in the sequence, but not use it). But if that's not an issue, and you don't want to use UUIDs, this is a perfect solution for you. By the way, using Redis for this is just an example; you could also create a sequence like this with a good old relational database.
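
As a rough sketch of the relational variant, a sequence can be emulated with a one-row-per-sequence table. The SqlSequence class, the sequences table and its columns are hypothetical names, and SQLite in memory is used only to keep the example self-contained. How well this holds up under concurrent requests depends on the locking and isolation behavior of the database you actually use:

```php
<?php
declare(strict_types=1);

// Hypothetical helper: a sequence emulated with a table and a transaction.
final class SqlSequence
{
    private $pdo;
    private $name;

    public function __construct(PDO $pdo, string $name)
    {
        $this->pdo = $pdo;
        $this->name = $name;
    }

    public function next(): int
    {
        $this->pdo->beginTransaction();

        try {
            // Increment first, so the row is write-locked before it is read.
            $update = $this->pdo->prepare(
                'UPDATE sequences SET current_value = current_value + 1 WHERE name = :name'
            );
            $update->execute(['name' => $this->name]);

            $select = $this->pdo->prepare(
                'SELECT current_value FROM sequences WHERE name = :name'
            );
            $select->execute(['name' => $this->name]);
            $value = $select->fetchColumn();

            if ($value === false) {
                throw new RuntimeException('Unknown sequence: ' . $this->name);
            }

            $this->pdo->commit();

            return (int) $value;
        } catch (Throwable $exception) {
            $this->pdo->rollBack();

            throw $exception;
        }
    }
}

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE sequences (name TEXT PRIMARY KEY, current_value INTEGER NOT NULL)');
$pdo->exec("INSERT INTO sequences (name, current_value) VALUES ('meetupId', 0)");

$sequence = new SqlSequence($pdo, 'meetupId');
$first = $sequence->next();
$second = $sequence->next();
```

A repository's nextIdentity() could then wrap the result, e.g. MeetupId::fromInteger($sequence->next()).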

Use a value object for identities

Please note that encapsulating the identity generation process also means you need to encapsulate the identity itself. You already saw how I use a value object for the meetup identity. I do this for every entity identity.

An example of a value object that, in this case, wraps a UUID string:

namespace Domain;

use Assert\Assertion;

final class MeetupId
{
    private $id;

    private function __construct(string $id)
    {
        Assertion::uuid($id);

        $this->id = $id;
    }

    public static function fromString(string $id): MeetupId
    {
        return new self($id);
    }
}

Advantages of using a value object for IDs:

  • It hides the underlying data type of the ID.
  • This means you can switch to a different internal type, without doing shotgun surgery on the code base.
  • I think it's nicer to type against a particular ID class instead of just a string.

Feel free to de-duplicate some code in those ID value objects (e.g. if they are all UUID strings, I suggest introducing a trait for the methods you always have).
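
Such a trait could look roughly like this. The trait name, the asString() and equals() methods and the MemberId class are assumptions for the sake of the example, and a simplified regular expression stands in for Assertion::uuid() so the sketch has no external dependencies:

```php
<?php
declare(strict_types=1);

// Hypothetical trait holding the methods every UUID-based ID needs.
trait UuidBasedIdentity
{
    private $id;

    private function __construct(string $id)
    {
        // Simplified format check, standing in for Assertion::uuid().
        $pattern = '/^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i';
        if (preg_match($pattern, $id) !== 1) {
            throw new InvalidArgumentException(sprintf('"%s" is not a valid UUID', $id));
        }

        $this->id = $id;
    }

    // Inside a trait, "self" refers to the class that uses the trait.
    public static function fromString(string $id): self
    {
        return new self($id);
    }

    public function asString(): string
    {
        return $this->id;
    }

    public function equals(self $other): bool
    {
        return $this->id === $other->id;
    }
}

final class MeetupId
{
    use UuidBasedIdentity;
}

final class MemberId
{
    use UuidBasedIdentity;
}

$meetupId = MeetupId::fromString('3f2a6c1e-8b7d-4e5f-9a1b-2c3d4e5f6a7b');
$sameMeetupId = MeetupId::fromString('3f2a6c1e-8b7d-4e5f-9a1b-2c3d4e5f6a7b');
$isSame = $meetupId->equals($sameMeetupId);
```

Each ID class remains its own type, so a MemberId can't be passed where a MeetupId is expected, even though both share the same implementation.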

Generate the identity in the controller

We've now discussed the first of the "best" solutions: generating the identity in the application service, while letting the repository do the real work.

An alternative to this is to generate the identity in the controller (still letting the repository do the real work):

public function scheduleMeetupAction(): Response
{
    $command = new ScheduleMeetup();
    // The cast assumes MeetupId implements __toString()
    $command->id = (string)$this->repository->nextIdentity();
    // ...

    $this->scheduleMeetupHandler->handle($command);

    return $this->redirect('/meetup/' . $command->id);
}

This introduces an extra dependency in the controller (the repository), but in return you no longer need the return value of the command handler. That could simply be what you want (or maybe you can't get anything back from the command handler at all, because it wasn't designed that way). Or maybe you need it because you handle the command asynchronously (i.e. you push it onto a job queue). In that case it's very convenient to have the ID available before the command is handled, so you can keep track of its status. You can even send the ID back to the client, who can later use it to retrieve the result of handling the command (read "Returning From Command Buses" by Ross Tuck for a more detailed treatment of this discussion).
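
The asynchronous case can be sketched as follows. The InMemoryCommandQueue and IdentityProvider classes are hypothetical stand-ins for a real job queue and for the repository's nextIdentity(), and the action is written as a plain function returning the URL instead of a framework Response:

```php
<?php
declare(strict_types=1);

final class ScheduleMeetup
{
    public $id;
    public $name;
}

// Hypothetical stand-in for a real job queue.
final class InMemoryCommandQueue
{
    private $commands = [];

    public function push(ScheduleMeetup $command): void
    {
        $this->commands[] = $command;
    }

    public function pop(): ?ScheduleMeetup
    {
        // array_shift() returns null when the queue is empty.
        return array_shift($this->commands);
    }
}

// Hypothetical stand-in for the repository's nextIdentity().
final class IdentityProvider
{
    private $last = 0;

    public function nextIdentity(): string
    {
        return 'meetup-' . ++$this->last;
    }
}

function scheduleMeetupAction(
    IdentityProvider $identities,
    InMemoryCommandQueue $queue,
    string $name
): string {
    $command = new ScheduleMeetup();
    $command->id = $identities->nextIdentity();
    $command->name = $name;

    // The command is only queued here; a worker handles it later.
    $queue->push($command);

    // The ID is already known, so the client can be pointed to the result.
    return '/meetup/' . $command->id;
}

$queue = new InMemoryCommandQueue();
$location = scheduleMeetupAction(new IdentityProvider(), $queue, 'Hypothetical meetup');

// Later, in a worker process, the command comes off the queue with its ID.
$pending = $queue->pop();
```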

As you may understand from this list of preconditions, the first "best" solution is, I think, the "bestest" in most cases.

Comments
Juan

Hello Matthias,

I also read Vaughn Vernon's choices for generating ids. In my case I need a sequential pre-allocated id, generated by persistence. The db is mysql, so the sequence has to be emulated with a table. The repository method that gets the next id has to increment it in an atomic operation, so the method updates the table too. I'm doing CQS but this method violates it. How do you deal with this?

CQS is not just violated at the domain level (repository method), but also at the application layer, since there will be an app service that calls the repository method. So this app service would be neither a command nor a query.

Thanks in advance.

PS: For now I use a workaround, which I don't like much, that I read at https://blog.ploeh.dk/2014/...

I use a uuid pre-allocated id, and the sequence would be another field that is populated after the entity has been persisted. In the repository I have a method to get the sequence value from the uuid, once the entity has been persisted.

Matthias Noback

Great question! I think it's totally fine to "violate" CQS in this case. You want to generate a new unique number, and want to make sure that no other caller would receive the same number. This means the "query" aspect of the method will only give you this exact answer once. The "command" aspect of the method is needed to produce the side effect that ensures the "query" doesn't give the same answer twice.
The same goes for the application service. I think that it's totally fine for the application service to return the new ID. If you really don't want that, you can always generate the ID before calling the application service. That would keep the CQS violation out. Generally, I think that it overshoots the goal.

Juan

Hi Matthias, thanks for the answer.

"I think that it's totally fine for the application service to return the new ID. If you really don't want that, you can always generate the ID before calling the application service."

That's what I do. The app service I mentioned is just a call to the repository method that "violates" CQS. First the client calls this app service to get the next id. Then the client creates a command (dto) containing the pre-generated id, so that the "command handler" (i.e. another app service) creates an instance of the entity with the given pre-generated id, and persists the entity. The first app service, which just gets the next id, and the command handler run in different transactions.

The point is that the "get next id app service" is neither a pure command nor a pure query. So it cannot be run as a command in the command bus, nor can it be run as a query inside a read-only transaction. So it would sit apart from commands and queries.

Maybe I'm too much of a purist, and it's ok to let this app service "violate" CQS.

Going one step beyond, this approach has the drawback of going to the db each time I need an id. I think that I will implement it caching some values of the sequence, selecting a "not too big" cache size, so that I can live with the gaps.

Thanks for your attention.
Juan.

Matthias Noback

Yeah, you could definitely let go of CQS in these scenarios. Caching doesn't seem to me very useful in a case where concurrent requests for the next ID shouldn't give the same answer.

Juan

I don't understand what you mean, Matthias. The next id shouldn't be the same in any case. Not just in this case. Anyway, what does it have to do with caching?

Matthias Noback

Sorry, I probably got it wrong. I thought you meant caching part of the sequence in the running app. But this comes with the risk of handing out duplicate IDs.
Honestly, I don't think that hitting the DB twice for every new entity is a problem; in most cases, that is. If you really need a super fast application, you should use UUIDs. You can always assign an incrementing ID later on if you like.

Juan

I know what you mean. I think you are right. Thanks for your opinion.

Matthias Noback

Cool, don't hesitate to continue the discussion here!

Trong Phan

Hello,
Still pulling my hair out over this topic. I have this concern: when we allow an entity to be created by passing in an id and other attributes, developers can make the mistake below:

var p = new Person();
p.Id = 1;
p.Name = "Test 2";

var db = new StoreDBContext();

db.People.Attach(p);
p.Name = "new 1";
db.SaveChanges();

The developer means to create a new person, but the code will actually update the existing person, since we allow the Id to be set when initializing a new Person.

This mistake could never happen if we didn't allow the client to initialize a person with the Id as a parameter...

I think this is a tradeoff that we have to live with, or do you guys have any thoughts?
Thanks,

Matthias Noback

The solution has two parts:

1. Don't allow the ID to be set/modified/changed. The only moment you can provide an ID as a regular client is when you create the entity object for the first time.
2. Keep track of which entities have already been saved to the database. You can do this in your repository implementation (using an identity map to keep track of the entities that are known to the repository), or you can have something like an `isNew()` method on the entity which the repository implementation can use to decide if it should create a new record or update one.

Trong Phan

what about:
var p = new Person(2, "Test 2");
2 is the id generated by the Repository. But my point is that developers may make a mistake by providing a hard-coded Id ..... during development ..... which somehow goes to production, for example.

So using a Guid is one other choice for handling the ID ...
What are your thoughts on this one?
Thanks again,

Matthias Noback

You can always make a mistake that ends up in production ;) I prefer having a repository with a `nextIdentity()` method, which will provide the next ID (instead of asking the developer to type in the ID or something). This can be an incremental ID or a UUID; that doesn't even matter.

Daniel Richter

Late to the party, but wanted to share one more option that I sometimes use, which is a bit of a mixed style and works well in practice.

In order to have database-generated auto-increment PKs and still record domain events on newly created entities, I record a "LazyId" object, which wraps the entity.
```
$this->recordEvent(
    new SomethingCreatedEvent(
        new LazyId($this)
    )
);
```
During persistence (in my case Doctrine, but it should work with anything) I first persist the entities, and then the events (in the same transaction, of course), so by the time the entities have their Ids established I can resolve the LazyIds to their respective actual Ids. This only requires one event listener.

Matthias Noback

Thanks for sharing!

Dmitri Lakachauskis

Regarding IDs as value objects:

> This means you can switch to a different internal type, without doing shotgun surgery on the code base.

I think this argument is not entirely true. If you switch to Integer from a String and implement asInteger() / fromInteger() methods, then you have to change the corresponding Entity too (in case you follow "store everything as scalar" principle) e.g.

---

/**
 * @var int (was a string)
 */
private $id;

public function __construct(EntityId $id)
{
    $this->id = $id->asInteger(); // was $id->asString();
}

public function id(): EntityId
{
    return EntityId::fromInteger($this->id); // was EntityId::fromString($this->id);
}

---

IMO the beauty of IDs as value objects is overestimated. Let's say I have a Locale entity (for whatever reason).

---
class Locale
{
    public function __construct(string $code, string $name)
    {
        $this->code = $code;
        // ...
    }

    public function code(): string
    {
        return $this->code;
    }

    // ...
}

interface LocaleRepository
{
    public function byCode(string $code): ?Locale;
}
---

Why would I need to convert a scalar $code to LocaleId? The notion of the name "code" clearly denotes what it is (as well as its uniqueness). Moreover other aggregates have a meaningful "link" (not just a LocaleId) as the locale's code is useful on its own.

IOW I'm not entirely convinced every entity should have <entity>Id attached to it. Sometimes scalar id is just fine.

Matthias Noback

I see what you mean, thanks for pointing that out. However, I still like the VO to be a simple guard against invalid values. Also, the type-casting methods are "internal" methods, so they still won't result in "shotgun surgery" (like, all over the code base); you still have to make changes of course, but that's nothing special.

Dmitri Lakachauskis

I agree. Will use VO as IDs in my side project. Will see how it goes.

Matthias Noback

Cool, please let us know how it went!

Tibor Soviš

Please check your code under the "Generate the ID in the application service" heading: there are two calls of the MeetupId::fromString method. The first creates $meetupId, which is not used; the second uses $command->id, which corresponds to the last example.

Matthias Noback

Good point, thanks; it'll be fixed in the next iteration (tomorrow).

Samir Boulil

How about user generated IDs? Let's say a user gets to choose a unique identifier for a meetup, is it possible to directly use this value or will it hurt down the road ?

Matthias Noback

I think for the system it's best to use an ID generated by the application. Then again, I didn't come across a situation where I was tempted to use the user-provided ID. Do you have an example?

Samir Boulil

Let's just say, in our use case of the meetups, that the user gets to define a title and a sanitized version of the title, which would be the id.
- Now sometimes, the id ends up in urls like "/meetups/{user_id}", which is nice since it gives the user a nicely formatted url.
- And it's not an issue since the ID is "generated" outside the domain and the application layer.
- It avoids having two ids, one generated in the repository for instance and another one a user would choose for their entity.

This is a question that has been going on for quite some time in my head :). Why would we need to maintain two ids (one from the user, one generated by our system) when we can just use the one generated by the user?

Matthias Noback

Yeah, that sounds like a use case for it. However, keep in mind that you can't just change that user-generated ID anymore. Also, aggregates refer to each other by their ID. Maybe you won't want to use this user-generated ID for that too.

Erik

Maybe outside the scope, but could you explain the choice to make almost every class final? I like to do that too, since it prevents massive chaining, protects the inside and keeps the internals refactorable without having to modify any children extending from it. But I see most people around me using the protected scope for almost everything (very bad, I know). But it's hard to convince them.

Matthias Noback

Totally get that; I find this a good reference article: https://ocramius.github.io/... The main motto is: extending a class is a very specific use case of that class. If you don't explicitly want to offer this use case, as the developer of this class (or if there's no reason to offer it yet), don't allow it. Also, the general advice is to prefer "composition over inheritance".

Vasiliy Pyatykh

@matthiasnoback What about mocking a class in a unit test? Usually it is impossible to mock a 'final' class, which brings unnecessary problems to unit testing.

Julian Li

The article above suggests:
"Final classes only work effectively under following assumptions:
There is an abstraction (interface) that the final class implements
All of the public API of the final class is part of that interface"

In terms of tests and mock, you mock the interface, not the final class

Matthias Noback

Indeed; it means that for objects with state, you'll be using the "real thing" (i.e. don't mock entities, value objects, etc.). For services with side-effects (e.g. a repository) you should always have an interface for which you can still easily create a test double.

Quentin P.

It is a good article, thank you!

When you say generating the identity inside the entity is not right, you can add that it is difficult to test your entity and how you are supposed to use it.

Matthias Noback

That's right, nice addition.

Luís Cobucci

Great points! I really dislike using the repository to handle that operation, though. I see repositories as in-memory collections, so it feels wrong to me.

Nowadays I'm splitting the generation of a uuid and value object id in two. The first is handled in the controller (https://github.com/chimerap...), the other in the command. Works great for me =)

Matthias Noback

I understand your point. The most important part is indeed separating the generation of a random string from the ID value object itself.

To me it makes a lot of sense to let the repository-as-a-collection provide the ID; it's like when you hand out your coat in a theater and you receive a ticket (with an ID) to later retrieve your coat again. In that metaphor, you can't bring your own ID either ;) Also, although I often see the repository as a collection too, there's a limit to that, since you definitely want that collection to have a lasting effect too (i.e. a side effect), when calling save() on it.

Luís Cobucci

> To me it makes a lot of sense to let the repository-as-a-collection provide the ID; it's like when you hand out your coat in a theater and you receive a ticket (with an ID) to later retrieve your coat again. In that metaphor, you can't bring your own ID either ;)

We are actually doing something different, though... we're saying "hey, give me an ID and don't worry about what I'll do with it" and then "now please add this coat, that MIGHT use the ID you gave me". That's what makes me uncomfortable.

> Also, although I often see the repository as a collection too, there's a limit to that, since you definitely want that collection to have a lasting effect too (i.e. a side effect), when calling save() on it.

My take on this is that, considering that we handle things as if they were in-memory, we just add items to the repo, so the interface should be `add()` not `save()`. And when modifying an object, we should consider that things get saved automatically - again everything is in-memory.

That's fairly simple to achieve by using service bus middleware to commit the unit of work. With the default tracking policy of Doctrine ORM you would call `EM#persist()` in `Repository#add()` and `EM#flush()` in the middleware, which would handle the aggregate additions and modifications for you ;)

Matthias Noback

Thanks for these challenging comments!

I agree that the metaphor doesn't fit completely hehe. To be more clear, it's about the fact that the repository at least knows what ID would be good to use. As an example, something I mentioned in the article as well: an ID may just as well be a number in a sequence, not a UUID. It would be a good idea to hide the generation of the ID behind some interface (which could be the repository interface, but it doesn't have to be) and not assume that the client can do something like `new ProductId($uniqueId)`.

I didn't think about the possibility that the client of the repository wouldn't use the ID generated for it; that's an interesting point. But this is also the case in your example code, and I think in pretty much any situation; same for the sequence example: you may not use the next identifier in that case, and the sequence will have gaps. The only way you can guarantee no unused identifiers is to let the repository inject the ID afterwards, which we were trying to prevent in the first place :)

About middleware handling the flush; I built such a thing for SimpleBus too, but I'm no longer using that. Since persisting an object is an important step of the use case of creating/updating something, I want it to be an explicit statement inside an application service. Also, I'd want to force the changes to one aggregate to be persisted within one database transaction, and not risk multiple changes to multiple entities being flushed after my application service/command handler finishes.

Luís Cobucci

You're correct regarding the usage also in the command/handler side, we can generate something in the controller and simply ignore it.

I usually like to think that DB persistence is just a consequence of manipulating the objects, which helps me a lot to only think about objects and their behaviour.

I handle the transaction isolation by ensuring that I only execute one command per DB transaction (using Tactician's lock middleware) and only modifying one aggregate in the handler. That requires effort from implementors and reviewers but it surely makes things simpler.

Anyway, thanks for the discussion, it's always good to explore different perspectives =)

Matthias Noback

Thanks for describing your ideas in such great detail here; it'll be very interesting for people to read about these other options.

Patrick

Looks like my comment got deleted because it contained a link... Here it is without:

$meetupId = MeetupId::fromString(Uuid::uuid4()->toString());

I think you should move that into the object.

$meetupId = MeetupId::generate();

Matthias Noback

Thanks for your suggestion; it usually makes sense to hide this kind of thing inside the object, but as I mentioned, UUID generation is an infrastructure concern, and as such, should not end up in a domain object.

Patrick
 $meetupId = MeetupId::fromString(Uuid::uuid4()->toString());

I think you should move that into the object.

 $meetupId = MeetupId::generate();

I talked about this idea in my newest blog post Tell, don't ask.