In essence, everything is a string.
Well, you can always go one layer deeper and find out what a string really is, but for web apps I work on, both input data and output data are strings. The input is an HTTP request, which is a plain-text message that gets passed to the web server, the PHP server, the framework, and finally a user-land controller. The output is an HTTP response, which is also a plain-text message that gets passed to the client. If my app needs the database to load or store some data, that data too is in its initial form a string. It needs to be deserialized into objects to do something and later be serialized into strings so we can store the results.
We create objects from strings, and turn them back into strings because the protocols we use require strings (e.g. HTTP, SQL, AMQP, and so on). These protocols are only used near the edges of the application, where data comes in from and gets sent to external systems. In the core of the application there should be no need to serialize/deserialize data. There we should only have to deal with objects. That will be great, because objects can provide guarantees regarding the data they keep, so they are safer to use than strings. They also have an explicitly defined API, so they are much easier to use.
Of course many developers know this. They'll use Value Objects to wrap strings, enforcing data consistency and ease of use. And not just strings, because we have several other primitive types at our disposal that support different use cases like doing math.
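To make that concrete, here is a minimal sketch of a Value Object wrapping a string. The class name Title and its rule (not empty after trimming) are assumptions made up for illustration; your own objects will protect different rules.

```php
<?php
declare(strict_types=1);

// Hypothetical Value Object: the name and the "not empty" rule are illustrative only.
final class Title
{
    private string $value;

    private function __construct(string $value)
    {
        $this->value = $value;
    }

    public static function fromString(string $value): self
    {
        $trimmed = trim($value);
        if ($trimmed === '') {
            throw new InvalidArgumentException('A title should not be empty');
        }

        return new self($trimmed);
    }

    public function asString(): string
    {
        return $this->value;
    }
}

// The string is checked once, at the edge; after that we pass the object around.
$title = Title::fromString('  Everything is a string ');
echo $title->asString(), PHP_EOL;
```

Once a Title exists, the rest of the code doesn't have to wonder whether the string inside it is empty.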
The problem is, how do we safely go from a string to an integer? To complicate things, most string data gets to us in the form of an associative array (i.e. map) of key/value pairs, both of which are strings. For instance when we get a record from our database abstraction library, it will be an array. If we want to use that data we can access it by its key, but we have to ensure it's there. The next step is to ensure it's of the correct type, and optionally cast it to the correct type:
/** @var array $record */
$title = $record['title'];
$numberOfComments = (int)$record['numberOfComments'];
From the type signature of $record, it's not clear that we may expect the keys title and numberOfComments to exist. Even if they exist, we can't be sure that their values are of the expected type. When working with arrays you always have to check if the key exists before accessing it, or you may get a PHP Notice (a Warning as of PHP 8), which hopefully gets converted to the error that it really is; most frameworks nowadays do this for you. We can use the so-called null coalescing operator (??) to overcome the problem of undefined keys:
/** @var array $record */
$title = $record['title'] ?? '';
$numberOfComments = (int)($record['numberOfComments'] ?? 0);
This works if the key is undefined, but it will also revert to the default value if the key did exist but the value was null. We lose an important piece of information, namely that the requested key is undefined. In most cases this is a programming mistake, e.g. we forgot to add the column to the SQL SELECT statement. When using ?? it's a lot harder to discover this problem because the operator "swallows" it.
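A small sketch of what gets lost. The record below is made up: title is NULL in the database, and numberOfComments was accidentally left out of the query.

```php
<?php
declare(strict_types=1);

// Made-up record: `title` is null, `numberOfComments` is missing entirely.
$record = ['title' => null];

$title = $record['title'] ?? '';
$numberOfComments = (int)($record['numberOfComments'] ?? 0);

var_dump($title);            // string(0) ""
var_dump($numberOfComments); // int(0)

// Only array_key_exists() can still tell the two situations apart:
var_dump(array_key_exists('title', $record));            // bool(true)
var_dump(array_key_exists('numberOfComments', $record)); // bool(false)
```

Both variables end up with a harmless-looking default, even though only one of the two cases is likely to be intentional.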
Instead we should explicitly assert that the key exists:
/** @var array $record */
// Note the argument order: array_key_exists() takes the key first, then the array.
if (!array_key_exists('title', $record)) {
    throw new LogicException('Expected array $record to have a key "title"');
}
$title = $record['title'];
if (!array_key_exists('numberOfComments', $record)) {
    throw new LogicException('Expected array $record to have a key "numberOfComments"');
}
$numberOfComments = (int)$record['numberOfComments'];
Of course, this quickly becomes annoying. So we introduce a helper function for this, e.g.
/** @var array $record */
self::assertKeyExists($record, 'title');
$title = $record['title'];
self::assertKeyExists($record, 'numberOfComments');
$numberOfComments = (int)$record['numberOfComments'];
The helper function throws that same exception if the key is undefined.
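A possible body for such a helper, as a sketch. The class name Assertions is an assumption; so is the generic message, which has to differ slightly from the one above because the helper doesn't know the name of the variable it was given.

```php
<?php
declare(strict_types=1);

// Hypothetical home for the helper; in real code it could live anywhere.
final class Assertions
{
    /**
     * @param array<string,mixed> $data
     */
    public static function assertKeyExists(array $data, string $key): void
    {
        // array_key_exists() also returns true when the value is null,
        // which is exactly the distinction that ?? and isset() lose.
        if (!array_key_exists($key, $data)) {
            throw new LogicException(sprintf('Expected array to have a key "%s"', $key));
        }
    }
}

// Passes: the key exists, even though its value is null.
Assertions::assertKeyExists(['title' => null], 'title');

try {
    Assertions::assertKeyExists(['title' => 'Hello'], 'numberOfComments');
} catch (LogicException $exception) {
    echo $exception->getMessage(), PHP_EOL;
}
```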
But couldn't we just define the expected shape of $record, thereby fixing the issue? E.g.
/** @var array{title: string, numberOfComments: string} $record */
Not really, because it isn't this method that defines the structure of $record. It's the result of fetching a result set from the database, and that doesn't give us any guarantees about the shape of the array, or the types of the values. There's a slightly better type we can use though:
/** @var array<string,string|null> $record */
$record can be trusted to be an array with string keys. The values can be string or null. Although it's an honest type, it means we still have to fix some types in the mapping code. E.g. do we want to assume that title is never null? Maybe we know this because we have the column defined as NOT NULL. Even if we know this, in our code it can technically speaking still be null. So we still have to add an assertion to our code and get rid of the possible null value:
/** @var array<string,string|null> $record */
self::assertKeyExists($record, 'title');
if ($record['title'] === null) {
    throw new LogicException('Expected $record[\'title\'] to be a string');
}
$title = $record['title'];
// Now we know that `$title` is a string, it can't be `null`
This may become repetitive as well, so we introduce another helper function:
/** @var array<string,string|null> $record */
self::assertKeyExists($record, 'title');
self::assertNotNull($record['title']);
$title = $record['title'];
A similar thing can be done for the numberOfComments key, but its value could still be null in case there are no comments. If there are comments, the value will be a string but it will look like an int, so it should be possible to parse it to an int either by using intval() or by casting to an int (using (int)):
self::assertKeyExists($record, 'numberOfComments');
$numberOfComments = $record['numberOfComments'] === null ? 0 : (int)$record['numberOfComments'];
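One caveat with casting: (int) never complains, so a string that doesn't look like an int is quietly turned into one anyway. If you'd rather hear about that, a stricter variant can be sketched with filter_var() and FILTER_VALIDATE_INT. The function name parseInt() below is hypothetical; it's not something PHP provides.

```php
<?php
declare(strict_types=1);

// Hypothetical helper, not part of PHP itself.
function parseInt(string $value): int
{
    // FILTER_VALIDATE_INT returns the integer, or false if the string is not a valid int.
    $result = filter_var($value, FILTER_VALIDATE_INT);
    if ($result === false) {
        throw new LogicException(sprintf('Expected "%s" to look like an integer', $value));
    }

    return $result;
}

var_dump((int)'12abc');   // int(12): the cast quietly drops "abc"
var_dump((int)'abc');     // int(0)
var_dump(parseInt('12')); // int(12)

try {
    parseInt('12abc');
} catch (LogicException $exception) {
    echo $exception->getMessage(), PHP_EOL;
}
```

Whether this strictness is worth it depends on how much you trust the source of the data.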
Most of this mapping code still reads and feels like a step-by-step instruction manual: first do 1, then 2, etc. In the end we just want one thing: to get a string or an int value from an array by a given key. Let's rephrase this with some new helper functions:
/** @var array<string,string|null> $record */
$title = self::getString($record, 'title');
$numberOfComments = self::getInt($record, 'numberOfComments');
Some may say: OMG helper functions!!! Do you want me to create a Util class for this :P ??? Well, if you like. No problem. After all, these are utility functions. However, I generally put them inside a trait and call it, for example, Mapping:
trait Mapping
{
    private static function getString(array $data, string $key): string
    {
        // ...
    }

    private static function getStringOrNull(array $data, string $key): ?string
    {
        // ...
    }

    private static function getInt(array $data, string $key): int
    {
        // ...
    }

    private static function getIntOrNull(array $data, string $key): ?int
    {
        // ...
    }
}
Using a trait for this is nice, because it doesn't add methods to the public interface of the class where you use the trait. It also doesn't require a change to the type hierarchy, which would be needed if you put these methods in an abstract parent class instead.
Such a trait can be used in mapping functions for:
- database records to entities
- HTTP request data to DTOs
- HTTP response data to DTOs
It gives you as a developer good feedback when a key is missing or a value is not of the expected type. These problems will no longer go unnoticed because the helper functions throw exceptions. They also increase type coverage for your code base because types are introduced in a very explicit way.
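For completeness, here is one way the method bodies of such a trait could be filled in, together with a class that uses it. This is a sketch under assumptions: the Post class, the exception messages, and the choice to parse ints strictly with filter_var() are mine, and your project may well need different rules.

```php
<?php
declare(strict_types=1);

trait Mapping
{
    /** @param array<string,mixed> $data */
    private static function getStringOrNull(array $data, string $key): ?string
    {
        if (!array_key_exists($key, $data)) {
            throw new LogicException(sprintf('Expected array to have a key "%s"', $key));
        }
        $value = $data[$key];
        if ($value !== null && !is_string($value)) {
            throw new LogicException(sprintf('Expected the value for key "%s" to be a string or null', $key));
        }

        return $value;
    }

    /** @param array<string,mixed> $data */
    private static function getString(array $data, string $key): string
    {
        $value = self::getStringOrNull($data, $key);
        if ($value === null) {
            throw new LogicException(sprintf('Expected the value for key "%s" not to be null', $key));
        }

        return $value;
    }

    /** @param array<string,mixed> $data */
    private static function getIntOrNull(array $data, string $key): ?int
    {
        $value = self::getStringOrNull($data, $key);
        if ($value === null) {
            return null;
        }
        // Assumption: parse strictly instead of casting, so "12abc" is rejected.
        $int = filter_var($value, FILTER_VALIDATE_INT);
        if ($int === false) {
            throw new LogicException(sprintf('Expected the value for key "%s" to look like an int', $key));
        }

        return $int;
    }

    /** @param array<string,mixed> $data */
    private static function getInt(array $data, string $key): int
    {
        $value = self::getIntOrNull($data, $key);
        if ($value === null) {
            throw new LogicException(sprintf('Expected the value for key "%s" not to be null', $key));
        }

        return $value;
    }
}

// Hypothetical entity that maps itself from a database record.
final class Post
{
    use Mapping;

    private string $title;
    private int $numberOfComments;

    private function __construct(string $title, int $numberOfComments)
    {
        $this->title = $title;
        $this->numberOfComments = $numberOfComments;
    }

    /** @param array<string,string|null> $record */
    public static function fromRecord(array $record): self
    {
        return new self(
            self::getString($record, 'title'),
            // The key must exist, but null is an accepted value meaning "no comments".
            self::getIntOrNull($record, 'numberOfComments') ?? 0
        );
    }

    public function title(): string
    {
        return $this->title;
    }

    public function numberOfComments(): int
    {
        return $this->numberOfComments;
    }
}

$post = Post::fromRecord(['title' => 'Hello', 'numberOfComments' => null]);
echo $post->title(), ' has ', $post->numberOfComments(), ' comments', PHP_EOL;
```

Note that the ?? in fromRecord() is applied after the key has been asserted to exist, so it only provides a default for a null value and no longer hides a missing column.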
The only question I still have about this: is this material for open source software? I'm not sure because so far every project still has its own requirements that slightly deviate. E.g. you may want to use different mapping tactics for request data from your own website than for response data from an external API. You may want to recover from undefined fields, or resort to default values if a key contains null. We'll see how things evolve!
When going on a PHP coercions deep dive in April/May of this year (and suggesting a PHP RFC for less tolerant boolean coercions) I made a small library for explicit coercions (https://github.com/squirrelphp/types) to have a more consistent way of enforcing types without explicit casts or repeating similar code everywhere. Before, I was using explicit casts like (int) or (bool) quite a bit, but this can easily hide bad values you really do not want to blindly coerce. Since replacing most casts with explicit coercions in my projects I did find some bugs that were lurking beneath these explicit casts: no critical errors yet, but slight bugs that can develop into unexpected situations. Especially when getting a value from a database you rarely want to convert -1 to a boolean true, for example.