Here are some rules I use for working with dynamic arrays. It's pretty much a Style Guide for Array Design, but it didn't feel right to add it to the Object Design Style Guide, because not every object-oriented language has dynamic arrays. The examples in this post are written in PHP, because PHP is pretty much Java (which might be familiar), but with dynamic arrays instead of built-in collection classes and interfaces.
Using arrays as lists
All elements should be of the same type
When using an array as a list (a collection of values with a particular order), every value should be of the same type:
$goodList = [
    'a',
    'b'
];
$badList = [
    'a',
    1
];
A generally accepted style for annotating the type of a list is: @var array<TypeOfElement>. Make sure not to add the type of the index (which would always be int).
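As a minimal sketch of that annotation in use (the function longestName() is made up for this example), a function that accepts a list of strings could be documented like this:

```php
<?php

/**
 * Returns the longest name in the list, or an empty string for an empty list.
 *
 * @param array<string> $names
 */
function longestName(array $names): string
{
    $longest = '';

    // Only iterate over the elements; the indexes are never looked at
    foreach ($names as $name) {
        if (strlen($name) > strlen($longest)) {
            $longest = $name;
        }
    }

    return $longest;
}

echo longestName(['Ann', 'Bernard', 'Cy']), PHP_EOL; // prints "Bernard"
```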
The index of each element should be ignored
PHP will automatically create new indexes for every element in the list (0, 1, 2, etc.). However, you shouldn't rely on those indexes, nor use them directly. The only properties of a list that clients should rely on are that it is iterable and countable.
So feel free to use foreach and count(), but don't use for to loop over the elements in a list:
// Good loop:
foreach ($list as $element) {
}
// Bad loop (exposes the index of each element):
foreach ($list as $index => $element) {
}
// Also bad loop (the index of each element should not be used):
for ($i = 0; $i < count($list); $i++) {
}
(In PHP, the for loop might not even work, because there may be indices missing in the list, and indices may be higher than the number of elements in the list.)
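To make that concrete, here is a small sketch (the variable names are only illustrative) of a list with a gap in its indexes. The foreach loop still visits every element, while the for loop silently skips one:

```php
<?php

/** @var array<string> $list */
$list = ['a', 'b', 'c'];

// Removing an element leaves a gap: the remaining indexes are 0 and 2
unset($list[1]);

// foreach visits exactly the elements that are left
$seen = [];
foreach ($list as $element) {
    $seen[] = $element;
}

// A for loop assumes the indexes run from 0 to count - 1, which is no longer
// true: count($list) is 2, so the loop asks for index 1 (gone) and never
// reaches index 2
$found = [];
for ($i = 0; $i < count($list); $i++) {
    if (array_key_exists($i, $list)) {
        $found[] = $list[$i];
    }
}

echo implode(', ', $seen), PHP_EOL;  // prints "a, c"
echo implode(', ', $found), PHP_EOL; // prints "a"
```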
Instead of removing elements, use a filter
You may want to remove elements from a list by their index (unset()), but instead of removing elements you should use array_filter() to create a new list, without the unwanted elements.
Again, you shouldn't rely on the index of elements, so when using array_filter() you shouldn't use the flag parameter to filter elements based on the index, or even based on both the element and the index.
// Good filter:
array_filter(
    $list,
    function (string $element): bool {
        return strlen($element) > 2;
    }
);
// Bad filter (uses the index to filter elements as well):
array_filter(
    $list,
    function (int $index): bool {
        return $index > 3;
    },
    ARRAY_FILTER_USE_KEY
);
// Bad filter (uses both the index and the element to filter elements):
array_filter(
    $list,
    function (string $element, int $index): bool {
        return $index > 3 || $element === 'Include';
    },
    ARRAY_FILTER_USE_BOTH
);
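The following sketch (with made-up sample data) shows what the good filter gives you: a new list, with the original left untouched. It also shows that the kept elements retain their old indexes, which is one more reason to only iterate over and count the result:

```php
<?php

/** @var array<string> $names */
$names = ['Al', 'Barbara', 'Cy', 'Dominic'];

// array_filter() returns a new array and leaves $names as it was
$longNames = array_filter(
    $names,
    function (string $element): bool {
        return strlen($element) > 2;
    }
);

echo count($names), PHP_EOL;     // prints 4
echo count($longNames), PHP_EOL; // prints 2

// The kept elements still have their old indexes (1 and 3 here),
// so the result is only safe to iterate over and to count
foreach ($longNames as $name) {
    echo $name, PHP_EOL; // prints "Barbara", then "Dominic"
}
```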
Using arrays as maps
When keys are relevant and they are not indices (0, 1, 2, etc.), feel free to use an array as a map (a collection from which you can retrieve values by their unique key).
All the keys should be of the same type
The first rule for using arrays as maps is that all the keys in the array should be of the same type (string-type keys are the most common).
$goodMap = [
    'foo' => 'bar',
    'bar' => 'baz'
];
// Bad (uses different types of keys)
$badMap = [
    'foo' => 'bar',
    1 => 'baz'
];
All the values should be of the same type
The same goes for the values in a map: they should be of the same type.
$goodMap = [
    'foo' => 'bar',
    'bar' => 'baz'
];
// Bad (uses different types of values)
$badMap = [
    'foo' => 'bar',
    'bar' => 1
];
A generally accepted style for annotating the type of a map is: @var array<TypeOfKey, TypeOfValue>.
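A minimal sketch of the map annotation in use, assuming a hypothetical countWords() function that builds a map from words to the number of times they occur:

```php
<?php

/**
 * Counts how often each word occurs.
 *
 * @param array<string> $words
 * @return array<string, int> The words as keys, their number of occurrences as values
 */
function countWords(array $words): array
{
    /** @var array<string, int> $counts */
    $counts = [];

    foreach ($words as $word) {
        // Start at 0 the first time a word is seen
        $counts[$word] = ($counts[$word] ?? 0) + 1;
    }

    return $counts;
}

$counts = countWords(['foo', 'bar', 'foo']);
echo $counts['foo'], PHP_EOL; // prints 2
```

One thing to keep in mind: PHP converts string keys that are decimal integers, such as '42', to integer keys, so the annotation describes the intent rather than something the language enforces.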
Maps should remain private
Lists can safely be passed around from object to object, because of their simple characteristics. Any client can loop over a list's elements or count them, even if the list is empty. Maps are more difficult to work with, because clients may rely on keys that have no corresponding value. This means that in general, a map should remain private to the object that manages it. Instead of allowing clients to access the internal map directly, offer getters (and maybe setters) to retrieve values. Throw an exception if no value exists for the requested key. If, however, you can keep the map and its values entirely private, do so.
// Exposing a list is fine
/**
 * @return array<User>
 */
public function allUsers(): array
{
    // ...
}
// Exposing a map may be troublesome
/**
 * @return array<string, User>
 */
public function usersById(): array
{
    // ...
}
// Instead, offer a method to retrieve a value by its key
/**
 * @throws UserNotFound
 */
public function userById(string $id): User
{
    // ...
}
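Filled in, the last method could look something like the sketch below. The UserRepository class, its add() method, and the minimal User and UserNotFound classes are assumptions made for this example; only userById() and its signature come from the snippet above.

```php
<?php

// Hypothetical exception and element type, reduced to what the example needs
final class UserNotFound extends RuntimeException
{
}

final class User
{
    private string $id;
    private string $name;

    public function __construct(string $id, string $name)
    {
        $this->id = $id;
        $this->name = $name;
    }

    public function id(): string
    {
        return $this->id;
    }

    public function name(): string
    {
        return $this->name;
    }
}

final class UserRepository
{
    /**
     * The map itself never leaves this object.
     *
     * @var array<string, User>
     */
    private array $usersById = [];

    public function add(User $user): void
    {
        $this->usersById[$user->id()] = $user;
    }

    /**
     * @throws UserNotFound
     */
    public function userById(string $id): User
    {
        // Clients never have to check for a missing key themselves
        if (!isset($this->usersById[$id])) {
            throw new UserNotFound('No user found with ID ' . $id);
        }

        return $this->usersById[$id];
    }
}

$repository = new UserRepository();
$repository->add(new User('u-1', 'Ann'));

echo $repository->userById('u-1')->name(), PHP_EOL; // prints "Ann"
```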
Use objects for maps that have multiple value types
When you want to use a map, but you want to store different types of values in it, use an object instead. Define a class, and add public, typed properties to it, or add a constructor and getters. Examples of objects like this are configuration objects, or command objects:
final class SillyRegisterUserCommand
{
    public string $username;
    public string $plainTextPassword;
    public bool $wantsToReceiveSpam;
    public int $answerToIAmNotARobotQuestion;
}
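The alternative mentioned above, a constructor and getters, could be sketched as follows. The class name RegisterUserCommand and the reduced set of fields are choices made for this example only:

```php
<?php

final class RegisterUserCommand
{
    private string $username;
    private bool $wantsToReceiveSpam;

    public function __construct(string $username, bool $wantsToReceiveSpam)
    {
        $this->username = $username;
        $this->wantsToReceiveSpam = $wantsToReceiveSpam;
    }

    public function username(): string
    {
        return $this->username;
    }

    public function wantsToReceiveSpam(): bool
    {
        return $this->wantsToReceiveSpam;
    }
}

// Every value has its own name and its own type, unlike in a mixed-type map
$command = new RegisterUserCommand('ann', false);

echo $command->username(), PHP_EOL; // prints "ann"
```

Compared to public properties, this variant makes sure that every value is provided up front and can't be changed afterwards, at the cost of a bit more code.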
Exceptions to these rules
Sometimes libraries or frameworks require the use of arrays in more dynamic ways. In those cases it's impossible (and not desirable) to follow the previous rules. Examples are an array of data that's going to be stored in a database table, or Symfony form configuration.
Custom collection classes
Custom collection classes can be a very cool way to finally work with Iterator, ArrayAccess and friends, but I find most of the resulting code confusing.
Someone who looks at the code for the first time would have to look up the details in the PHP manual, even if they are an experienced developer.
Also, you need to write more code, which you have to maintain (test, debug, etc.).
So in most cases I find that a simple array, with some proper type annotations, is quite sufficient.
What are strong signals that you need to wrap your array into a custom collection object after all?
- If you find that logic related to that array gets copied around.
- If you find that clients have to deal with too many details about what's inside the array.
Use a custom collection class to prevent duplicate logic
If multiple clients that work with the same array perform the same task (e.g. filter it, map it, reduce it, count it), you could remove that duplication by introducing a custom collection class. Moving the duplicated logic to a method on the collection class allows any client to perform the same task using a simple method call on that collection:
$names = [/* ... */];
// Found in several places:
$shortNames = array_filter(
    $names,
    function (string $element): bool {
        return strlen($element) < 5;
    }
);
// Turned into a custom collection class:
use Assert\Assert;
final class Names
{
    /**
     * @var array<string>
     */
    private array $names;
    public function __construct(array $names)
    {
        // Verify that every element is a string (beberlei/assert)
        Assert::that($names)->all()->string();
        $this->names = $names;
    }
    public function shortNames(): self
    {
        return new self(
            array_filter(
                $this->names,
                function (string $element): bool {
                    return strlen($element) < 5;
                }
            )
        );
    }
}
$names = new Names([/* ... */]);
$shortNames = $names->shortNames();
An advantage of using a method for a transformation on a collection is that the transformation gets a name.
This enables you to add a short and meaningful label to an otherwise quite complicated-looking call to array_filter().
Use a custom collection class to decouple clients
If a client that works with a certain array loops over it, takes out a piece of data from selected elements and does something with that data, that client becomes tightly coupled to all the types involved: the array itself, the type of the elements that are in the array, the type of the values it retrieves from the selected elements, the type of the selector method, etc. The problem with this kind of deep coupling is that it becomes really hard to change anything about the types involved without breaking the client that depends on them. So in that case, you could also wrap the array in a custom collection class and let it give the right answer in one go, doing the necessary calculations inside, leaving the client more loosely coupled to the collection.
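$lines = [/* ... */];
$sum = 0;
foreach ($lines as $line) {
    if ($line->isComment()) {
        continue;
    }
    $sum += $line->quantity();
}
// Turned into a custom collection class:
final class Lines
{
    /**
     * @var array<Line> (Line is an assumed name for the element type)
     */
    private array $lines;
    public function __construct(array $lines)
    {
        $this->lines = $lines;
    }
    public function totalQuantity(): int
    {
        $sum = 0;
        // Loop over the internal array, not over a local variable
        foreach ($this->lines as $line) {
            if ($line->isComment()) {
                continue;
            }
            $sum += $line->quantity();
        }
        return $sum;
    }
}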
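Seen from the client's side, the difference looks roughly like this. The sketch below is self-contained, so it includes a hypothetical, stripped-down Line class; its constructor arguments are made up for the example.

```php
<?php

// Hypothetical element type, reduced to what the example needs
final class Line
{
    private int $quantity;
    private bool $isComment;

    public function __construct(int $quantity, bool $isComment = false)
    {
        $this->quantity = $quantity;
        $this->isComment = $isComment;
    }

    public function quantity(): int
    {
        return $this->quantity;
    }

    public function isComment(): bool
    {
        return $this->isComment;
    }
}

final class Lines
{
    /**
     * @var array<Line>
     */
    private array $lines;

    /**
     * @param array<Line> $lines
     */
    public function __construct(array $lines)
    {
        $this->lines = $lines;
    }

    public function totalQuantity(): int
    {
        $sum = 0;

        foreach ($this->lines as $line) {
            // Comment lines don't count towards the total
            if ($line->isComment()) {
                continue;
            }

            $sum += $line->quantity();
        }

        return $sum;
    }
}

$lines = new Lines([new Line(2), new Line(10, true), new Line(3)]);

// The client only knows about Lines and an int, not about Line or its methods
echo $lines->totalQuantity(), PHP_EOL; // prints 5
```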
Some rules for custom collection classes
Let's take a look at some rules that I apply when working with custom collection classes.
Make them immutable
Existing references to a collection instance shouldn't be affected when you run some kind of transformation on it. Therefore, any method that performs a transformation should return a new instance of the class, just like we saw in the example above:
final class Names
{
    /**
     * @var array<string>
     */
    private array $names;
    public function __construct(array $names)
    {
        // Verify that every element is a string (beberlei/assert)
        Assert::that($names)->all()->string();
        $this->names = $names;
    }
    public function shortNames(): self
    {
        return new self(
            /* ... */
        );
    }
}
Of course, if you're mapping the internal array, you may be mapping to a different type of collection, or a simple array. As always, make sure to provide a proper return type.
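Here is a sketch of both cases, assuming a hypothetical lengths() method next to shortNames(). The first maps to a simple array with an annotated return type, the second returns a new collection and leaves the existing one alone. The assertion in the constructor is left out to keep the example free of dependencies.

```php
<?php

final class Names
{
    /**
     * @var array<string>
     */
    private array $names;

    /**
     * @param array<string> $names
     */
    public function __construct(array $names)
    {
        $this->names = $names;
    }

    /**
     * Mapping to a different element type: return a plain, annotated array.
     *
     * @return array<int> The length of each name
     */
    public function lengths(): array
    {
        return array_map('strlen', $this->names);
    }

    /**
     * Transforming: return a new instance, $this->names is never modified.
     */
    public function shortNames(): self
    {
        return new self(
            array_filter(
                $this->names,
                function (string $element): bool {
                    return strlen($element) < 5;
                }
            )
        );
    }
}

$names = new Names(['Ann', 'Bernard', 'Cy']);
$shortNames = $names->shortNames();

echo implode(', ', $names->lengths()), PHP_EOL;      // prints "3, 7, 2"
echo implode(', ', $shortNames->lengths()), PHP_EOL; // prints "3, 2"
```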
Offer only the behavior that actual clients need and use
Instead of extending from a generic collection library class, or implementing a generic filter, map, and reduce method on each custom collection class yourself, only implement what's actually needed. If at some point a method becomes unused, remove it.
Use IteratorAggregate and ArrayIterator to support iteration
If you work with PHP, instead of implementing all the methods of the Iterator interface yourself (and keeping an internal pointer, etc.), just implement the IteratorAggregate interface and let it return an ArrayIterator instance based on the internal array:
final class Names implements IteratorAggregate
{
    /**
     * @var array<string>
     */
    private array $names;
    public function __construct(array $names)
    {
        // Verify that every element is a string (beberlei/assert)
        Assert::that($names)->all()->string();
        $this->names = $names;
    }
    public function getIterator(): Iterator
    {
        return new ArrayIterator($this->names);
    }
}
$names = new Names([/* ... */]);
foreach ($names as $name) {
    // ...
}
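Since a list should be countable as well as iterable, the same approach could be extended with PHP's Countable interface, but only if clients actually need count(). The following is a sketch of that idea, not something the example above requires; the assertion in the constructor is left out to keep it free of dependencies.

```php
<?php

final class Names implements IteratorAggregate, Countable
{
    /**
     * @var array<string>
     */
    private array $names;

    /**
     * @param array<string> $names
     */
    public function __construct(array $names)
    {
        $this->names = $names;
    }

    public function getIterator(): Iterator
    {
        return new ArrayIterator($this->names);
    }

    // Called by PHP when a client uses count($names)
    public function count(): int
    {
        return count($this->names);
    }
}

$names = new Names(['Ann', 'Bernard']);

echo count($names), PHP_EOL; // prints 2

foreach ($names as $name) {
    echo $name, PHP_EOL; // prints "Ann", then "Bernard"
}
```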
Consider the trade-off
Writing more code for your custom collection class should have the benefit of making it easier for clients to work with that collection (instead of with just an array). If client code becomes easier to understand, if the collection provides useful behaviors, then this justifies the extra cost of maintaining a custom collection class. However, because dynamic arrays are so easy to work with (mainly because you don't have to write out the involved types) I haven't often introduced my own collection classes. Still, I know some people who are great proponents of them, so I'll make sure to keep looking for potential use cases.