Combing legacy code string by string

Posted on by Matthias Noback

I find it very curious that legacy (PHP) code often has the following characteristics:

  1. Classes with the name of a central domain concept have grown too large.
  2. Methods in these classes have become very generic.

Classes grow too large

I think the following happened:

The original developers tried to capture the domain logic in these classes. They implemented it based on what they knew at the time. Other developers, who worked on the code later, had to implement new features, or modify domain logic, because, well, things change. Also, because we need more things.

For instance, when a developer comes in and has to modify some of the logic related to "sales orders", they will look for classes with that name in it, like the SalesOrder entity, the SalesOrders controller, service, manager, helper, table gateway, repository, etc. Before considering the option to add more classes, most developers will first consider adding a method to one of the existing classes. Especially if the code of this class hasn't been tested.

Since this is legacy code we're talking about, it's very likely that the code hasn't been tested (if it was tested, would it be legacy code?). So most likely they will end up modifying the existing class, instead of thinking about what they really need: an object that does exactly what they need.

This is how legacy classes end up being so very large. We add more and more to it (since the new functionality is "related" to the existing code after all). It explains why legacy code has the first characteristic I mentioned at the beginning of this post ("Classes with the name of a central domain concept have grown too large."). The second characteristic still deserves some explanation though ("Methods in these classes have become very generic.").

Methods become very generic

Once we decide to take an existing class and modify it, we first analyze the code that's in the class: the method, the properties; we've all learned to take the D.R.Y. ("Don't repeat yourself") principle into account, so we're scared of redoing anything that was already done. While scanning the existing code, we may find several of the ingredients we need for the task at hand. Well, the ingredients are often somewhat useful, but not entirely what we need.

This is a crucial moment, a moment we've often experienced, a moment we've always appreciated. However, it's a moment we should fear, a moment we should fight (pardon the dramatic tone). It's the moment we change a method to fit in with our current use case.

How is this usually achieved? Most often we use one of the following tactics:

  1. Add an extra parameter (a.k.a. a parameter flag), using which we can influence the behavior of the method. Of course we provide a sensible default parameter, for backwards compatibility:

    // The original method:
    public function findAllSalesOrders()
    {
        // some complicated SQL, *almost* what we need
    }
    
    // The modified method, with a parameter flag  
    public function findAllSalesOrders($excludeFinishedOrders = false)
    {
        // ...
    }
    
  2. Call another method first (a.k.a. decoration). This method may do some of the work, and maybe let the original method do its original work afterwards.

    // The original method:
    public function findAllSalesOrders()
    {
        // some complicated SQL, *almost* what we need
    }
    
    // The new method, which decorates the original one:
    public function findOpenSalesOrders()
    {
        // call the original method
        $salesOrders = $this->findAllSalesOrders();
    
        return array_filter($salesOrders, function(SalesOrder $order) {
            return $order->isOpen();
        };
    }
    

Sometimes we even combine the two, making one method more generic, and calling it from a more specific one:

// The original method:
public function findAllSalesOrders()
{
    // some complicated SQL, *almost* what we need
}

// The modified method:
public function findSpecificSalesOrders(array $statuses = [])
{
    // some complicated SQL, can do anything we need
}

// The new method, which decorates the modified original one
public function findOpenSalesOrders()
{
    return $this->findSpecificSalesOrders([SalesOrder::STATUS_OPEN]);
}

Over time, the original methods become more generic. That is, they can be used in many different scenarios. At the same time, the classes containing these methods keep growing. Every time we need a slightly different method, we add it, and use one of the tactics described. Slowly the class becomes too large to remain manageable (see also "Keep an eye on the churn; finding legacy code monsters").

If this goes on long enough, we end up with true spaghetti code. If you visualize the execution of the code as a string travelling through your code base, having all these specialized methods using generic, reusable ones, looks something like this:

Entangled strings

A very healthy way of untangling legacy code like this, is to take a step back. I like how Stijn Vannieuwenhuyse described this in a tweet:

He hints at using the Inline Method refactoring multiple times, to collapse the hierarchy of method calls. For example, given the findOpenSalesOrders() method described above, we don't call another method (findAllSalesOrders()), but we copy all the code, (including the complicated SQL) from this method into the new method. Preferably we move this method to a new class, since we want to use this opportunity to make the original class a bit smaller.

While inlining the findAllSalesOrders() method inside the new findOpenSalesOrders() method, we may find that the old method made calls several other "helper" functions, on the same or other collaborating objects. We simply copy all that code into the new method too, until the code in the new method "compiles".

At that point you've effectively made the new code independent of the old code, while preserving the original behavior. You could visualize this as "combing the spaghetti code". Instead of touching/using multiple functions, all the code involved gets executed from within one large function. You now have one string that's completely separate from the other still entangled "strings".

Separated string

Of course, this single large method won't be very good, and you should not leave it as it is. Having the code all in front of you, instead of scattered across several monstrously big classes, allows you to rethink its design. It gives you the option to come up with different "abstractions", as Stijn calls it. In my recent experiences with legacy code, it also enables you to come up with much simpler solutions.

Keep in mind that the original code had become very generic, by means of added parameter flags or method decoration. Since we copied all the code, it now serves only one use case - the one we're working on right now. Because of this, most likely we can simplify the code we copied to the new method and end up with something that's quite well maintainable too.

Conclusion

"Combing" legacy code by untangling the strings is a good way to improve it and take back control over it. In existing legacy code, you should stop (re)using existing methods, making them ever more generic. Instead, you create new methods, and copy code from existing ones, allowing you to simplify this copied code and end up with a manageable class.

If you're looking for a way to prevent yourself from writing legacy code, you should become more aware of that moment I talked about; the moment you modify an existing method to serve a new use case. Think twice before you do so, because adding up all these moments will result in an unmaintainable class.

PHP legacy code quality Comments

Exceptions and talking back to the user

Posted on by Matthias Noback

Exceptions - for exceptional situations?

From the Domain-Driven Design movement we've learned to go somewhat back to the roots of object-oriented design. Designing domain objects is all about offering meaningful behavior and insights through a carefully designed API. We know now that domain objects with setters for every attribute will allow for the objects to be in an inconsistent state. By removing setters and replacing them with methods which only modify the object's state in valid ways, we can protect an object from violating domain invariants.

This style of modeling domain aggregates (entities and value objects) results in methods that modify state, but only after checking if the change is allowed and makes sense, given the rules dictated by actual domain knowledge. In terms of code, these methods may make some simple assertions about the arguments passed to it: the number of things, the range of a number, the format of a string. When anything is wrong about a provided argument, the domain object simply throws an exception. If everything is okay, the changes will be accepted by modifying any attribute involved.

So exceptions in your (object-oriented) domain model are not merely meant to signal an exceptional situation. They can be used to prevent invalid or unsupported usage of an object. By offering well-named methods (with sensible parameters) for changing the object's state, and by being very precise about throwing exceptions when invalid use is imminent, you make your domain objects usable in only one way: the way that makes sense. This is the exact opposite of how your domain objects end up looking if you generate getters and setters for every attribute.

Using exceptions for validation

You might say that, given that the object protects itself from ending up in an in valid state, it basically validates itself. However, it won't be good for our application's usability if we'd use these domain-level exceptions as messages to the user. The thing is:

  • Exceptions get thrown ad hoc, whenever something threatens the consistency of the domain object. You can only catch one of these exceptions and turn it into an error message for the user. If the user tries to fix the issue by making a change, sending the form again, and they manage to get past this exception, there may be another exception just around the corner. This will be very frustrating for the user, as it will feel like trial-and-error.
public function processDelivery(int $quantity, string $batchNumber): void {
    if ($quantity <= 0) {
        /*
         * If the user triggers this exception, they have to resubmit,
         * which may trigger another exception further down the line...
         */
        throw new InvalidArgument(
            'Quantity should be more than 0.'
        );
    }

    if (empty($batchNumber) {
        throw new InvalidArgument(
            'This delivery requires a batch number.'
        );
    }

    // ...
}
  • Domain exceptions aren't always about validating the data the user provides. They often signal that something is about to happen that can't logically happen, like a state change that isn't allowed or conceptually possible. E.g. once an order has been cancelled, it shouldn't be possible to change its delivery address, because that makes no sense:
public function changeDeliveryAddress(...): void {
    if ($this->wasCancelled) {
        throw new InvalidState('You cannot change ...');
    }

    // ...
}
  • Exception messages may contain more information than you'd like to share with the user.
  • Validation errors often require internationalization (i18n). They need localization in the sense that numbers should be formatted according to the user's locale. Of course, they often need translation too. Exceptions aren't naturally usable for translation, because they contain special values hard-coded into their messages.
throw new InvalidArgument(
    'Product 21 has a stock level of 100, but this delivery has a quantity of 200.'
);

Translation needs a template message which will be translated, after which the variables it contains will be replaced by their real values.

So exceptions thrown to protect domain invariants are not validation messages all by themselves. They are there to prevent bad things from happening to your domain objects. They are not useful if what you want is talk back to the user. If you're looking for a dialogue with the user about what they're trying to achieve, you should be having it in a layer of the application that's closer to the user.

There are several options there:

  • You could make the user interface smarter and conveniently provide some validation errors while the user is still filling in a form, etc.
  • Once the user has submitted some data, you could validate it when you're inside the controller. If necessary, you could then update the view with validation errors.

Improving the UI to prevent backend exceptions

If everything looks good, it should not be necessary to rely on domain-level exceptions to warn the user about something they're doing wrong. The UI and the controller should be able to let the user safely reach their goal. Often, you'll need to make the UI just a little bit smarter, by providing it with some (backend) knowledge.

For example, if the UI has a delete button for deleting an order, pressing it will trigger a number of validation steps in the backend code to figure out whether or not the order can in fact be deleted. If you show a delete button to the user, and they click on it, wait for a few seconds, and only then find out that deleting is actually not allowed, they will be disappointed. You can prevent this disappointment, by doing the checks beforehand, when the page gets loaded. That way, you can tell the frontend whether or not to show the delete button in the first place.

Of course, the backend will still make the checks, but the chance that the user can't do what they want to do, will be quite small.

User-facing exceptions

In the project I'm currently working on, there is a sort of mixed-style exception in use: the so-called App_Exception. When such an exception bubbles up to the point where it would be rendered as a 500 Internal server error page, it will instead be converted to a user-friendly error message shown in the UI. It's often an application service or domain-level exception that's meant to reach the user - unlike all the other types of exceptions, I should add. These exceptions will be logged, but never shown to the user. App_Exception however has a translatable message, aiming to help the user figure out what's wrong.

I liked the underlying idea; sometimes you really want to talk back to the user, telling them that you can't continue with their request. If you can't easily improve the UI to not let the user make the - somehow - bad request in the first place, something like an App_Exception can be very convenient (it may not be the most intention-revealing name of course).

My wish list for improving this thing was:

  • The user-facing exception message itself should be translatable, to be as helpful to the user as possible.
  • The user-facing exception message should have contextual data, to be even more helpful (i.e. "You can't deliver 3 items, since only 2 were ordered.").
  • It should be possible for the exception's developer-facing message (the message that ends up in the logs), to be more detailed or simply just different than the user-facing message.
  • It should be easy to use custom, domain-specific exception type classes, instead of one generic class, like App_Exception.

This reminded me of how the Symfony security component deals with sensitive exceptions. These exceptions end up as authentication errors on the screen of the user. But at this point, there is an alternative message - not the original exception message - that gets shown to the user. Because these security-related exceptions usually contain sensitive information, useful for intruders, the exceptions have a special API (see the base class AuthenticationException).

These exception classes have the standard Exception constructor, but they also have extra methods:

/**
 * Message key to be used by the translation component.
 *
 * @return string
 */
public function getMessageKey()
{
    // ...  
}

/**
 * Message data to be used by the translation component.
 *
 * @return array
 */
public function getMessageData()
{
    // ...
}

That way, you could use the Symfony translator to show a message that tells the user what's wrong, but without exposing sensitive details:

$safeMessge = $translator->translate(
    $exception->getMessageKey(), 
    $exception->getMessageData()
);

Nice!

This was in the back of my head, and I wanted something like this to replace App_Exception - the one exception to rule them all. So I defined an interface with just these two methods, getMessageKey() and getMessageData(), and allowed any exception in the project to implement it. This of course solves the issue of not being tied to a specific exception class, so you can easily create your own custom type of exceptions (which I find very useful, in particular when in unit-testing).

Heresy

"But, you're a dogmatic, a purist! You wouldn't want the exceptions from your domain layer to implement an interface aimed at translating!" Agreed, translating is an infrastructure/top layer concern. However, this is so nice, so helpful, and so very convenient. Besides, we're not actually translating here - we're only making our objects ready for translation. It makes a lot of sense to me. Also, I'm not a dogmatic, nor a purist ;)

By the way, this new type of exception will end up in the logs, but should not require the immediate attention of the development team (i.e. developers shouldn't be alerted about them). However, they shouldn't be completely ignored too. If some of these exceptions get thrown a lot, this is a useful sign that something about the UI can be improved to actually prevent these exceptions and offer the user a nicer experience.

PHP exceptions validation DDD Comments

Mocking the network

Posted on by Matthias Noback

In this series, we've discussed several topics already. We talked about persistence and time, the filesystem and randomness. The conclusion for all these areas: whenever you want to "mock" these things, you may look for a solution at the level of programming tools used (use database or filesystem abstraction library, replace built-in PHP functions, etc.). But the better solution is always: add your own abstraction. Start by introducing your own interface (including custom return types), which describes exactly what you need. Then mock this interface freely in your application. But also provide an implementation for it, which uses "the real thing", and write an integration test for just this class.

The same really is true for the network. You don't want your unit tests to rely on a network connection, or a specific web service to be up and running. So, you override PHP's curl_exec() function. Or, if your code uses Guzzle, you inject its Mock Handler to by-pass part of its real-life behavior. The smarter solution again is to introduce your own interface, with its own implementation, and its own integration test. Then you can prove that this implementation is a faithful implementation of the interface (contract) you defined. And it allows you to mock at a much more meaningful level than just replacing a "real" HTTP response with a recorded one.

Though this solution would be quite far from traditional mocking I thought it would be interesting to write a bit more about it, since there's also a lot to say. It does require a proper example though. Let's say you're writing (and testing) a piece of financial calculation code where you're doing a bit of currency-conversion as well. You need a conversion rate, but your application doesn't know about actual rates, so you need to reach out to some web service that keeps track of these rates. Let's say, you make a call to "exchangerates.com":

# ...

$response = file_get_contents('https://exchangerates.com/?from=USD&to=EUR&date=2018-03-18')
$data = json_decode($response);
$exchangeRate = (float)$data->rate ?? 1;

# use the exchange rate for the actual calculation
# ...

Yes, this is horrible. Testing this code and "mocking" the network call is only one of our problems. We have to deal with broken connections and responses, and by the way, this code doesn't even take into account most of the other things that could go wrong. Code like this that connects with "the big bad world" requires a bigger safety net.

The first thing we should do is (as always) introduce an interface for fetching an exchange rate:

interface ExchangeRateService
{
    public function getFor(string $from, string $to, DateTimeImmutable $date): float
}

We could at least move all that ugly code and stick it in a class implementing this interface. Such is the merit of setting up a "facade", which "provides a simplified interface to a larger body of code". This is convenient, and it allows client code to use this interface for mocking. At the same time though, we're hiding the fact that we're making a network call, and that things can go wrong with that in numerous ways.

Implement an Anti-Corruption Layer

The first thing we can and should do is protect ourselves from the bad (data) model which the external service uses. We may have a beautiful model, with great encapsulation, and intention-revealing interfaces. If we'd have to follow the weird rules of the external service, our model risks being "corrupted".

That's what Domain-Driven Design's "Anti-Corruption Layer" (ACL - a bit of a confusing name) is meant for: we are encouraged to create our own models and use them as a layer in front of an external service's source of data. In our case, the interface we introduced was a rather simple one, one that doesn't allow for proper encapsulation. And because of the use of primitive types, there certainly isn't a place for a good and useful API. Due to a quirk in the external service I didn't mention yet, if one of the currencies is EUR, it always needs to be provided as the second ($to argument).

It'll be a perfect opportunity for an ACL. Instead of dealing with an exchange rate as a rather imprecise float type variable, we may want to define it as an integer and an explicit precision. And instead of working with DateTimeImmutable, we'd be better off modelling the date to be exactly what we need, and encode this knowledge in a class called ExchangeDate. Also, not every string is a currency, so we'd better define a class for it too.

We'd end up with an improve interface:

interface ExchangeRateService
{
    public function getFor(Currency $from, Currency $to, ExchangeDate $date): ExchangeRate
}

final class ExchangeRate
{
    public function __construct(
        Currency $from, 
        Currency $to, 
        ExchangeDate $date,
        int $rate,
        int $precision
    ) {
        // ...
    }
}

The implementation for the service could easily work around that quirk about the order of from/to currencies. It would require swapping the arguments, then calculating the inverse of the exchange rate (for the purpose of which we could easily add a modifier method - invert() to our ExchangeRate object):

public function getFor(Currency $from, Currency $to, ExchangeDate $date)
{
    if ($from->equals(Currency::EUR())) {
        return $this->fetchExchangeRate(
            $to,
            $from,
            $date
        )->invert()
    }

    return $this->fetchExchangeRate(
        $from,
        $to,
        $date
    );
}

private function fetchExchangeRate(
    Currency $from, 
    Currency $to, 
    ExchangeDate $date
): ExchangeRate
{
    // ...
}

Inverting the runtime dependency

Even though step 1 - adding an ACL - is a big improvement for our model and related services, the next best thing for our application would be if we wouldn't have to make the external request at all. We'd want to invert the runtime dependency on the remote service, and there's several things we can do to achieve that (if you're interested in the topic of autonomous services, check out my book).

Different situations will require different solutions, but in this case, the most "obvious" solution would be to somehow make sure that your application already knows the answer to the question it was going to ask to the remote service. You could accomplish this by making daily background calls to the exchange rate service, and keep a local list of exchange rates for any currency that your application supports.

The good thing is, if you have developed a nice ACL, you can keep using it, whether or not the data will be fetched at runtime, all has already been fetched in a background job.

Conclusion

This concludes the series about "mocking at architectural boundaries". In all cases, the advice is to introduce your own abstractions. When it comes to dealing with a network connection, instead of asking the question "How can I mock the network?", you should be considering how you could add a proper protective (model) layer around it, and ideally, how you can prevent making the network call at all.

PHP testing mocking hexagonal architecture ACL DDD Comments