Combing legacy code string by string

Posted on by Matthias Noback

I find it very curious that legacy (PHP) code often has the following characteristics:

  1. Classes with the name of a central domain concept have grown too large.
  2. Methods in these classes have become very generic.

Classes grow too large

I think the following happened:

The original developers tried to capture the domain logic in these classes. They implemented it based on what they knew at the time. Other developers, who worked on the code later, had to implement new features, or modify domain logic, because, well, things change. Also, because we need more things.

For instance, when a developer comes in and has to modify some of the logic related to "sales orders", they will look for classes with that name in it, like the SalesOrder entity, the SalesOrders controller, service, manager, helper, table gateway, repository, etc. Before considering the option to add more classes, most developers will first consider adding a method to one of the existing classes. Especially if the code of this class hasn't been tested.

Since this is legacy code we're talking about, it's very likely that the code hasn't been tested (if it was tested, would it be legacy code?). So most likely they will end up modifying the existing class, instead of thinking about what they really need: an object that does exactly what they need.

This is how legacy classes end up being so very large. We add more and more to it (since the new functionality is "related" to the existing code after all). It explains why legacy code has the first characteristic I mentioned at the beginning of this post ("Classes with the name of a central domain concept have grown too large."). The second characteristic still deserves some explanation though ("Methods in these classes have become very generic.").

Methods become very generic

Once we decide to take an existing class and modify it, we first analyze the code that's in the class: the method, the properties; we've all learned to take the D.R.Y. ("Don't repeat yourself") principle into account, so we're scared of redoing anything that was already done. While scanning the existing code, we may find several of the ingredients we need for the task at hand. Well, the ingredients are often somewhat useful, but not entirely what we need.

This is a crucial moment, a moment we've often experienced, a moment we've always appreciated. However, it's a moment we should fear, a moment we should fight (pardon the dramatic tone). It's the moment we change a method to fit in with our current use case.

How is this usually achieved? Most often we use one of the following tactics:

  1. Add an extra parameter (a.k.a. a parameter flag), using which we can influence the behavior of the method. Of course we provide a sensible default parameter, for backwards compatibility:

    // The original method:
    public function findAllSalesOrders()
    {
        // some complicated SQL, *almost* what we need
    }
    
    // The modified method, with a parameter flag  
    public function findAllSalesOrders($excludeFinishedOrders = false)
    {
        // ...
    }
    
  2. Call another method first (a.k.a. decoration). This method may do some of the work, and maybe let the original method do its original work afterwards.

    // The original method:
    public function findAllSalesOrders()
    {
        // some complicated SQL, *almost* what we need
    }
    
    // The new method, which decorates the original one:
    public function findOpenSalesOrders()
    {
        // call the original method
        $salesOrders = $this->findAllSalesOrders();
    
        return array_filter($salesOrders, function(SalesOrder $order) {
            return $order->isOpen();
        };
    }
    

Sometimes we even combine the two, making one method more generic, and calling it from a more specific one:

// The original method:
public function findAllSalesOrders()
{
    // some complicated SQL, *almost* what we need
}

// The modified method:
public function findSpecificSalesOrders(array $statuses = [])
{
    // some complicated SQL, can do anything we need
}

// The new method, which decorates the modified original one
public function findOpenSalesOrders()
{
    return $this->findSpecificSalesOrders([SalesOrder::STATUS_OPEN]);
}

Over time, the original methods become more generic. That is, they can be used in many different scenarios. At the same time, the classes containing these methods keep growing. Every time we need a slightly different method, we add it, and use one of the tactics described. Slowly the class becomes too large to remain manageable (see also "Keep an eye on the churn; finding legacy code monsters").

If this goes on long enough, we end up with true spaghetti code. If you visualize the execution of the code as a string travelling through your code base, having all these specialized methods using generic, reusable ones, looks something like this:

Entangled strings

A very healthy way of untangling legacy code like this, is to take a step back. I like how Stijn Vannieuwenhuyse described this in a tweet:

He hints at using the Inline Method refactoring multiple times, to collapse the hierarchy of method calls. For example, given the findOpenSalesOrders() method described above, we don't call another method (findAllSalesOrders()), but we copy all the code, (including the complicated SQL) from this method into the new method. Preferably we move this method to a new class, since we want to use this opportunity to make the original class a bit smaller.

While inlining the findAllSalesOrders() method inside the new findOpenSalesOrders() method, we may find that the old method made calls several other "helper" functions, on the same or other collaborating objects. We simply copy all that code into the new method too, until the code in the new method "compiles".

At that point you've effectively made the new code independent of the old code, while preserving the original behavior. You could visualize this as "combing the spaghetti code". Instead of touching/using multiple functions, all the code involved gets executed from within one large function. You now have one string that's completely separate from the other still entangled "strings".

Separated string

Of course, this single large method won't be very good, and you should not leave it as it is. Having the code all in front of you, instead of scattered across several monstrously big classes, allows you to rethink its design. It gives you the option to come up with different "abstractions", as Stijn calls it. In my recent experiences with legacy code, it also enables you to come up with much simpler solutions.

Keep in mind that the original code had become very generic, by means of added parameter flags or method decoration. Since we copied all the code, it now serves only one use case - the one we're working on right now. Because of this, most likely we can simplify the code we copied to the new method and end up with something that's quite well maintainable too.

Conclusion

"Combing" legacy code by untangling the strings is a good way to improve it and take back control over it. In existing legacy code, you should stop (re)using existing methods, making them ever more generic. Instead, you create new methods, and copy code from existing ones, allowing you to simplify this copied code and end up with a manageable class.

If you're looking for a way to prevent yourself from writing legacy code, you should become more aware of that moment I talked about; the moment you modify an existing method to serve a new use case. Think twice before you do so, because adding up all these moments will result in an unmaintainable class.

PHP legacy code quality