Below you will find pages that utilize the taxonomy term “Legacy Code”
Refactoring without tests should be fine
Refactoring without tests should be fine. Why is it not? When could it be safe?
From the cover of “Refactoring” by Martin Fowler:
Refactoring is a controlled technique for improving the design of an existing code base. Its essence is applying a series of small behavior-preserving transformations, each of which is “too small to be worth doing”. However, the cumulative effect of each of these transformations is quite significant.
Good design means it's easy to change
Software development seems to be about change: the business changes and we need to reflect those changes, so the requirements or specifications change; frameworks and libraries change, so we have to change our integrations with them; and so on. Changing the code base accordingly is often quite painful, because we made it resistant to change in many ways.
Code that resists change
I find that not every developer notices the “pain level” of a change. As an example, I consider it very painful if I can’t rename a class, or change its namespace. One reason could be that some classes aren’t auto-loaded with Composer, but are still manually loaded with require statements. Another reason could be that the framework expects the class to have a certain name, be in a certain namespace, and so on. This may be something you personally don’t consider painful, since you can avert the pain by simply never renaming or moving classes in the first place.
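To make the first reason concrete, here is a small sketch; the class name and the file path are invented for illustration:

```php
// Painful to change: the class is pulled in with a manual require, so every
// file that uses it hard-codes the path. Renaming or moving the class means
// hunting down all of these statements by hand.
require_once __DIR__ . '/lib/Order/OrderManager.php';

$orderManager = new OrderManager();

// Easier to change: with Composer autoloading (e.g. a PSR-4 mapping in
// composer.json), the class is located by its fully-qualified name, so an IDE
// can rename or move it and update every reference in one go:
//
//     use Acme\Order\OrderManager;
//
//     $orderManager = new OrderManager();
```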
Do you have an exit strategy?
It’s an extremely common problem in legacy code bases: a new way of doing things was introduced before the team decided on a way to get the old thing out.
Famous examples are:
- Introducing Doctrine ORM next to Propel
- Introducing Symfony FrameworkBundle while still using Zend controllers
- Introducing Twig for the new templates, while using Smarty for the old ones
- Introducing a Makefile while the rest of the project still uses Phing
And so on… I’m sure you also have plenty of examples to add here!
Successful refactoring projects - The Mikado Method
You’ve picked a good refactoring goal. You are prepared to stop the project at any time. Now, how do you determine the steps that lead to the goal?
Bottom-up development
There is an interesting similarity between refactoring projects and regular projects, where the goal is to add some new feature to the application. When working on a feature, I’m always happy to jump right in and think about what value objects, entities, controllers, etc. I need to build. Once I’ve written all that code and I’m ready to connect the dots, I often realize that I have created building blocks that I don’t even need, or that don’t offer a convenient API. This is the downside of what’s commonly called “bottom-up development”: when you start by building the low-level stuff, you can’t be certain that you’re contributing to the higher-level goal you have in mind.
Successful refactoring projects - Set the right goal
Refactoring is often mentioned in the context of working with legacy code. Maybe you like to define legacy code as code without tests, or code you don’t understand, or even as code you didn’t write. Very often, legacy code is code you just don’t like, whether you wrote it or someone else did. Since the code was written, the team has introduced new and better ways of doing things. Unfortunately, half of the code base still uses the old and deprecated way…
Successful refactoring projects - Prepare to stop at any time
Refactoring projects
A common case of refactoring gone wrong is when the refactoring becomes a large project, carried out in a branch that can never be merged because the work is never completed. The refactoring is treated as a separate project, and soon starts to feel like “The Big Rewrite That Always Fails” from programming literature.
The work happens in a branch because people actually fear the change. They want to see it before they believe it, and review every single part of it before it can be merged. This process may take months. Meanwhile, other developers keep making changes to the main branch, so merging the refactoring branch is going to be a very tedious, if not dangerous, thing to do. A task that, on its own, can cause the failure of the refactoring project itself.
Negative architecture, and assumptions about code
In “Negative Architecture”, Michael Feathers speaks about how certain aspects of a software architecture lead to a kind of negative architecture: guarantees about what the code will not do. Feathers mentions the IO Monad from Haskell (functional programming) as an example, but there are parallel examples in object-oriented programming. For example, by using layers and the dependency inversion principle you can “guarantee” that a certain class, in a certain layer (e.g. domain, application), won’t do any IO: no talking to a database, no HTTP requests to some remote service, etc.
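A small sketch of that kind of guarantee; all class and interface names here are invented for illustration:

```php
// Domain layer: an entity with behavior (only what's needed for the sketch).
final class Order
{
    /** @var bool */
    private $cancelled = false;

    public function cancel(): void
    {
        $this->cancelled = true;
    }
}

// Domain layer: the abstraction is defined here, implemented elsewhere.
interface OrderRepository
{
    public function byId(string $orderId): Order;

    public function save(Order $order): void;
}

// Application layer: this service depends only on the abstraction, so by
// construction it performs no IO itself; talking to a database or a remote
// service can only happen behind the OrderRepository interface.
final class CancelOrder
{
    /** @var OrderRepository */
    private $orderRepository;

    public function __construct(OrderRepository $orderRepository)
    {
        $this->orderRepository = $orderRepository;
    }

    public function cancel(string $orderId): void
    {
        $order = $this->orderRepository->byId($orderId);
        $order->cancel();

        $this->orderRepository->save($order);
    }
}

// Infrastructure layer: only here would actual IO happen, e.g. an
// OrderRepository implementation that runs SQL queries.
```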
Road to dependency injection
Statically fetching dependencies
I’ve worked with several code bases that were littered with calls to Zend_Registry::get(), sfContext::getInstance(), etc. to fetch a dependency when needed. I’m a little afraid to mention façades here, but they also belong in this list. The point of this article is not to bash a certain framework (they are all lovely), but to show how to get rid of these “centralized dependency managers” when you need to. The characteristics of these things are: they are globally accessible, they are fetched statically wherever they happen to be needed, and they hide the real dependencies of the classes that use them.
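To make the contrast concrete, here is a small sketch. Zend_Registry is real; the Mailer interface and the controllers are invented for illustration:

```php
// An invented collaborator, just for this sketch.
interface Mailer
{
    public function sendOrderConfirmation(string $orderId): void;
}

// Statically fetching the dependency: the class reaches into a global
// registry whenever it needs something. Its real dependencies are invisible
// from the outside, and it can't be used without bootstrapping the framework.
class OrderController
{
    public function createAction(string $orderId): void
    {
        $mailer = \Zend_Registry::get('mailer');
        $mailer->sendOrderConfirmation($orderId);
    }
}

// Dependency injection: the dependency is handed to the object through its
// constructor, so it is visible, swappable, and testable in isolation.
class OrderControllerWithInjection
{
    /** @var Mailer */
    private $mailer;

    public function __construct(Mailer $mailer)
    {
        $this->mailer = $mailer;
    }

    public function createAction(string $orderId): void
    {
        $this->mailer->sendOrderConfirmation($orderId);
    }
}
```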
Combing legacy code string by string
I find it very curious that legacy (PHP) code often has the following characteristics:
- Classes with the name of a central domain concept have grown too large.
- Methods in these classes have become very generic.
Classes grow too large
I think the following happened:
The original developers tried to capture the domain logic in these classes. They implemented it based on what they knew at the time. Other developers, who worked on the code later, had to implement new features or modify the domain logic, because, well, things change. Also because, over time, we need more things.
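As a made-up illustration of where that leads (all names are invented): a method that started out specific and gradually picked up parameters to serve every new caller.

```php
// A generic "do-everything" finder: every new requirement added another
// optional parameter or flag, and every flag adds another branch inside.
class ProductRepository
{
    public function findProducts(
        ?string $category = null,
        bool $includeInactive = false,
        ?string $sortBy = null,
        bool $asArray = false
    ): array {
        // ...each flag above corresponds to an if/else somewhere in here...
        return [];
    }
}
```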
Reducing call sites with dependency injection and context passing
This article continues where “Unary call sites and intention-revealing interfaces” left off.
While reading David West’s excellent book “Object Thinking”, I stumbled across an interesting quote from David Parnas on the programming method that most of us use by default:
The easiest way to describe the programming method used in most projects today was given to me by a teacher who was explaining how he teaches programming. “Think like a computer,” he said. He instructed his students to begin by thinking about what the computer had to do first and to write that down. They would then think about what the computer had to do next and continue in that way until they had described the last thing the computer would do… […]
Unary call sites and intention-revealing interfaces
Call sites
One of the features I love most about my IDE is the button “Find Usages”. It is invaluable when improving a legacy code base. When used on a class it will show you where this class is used (as a parameter type, in an import statement, etc.). When used on a method, it will show you where this method gets called. Users of a method are often called “clients”, but since “Find Usages” points us at the exact places where the method gets called, we might as well use the more generic term “call sites”.
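A small, invented example of why this matters: when “Find Usages” shows that a method has exactly one call site (a unary call site), you are free to replace it with a method whose name reveals the intention of that one caller.

```php
// An invented repository, just for this sketch: findBy() is the generic
// method, findUnshippedOrdersOldestFirst() the intention-revealing one.
interface OrderRepository
{
    public function findBy(array $criteria, array $orderBy): array;

    public function findUnshippedOrdersOldestFirst(): array;
}

// Before: "Find Usages" on findBy() shows exactly one call site with these
// arguments; the intention is hidden inside the criteria array.
function listUnshippedOrders(OrderRepository $repository): array
{
    return $repository->findBy(['status' => 'unshipped'], ['createdAt' => 'ASC']);
}

// After: the single call site now reveals the caller's intention by name.
function listUnshippedOrdersRevised(OrderRepository $repository): array
{
    return $repository->findUnshippedOrdersOldestFirst();
}
```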
Keep an eye on the churn; finding legacy code monsters
Setting the stage: Code complexity
Code complexity often gets measured by calculating the Cyclomatic Complexity per unit of code. The number is calculated by counting the branches in the code: roughly, every decision point (an if, a case, a loop, a boolean operator) adds one to it.
Code complexity is an indicator for several things:
- How hard it is to understand a piece of code; a high number indicates many branches in the code. When reading the code, a programmer has to keep track of all those branches, in order to understand all the different ways in which the code can work.
- How hard it is to test that piece of code; a high number indicates many branches in the code, and in order to fully test the piece of code, all those branches need to be covered separately.
In both cases, high code complexity is a really bad thing. So, in general, we always strive for low code complexity. Unfortunately, many projects that you’ll inherit (“legacy projects”) will contain code that has high code complexity and no tests. A common hypothesis is that high code complexity arises from a lack of tests. At the same time, it’s really hard to write tests for code with high complexity, so this is a situation that is really hard to get out of.
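As a small, invented example of how the number adds up:

```php
// Cyclomatic complexity of this function: 5.
// One for the function itself, plus one for each decision point marked below.
function shippingCosts(bool $isDigital, int $totalInCents, string $country, int $weightInGrams): int
{
    if ($isDigital) {                    // +1
        return 0;
    }

    if ($totalInCents > 10000) {         // +1: free shipping above 100 euros
        return 0;
    }

    if ($country === 'NL'                // +1
        && $weightInGrams < 2000) {      // +1: the boolean operator counts too
        return 250;
    }

    return 750;
}
```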
Simple CQRS - reduce coupling, allow the model(s) to evolve
CQRS - not a complicated thing
CQRS has some reputation issues. Mainly, people will feel that it’s too complicated to apply in their current projects. It will often be considered over-engineering. I think CQRS is simply misunderstood, which is the reason many people will not choose it as a design technique. One of the common misconceptions is that CQRS always goes together with event sourcing, which is indeed more costly and risky to implement.
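A minimal sketch of the core idea, without event sourcing; all names are invented for illustration:

```php
// Write side: the model protects its invariants; commands go through it.
// (A minimal stub, just enough for the sketch.)
final class PurchaseOrder
{
    /** @var bool */
    private $placed = false;

    public function place(): void
    {
        $this->placed = true;
    }
}

interface PurchaseOrderRepository
{
    public function getById(string $purchaseOrderId): PurchaseOrder;

    public function save(PurchaseOrder $purchaseOrder): void;
}

// Read side: a query returns a simple read model, shaped for the screen that
// needs it, and free to evolve independently of the write model.
final class OpenPurchaseOrder
{
    /** @var string */
    public $purchaseOrderId;

    /** @var string */
    public $supplierName;

    /** @var int */
    public $totalInCents;
}

interface OpenPurchaseOrdersQuery
{
    /** @return OpenPurchaseOrder[] */
    public function execute(): array;
}
```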
Refactoring the Cat API client - Part III
In the first and second parts of this series we’ve done quite a bit of work to separate concerns that were once all combined into one function.
The major “players” in the field have been identified: there is an HttpClient and a Cache, used by different implementations of CatApi to form a testable, well-performing client of The Cat API.
Representing data
We have been looking at behavior, and at the overall structure of the code. But we didn’t yet look at the data that is being passed around. Currently everything is a string, including the return value of CatApi::getRandomImage(). When calling the method on an instance we are “guaranteed” to retrieve a string. I say “guaranteed” since PHP would allow anything to be returned: an object, a resource, an array, etc. And even though in the case of RealCatApi::getRandomImage() we can be sure that it is a string, because we explicitly cast the return value to a string, we can’t be sure it will be useful to the caller of this function. It might be an empty string, or a string that doesn’t contain a URL at all, like 'I am not a URL'.
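One possible direction, sketched here with an invented class name, is to wrap the string in a small value object that refuses to exist in an invalid state:

```php
// A value object for the image URL: the constructor is private, so the only
// way to obtain an instance is through fromString(), which validates.
final class Url
{
    /** @var string */
    private $url;

    private function __construct(string $url)
    {
        $this->url = $url;
    }

    public static function fromString(string $url): self
    {
        // Reject values like '' or 'I am not a URL' as early as possible.
        if (filter_var($url, FILTER_VALIDATE_URL) === false) {
            throw new \InvalidArgumentException(sprintf('"%s" is not a valid URL', $url));
        }

        return new self($url);
    }

    public function __toString(): string
    {
        return $this->url;
    }
}
```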
Refactoring the Cat API client - Part II
The world is not a safe thing to depend upon
When you’re running unit tests, you don’t want the world itself to be involved. Executing actual database queries, making actual HTTP requests, writing actual files, none of this is desirable: it will make your tests very slow as well as unpredictable. If the server you’re sending requests to is down, or responds in unexpected ways, your unit test will fail for the wrong reasons. A unit test should only fail if your code doesn’t do what it’s supposed to do.
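A minimal sketch of how to keep the world out; the interface and its get() method are assumptions, not necessarily what this series ends up with:

```php
// The code under test depends on this abstraction instead of on the network.
interface HttpClient
{
    public function get(string $url): string;
}

// In production an implementation performs a real HTTP request (e.g. with
// curl or file_get_contents()). In a unit test, a fake keeps things fast and
// predictable:
final class FakeHttpClient implements HttpClient
{
    /** @var string */
    private $cannedResponse;

    public function __construct(string $cannedResponse)
    {
        $this->cannedResponse = $cannedResponse;
    }

    public function get(string $url): string
    {
        // No network involved: the test decides exactly what comes back.
        return $this->cannedResponse;
    }
}
```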
Refactoring the Cat API client - Part I
Some time ago I tweeted this:
I didn't mention this yet, but I'm working on a series of videos about the subject matter of my Principles of Package Design book.
— Matthias Noback (@matthiasnoback) May 14, 2015
It turned out that creating a video tutorial isn’t working well for me. I really like writing and speaking in public, but I’m not very happy about recording videos. I almost never watch videos myself either, so… the video tutorial I was talking about won’t be there. Sorry!