Refactoring the Cat API client - Part III

Matthias Noback

July 16, 2015

In the first and second part of this series we’ve been doing quite a bit of work to separate concerns that were once all combined into one function.

The major “players” in the field have been identified: there is an HttpClient and a Cache, used by different implementations of CatApi to form a testable, performing client of The Cat Api.

Representing data

We have been looking at behavior, and the overall structure of the code. But we didn’t yet look at the data that is being passed around. Currently everything is a string, including the return value of CatApi::getRandomImage(). When calling the method on an instance we are “guaranteed” to retrieve a string. I say “guaranteed” since PHP would allow anything to be returned, an object, a resource, an array, etc. Though in the case of RealCatApi::getRandomImage() we can be sure that it is a string, because we explicitly cast the return value to a string, we can’t be sure it will be useful to the caller of this function. It might be an empty string, or a string that doesn’t contain a URL, like 'I am not a URL'.

class RealCatApi implements CatAPi
{
    ...

    /**
     * @return string URL of a random image
     */
    public function getRandomImage()
    {
        try {
            $responseXml = ...;
        } catch (HttpRequestFailed $exception) {
            ...
        }

        $responseElement = new \SimpleXMLElement($responseXml);

        return (string) $responseElement->data->images[0]->image->url;
    }
}

To make our code more robust and trustworthy, we can do a better job at making sure we return a proper value.

One thing we could do is verify the post-conditions of our method:

$responseElement = new \SimpleXMLElement($responseXml);

$url = (string) $responseElement->data->images[0]->image->url;

if (filter_var($url, FILTER_VALIDATE_URL) === false) {
    throw new \RuntimeException('The Cat Api did not return a valid URL');
}

return $url;

Though correct, this would be pretty bad for readability. It would become worse if there are multiple functions which all need the same kind of post-condition validations. We’d want to have a way to reuse this validation logic. In any case, the return value is still a meaningless string. It would be nice if we could preserve the knowledge that it’s a URL. That way, any other part of the application that uses the return value of CatApi::getRandomImage() would be aware of the fact that it’s a URL and would never mistake it for, say, an email address.

A value object representing a URL

Instead of writing post-conditions for CatApi::getRandomImage() implementations, we could write pre-conditions for image URLs. How can we make sure that a value like an image URL can’t exist, except when it’s valid? By turning it into an object and preventing it from being constructed using something that’s not a valid URL:

class Url
{
    private $url;

    public function __construct($url)
    {
        if (!is_string($url)) {
            throw new \InvalidArgumentException('URL was expected to be a string');
        }

        if (filter_var($url, FILTER_VALIDATE_URL) === false) {
            throw new \RuntimeException('The provided URL is invalid');
        }

        $this->url = $url;
    }
}

This type of object is called a value object.

Creating it with invalid data is impossible:

new Url('I am not a valid URL');

// RuntimeException: "The provided URL is invalid"

So every instance of Url that you encounter can be trusted to represent a valid URL. We can now change the code of RealCatApi::getRandomImage() to return a Url instance:

$responseElement = new \SimpleXMLElement($responseXml);

$url = (string) $responseElement->data->images[0]->image->url;

return new Url($url);

Usually value objects have methods to create them from and convert them back to primitive types, in order to prepare them or reload them from persistence, or to reconstruct them based on different value objects. In this case we might end up with a fromString() factory method and a __toString() method. This paves the way for parallel construction methods (e.g. fromParts($scheme, $host, $path, ...)) and specialized “getters” (host(), isSecure(), etc.). Of course, you shouldn’t implement these methods before you actually need them.

class Url
{
    private $url;

    private function __construct($url)
    {
        $this->url = $url;
    }

    public static function fromString($url)
    {
        if (!is_string($url)) {
            ...
        }

        ...

        return new self($url);
    }

    public function __toString()
    {
        return $this->url;
    }
}

Finally, we need to modify the contract of getRandomImage() and make sure the default image URL will also be returned as a Url object:

class RealCatApi implements CatAPi
{
    ...

    /**
     * @return Url URL of a random image
     */
    public function getRandomImage()
    {
        try {
            $responseXml = ...;
        } catch (HttpRequestFailed $exception) {
            return Url::fromString('https://cdn.my-cool-website.com/default.jpg');
        }

        $responseElement = new \SimpleXMLElement($responseXml);

        return Url::fromString((string) $responseElement->data->images[0]->image->url);
    }
}

Of course, this change should also be reflected in the Cache interface and any class implementing it, like FileCache, where you should convert from and to Url objects:

class FileCache implements Cache
{
    ...

    public function put(Url $url)
    {
        file_put_contents($this->cacheFilePath, (string) $url);
    }

    public function get()
    {
         return Url::fromString(file_get_contents($this->cacheFilePath));
    }
}

Parsing the XML response

One last part of the code I’d like to improve, is this:

$responseElement = new \SimpleXMLElement($responseXml);

$url = (string) $responseElement->data->images[0]->image->url;

I personally don’t like SimpleXML, but that’s not the problem here. What’s really wrong is the fact that we’re assuming we get a valid XML response, containing a root element, which contains one <data> element, which contains at least one <images> element, which contains one <image> element, which contains one <url> element, the value of which is supposed to be a string (the image URL). At any point in this chain, an assumption may be wrong, which might cause a PHP error.

We want to take control of this situation and instead of making PHP throw errors at us, define our own exceptions which we can catch and bypass if we want. Again, we should hide all the knowledge about the exact element names and the hierarchy of the response XML inside an object. That object should also handle any exceptional things. The first step is to introduce a simple DTO (data transfer object) which represents an “image” response from the Cat API.

class Image
{
    private $url;

    public function __construct($url)
    {
        $this->url = $url;
    }

    /**
     * @return string
     */
    public function url()
    {
        return $this->url;
    }
}

As you can see, this DTO hides the fact that there is also are <data>, <images>, <image>, etc. elements in the original response. We are only interested in the URL, and we can make it accessible as a simple getters (url()).

Now the code in getRandomImage() could become:

$responseElement = new \SimpleXMLElement($responseXml);
$image = new Image((string) $responseElement->data->images[0]->image->url);

$url = $image->url();

This doesn’t help, since we’re still stuck with that fragile piece XML traversal code.

Instead of creating the Image DTO inline, we’d be better off delegating it to a factory which will be given the knowledge about the expected XML hierarchy.

class ImageFromXmlResponseFactory
{
    public function fromResponse($response)
    {
        $responseElement = new \SimpleXMLElement($response);

        $url = (string) $responseElement->data->images[0]->image->url;

        return new Image($url);
    }
}

We only need to make sure to inject an instance of ImageFromXmlResponseFactory into RealCatApi and then we can reduce the code in RealCatApi::getRandomImage() to:

$image = $this->imageFactory->fromResponse($responseXml);

$url = $image->url();

return Url::fromString($url);

Moving code around like this gives us the opportunity to make something better of it. Small classes are easier to test. Like mentioned before, the tests we have in place for our classes at this point in general only test the “happy path”. A lot of edge cases aren’t covered by it. For example, what happens when:

The response body is empty
It contains invalid XML
It contains valid XML with an unexpected structure
…

Moving the XML processing logic to a different class allows you to focus completely on behavior surrounding that particular subject. It even allows you to use a true TDD style of programming, where you define the situations (like the ones in the previous list of edge cases) and the expected results.

Conclusion

This concludes the “Refactoring the Cat API client” series. I hope you liked it. If you have other suggestions for refactoring, leave a comment at the bottom of this page.