Refactoring the Cat API client - Part I

Posted on by Matthias Noback

Some time ago I tweeted this:

It turned out, creating a video tutorial isn't working well for me. I really like writing, and speaking in public, but I'm not very happy about recording videos. I almost never watch videos myself as well, so... the video tutorial I was talking about won't be there. Sorry!

To make it up with you, what follows is a series of blog posts, covering the same material as I intended to cover in the first episodes of the tutorial.

So, for starters, here is a nice piece of code for you:

class CatApi
{
    public function getRandomImage()
    {
        if (!file_exists(__DIR__ . '/../../cache/random') 
            || time() - filemtime(__DIR__ . '/../../cache/random') > 3) {
            $responseXml = @file_get_contents(
                'http://thecatapi.com/api/images/get?format=xml&type=jpg'
            );
            if (!$responseXml) {
                // the cat API is down or something
                return 'http://cdn.my-cool-website.com/default.jpg';
            }

            $responseElement = new \SimpleXMLElement($responseXml);

            file_put_contents(
                __DIR__ . '/../../cache/random',
                (string)$responseElement->data->images[0]->image->url
            );

            return (string)$responseElement->data->images[0]->image->url;
        } else {
            return file_get_contents(__DIR__ . '/../../cache/random');
        }
    }
}

As you may guess, the getRandomImage() function returns a random image URL from The Cat API (it really exists!). When the function is called several times within 3 seconds, the same URL will be returned, in order to prevent slow response times as well as abusing The Cat API.

If this code was in a run-once-dispose-immediately script, I wouldn't mind. But if it ends up in an actual application running on a production server, I wouldn't be so happy about it.

Some of the problems I see here:

  1. Several concerns are being addressed in the same function, i.e. fetching a fresh random URL, and caching it.
  2. A lot of low-level details like file paths, a URL, a cache lifetime, XML element names, etc. are in plain sight, obstructing the view of what's actually going on.

These two combined are big problems, because, well, let's take a look at the "unit" tests for this:

class CatApiTest extends \PHPUnit_Framework_TestCase
{
    protected function tearDown()
    {
        @unlink(__DIR__ . '/../../cache/random');
    }

    /** @test */
    public function it_fetches_a_random_url_of_a_cat_gif()
    {
        $catApi = new CatApi();

        $url = $catApi->getRandomImage();

        $this->assertTrue(filter_var($url, FILTER_VALIDATE_URL) !== false);
    }

    /** @test */
    public function it_caches_a_random_cat_gif_url_for_3_seconds()
    {
        $catApi = new CatApi();

        $firstUrl = $catApi->getRandomImage();
        sleep(2);
        $secondUrl = $catApi->getRandomImage();
        sleep(2);
        $thirdUrl = $catApi->getRandomImage();

        $this->assertSame($firstUrl, $secondUrl);
        $this->assertNotSame($secondUrl, $thirdUrl);
    }
}

We need to have some actual sleep() statements in our test in order to test the cache! Also, we merely verify that the returned value is a valid URL (which doesn't ensure us that the URL originates from The Cat API at all).

Let's start making the situation a bit better.

Separating concerns - Caching vs the real thing

Looking at the code sample above, we notice that the concerns of "caching things" are interwoven with the concerns of "fetching a fresh thing". But the situation isn't as bad as we may think. Basically, the structure of the code is like this:

if (!isCachedRandomImageFresh()) {
    $url = fetchFreshRandomImage();

    putRandomImageInCache($url);

    return $url;
}

return cachedRandomImage();

Wherever something is lazily loaded, or freshly generated only every x seconds, you'll see this pattern in the code.

By separating these concerns we would be able to test "fetching a fresh thing" separately from "caching that thing". We wouldn't be distracted by details about caching when we're more interested in verifying the correctness of the code that actually fetches a random cat image.

Let's first move all the code related to caching to a new class: CachedCatApi, which has the same public interface:

class CachedCatApi
{
    public function getRandomImage()
    {
        $cacheFilePath = __DIR__ . '/../../cache/random';
        if (!file_exists($cacheFilePath) 
            || time() - filemtime($cacheFilePath) > 3) {

            // not in cache, so do the real thing
            $realCatApi = new CatApi();
            $url = $realCatApi->getRandomImage();

            file_put_contents($cacheFilePath, $url);

            return $url;
        }

        return file_get_contents($cacheFilePath);
    }
}

By the way, I took this opportunity to change some minor things, which are still nice improvements for code readability:

  1. I've de-duplicated all the mentions of __DIR__ . '/../../cache/random'.
  2. I've removed else - it's irrelevant because the if part always returns something.

The CatApi class now only contains the logic for calling The Cat API:

class CatApi
{
    public function getRandomImage()
    {
        $responseXml = @file_get_contents('http://thecatapi.com/api/images/get?format=xml&type=jpg');
        if (!$responseXml) {
            // the cat API is down or something
            return 'http://cdn.my-cool-website.com/default.jpg';
        }

        $responseElement = new \SimpleXMLElement($responseXml);

        return (string)$responseElement->data->images[0]->image->url;
    }
}

Dependency inversion: introducing an abstraction

At this point we could split our test case as well, because we now have two classes, but in fact: we would be testing the CatApi class twice, since CachedCatApi is still using it directly. We can make a big step forwards by introducing an abstraction for CatApi - something that is replaceable if we don't want to make actual HTTP calls, but only test the caching behavior of CachedCatApi. Since both classes share exactly the same public interface, it's easy to define an actual interface for them. Let's call that interface CatApi for now and rename the old CatApi class to RealCatApi.

interface CatApi
{
    /**
     * @return string
     */
    public function getRandomImage();
}

class RealCatApi implements CatApi
{
    ...
}

class CachedCatApi implements CatApi
{
    ...
}

I don't think these are proper names, but for now, it suffices.

Next we shouldn't allow CachedCatApi to instantiate instances of CatApi. Instead, we should provide it with an instance it can use to fetch a fresh random image. This is called dependency injection: we provide an object with everything it needs.

class CachedCatApi implements CatApi
{
    private $realCatApi;

    public function __construct(CatApi $realCatApi)
    {
        $this->realCatApi = $realCatApi;
    }

    public function getRandomImage()
    {
        ...

            $url = $this->realCatApi->getRandomImage();

        ...
    }
}

This allows us to create truly separate test cases for both classes:

class CachedCatApiTest extends \PHPUnit_Framework_TestCase
{
    protected function tearDown()
    {
        @unlink(__DIR__ . '/../../cache/random');
    }

    /** @test */
    public function it_caches_a_random_cat_gif_url_for_3_seconds()
    {
        $realCatApi = $this->getMock('RealCatApi');
        $realCatApi
            ->expects($this->any())
            ->will($this->returnValue(
                // the real API returns a random URl every time
                'http://cat-api/random-image/' . uniqid()
            );

        $cachedCatApi = new CachedCatApi($realCatApi);

        $firstUrl = $cachedCatApi->getRandomImage();
        sleep(2);
        $secondUrl = $cachedCatApi->getRandomImage();
        sleep(2);
        $thirdUrl = $cachedCatApi->getRandomImage();

        $this->assertSame($firstUrl, $secondUrl);
        $this->assertNotSame($secondUrl, $thirdUrl);
    }
}

class RealCatApiTest extends \PHPUnit_Framework_TestCase
{
    /** @test */
    public function it_fetches_a_random_url_of_a_cat_gif()
    {
        $catApi = new RealCatApi();

        $url = $catApi->getRandomImage();

        $this->assertTrue(filter_var($url, FILTER_VALIDATE_URL) !== false);
    }
}

Conclusion

What have we gained so far? We can now test the caching behavior separately from the "real behavior". This means they can evolve separately, maybe even in completely different directions, as long as the classes implementing these behaviors keep to their contract.

What remains to be done? Well, a lot actually. We still can't test RealCatApi without making actual HTTP calls (which makes the current test quite unreliable and undeserving of the name "unit test"). The same goes for CachedCatApi: it will try to write to the filesystem, something we don't want in a "unit" test since it's slow and it influences global state.

Read part II

PHP OOP refactoring