Some time ago I tweeted this:
I didn't mention this yet, but I'm working on a series of videos about the subject matter of my Principles of Package Design book.
— Matthias Noback (@matthiasnoback) May 14, 2015
It turned out, creating a video tutorial isn't working well for me. I really like writing, and speaking in public, but I'm not very happy about recording videos. I almost never watch videos myself as well, so... the video tutorial I was talking about won't be there. Sorry!
To make it up with you, what follows is a series of blog posts, covering the same material as I intended to cover in the first episodes of the tutorial.
So, for starters, here is a nice piece of code for you:
class CatApi
{
public function getRandomImage()
{
if (!file_exists(__DIR__ . '/../../cache/random')
|| time() - filemtime(__DIR__ . '/../../cache/random') > 3) {
$responseXml = @file_get_contents(
'https://thecatapi.com/api/images/get?format=xml&type=jpg'
);
if (!$responseXml) {
// the cat API is down or something
return 'https://cdn.my-cool-website.com/default.jpg';
}
$responseElement = new \SimpleXMLElement($responseXml);
file_put_contents(
__DIR__ . '/../../cache/random',
(string)$responseElement->data->images[0]->image->url
);
return (string)$responseElement->data->images[0]->image->url;
} else {
return file_get_contents(__DIR__ . '/../../cache/random');
}
}
}
As you may guess, the getRandomImage()
function returns a random image URL from The Cat API (it really exists!). When the function is called several times within 3 seconds, the same URL will be returned, in order to prevent slow response times as well as abusing The Cat API.
If this code was in a run-once-dispose-immediately script, I wouldn't mind. But if it ends up in an actual application running on a production server, I wouldn't be so happy about it.
Some of the problems I see here:
- Several concerns are being addressed in the same function, i.e. fetching a fresh random URL, and caching it.
- A lot of low-level details like file paths, a URL, a cache lifetime, XML element names, etc. are in plain sight, obstructing the view of what's actually going on.
These two combined are big problems, because, well, let's take a look at the "unit" tests for this:
class CatApiTest extends \PHPUnit_Framework_TestCase
{
protected function tearDown()
{
@unlink(__DIR__ . '/../../cache/random');
}
/** @test */
public function it_fetches_a_random_url_of_a_cat_gif()
{
$catApi = new CatApi();
$url = $catApi->getRandomImage();
$this->assertTrue(filter_var($url, FILTER_VALIDATE_URL) !== false);
}
/** @test */
public function it_caches_a_random_cat_gif_url_for_3_seconds()
{
$catApi = new CatApi();
$firstUrl = $catApi->getRandomImage();
sleep(2);
$secondUrl = $catApi->getRandomImage();
sleep(2);
$thirdUrl = $catApi->getRandomImage();
$this->assertSame($firstUrl, $secondUrl);
$this->assertNotSame($secondUrl, $thirdUrl);
}
}
We need to have some actual sleep()
statements in our test in order to test the cache!
Also, we merely verify that the returned value is a valid URL (which doesn't ensure us that the URL originates from The Cat API at all).
Let's start making the situation a bit better.
Separating concerns - Caching vs the real thing
Looking at the code sample above, we notice that the concerns of "caching things" are interwoven with the concerns of "fetching a fresh thing". But the situation isn't as bad as we may think. Basically, the structure of the code is like this:
if (!isCachedRandomImageFresh()) {
$url = fetchFreshRandomImage();
putRandomImageInCache($url);
return $url;
}
return cachedRandomImage();
Wherever something is lazily loaded, or freshly generated only every x seconds, you'll see this pattern in the code.
By separating these concerns we would be able to test "fetching a fresh thing" separately from "caching that thing". We wouldn't be distracted by details about caching when we're more interested in verifying the correctness of the code that actually fetches a random cat image.
Let's first move all the code related to caching to a new class: CachedCatApi
, which has the same public interface:
class CachedCatApi
{
public function getRandomImage()
{
$cacheFilePath = __DIR__ . '/../../cache/random';
if (!file_exists($cacheFilePath)
|| time() - filemtime($cacheFilePath) > 3) {
// not in cache, so do the real thing
$realCatApi = new CatApi();
$url = $realCatApi->getRandomImage();
file_put_contents($cacheFilePath, $url);
return $url;
}
return file_get_contents($cacheFilePath);
}
}
By the way, I took this opportunity to change some minor things, which are still nice improvements for code readability:
- I've de-duplicated all the mentions of
__DIR__ . '/../../cache/random'
. - I've removed
else
- it's irrelevant because theif
part always returns something.
The CatApi
class now only contains the logic for calling The Cat API:
class CatApi
{
public function getRandomImage()
{
$responseXml = @file_get_contents('https://thecatapi.com/api/images/get?format=xml&type=jpg');
if (!$responseXml) {
// the cat API is down or something
return 'https://cdn.my-cool-website.com/default.jpg';
}
$responseElement = new \SimpleXMLElement($responseXml);
return (string)$responseElement->data->images[0]->image->url;
}
}
Dependency inversion: introducing an abstraction
At this point we could split our test case as well, because we now have two classes, but in fact: we would be testing the CatApi
class twice, since CachedCatApi
is still using it directly. We can make a big step forwards by introducing an abstraction for CatApi
- something that is replaceable if we don't want to make actual HTTP calls, but only test the caching behavior of CachedCatApi
. Since both classes share exactly the same public interface, it's easy to define an actual interface for them. Let's call that interface CatApi
for now and rename the old CatApi
class to RealCatApi
.
interface CatApi
{
/**
* @return string
*/
public function getRandomImage();
}
class RealCatApi implements CatApi
{
...
}
class CachedCatApi implements CatApi
{
...
}
I don't think these are proper names, but for now, it suffices.
Next we shouldn't allow CachedCatApi
to instantiate instances of CatApi
. Instead, we should provide it with an instance it can use to fetch a fresh random image. This is called dependency injection: we provide an object with everything it needs.
class CachedCatApi implements CatApi
{
private $realCatApi;
public function __construct(CatApi $realCatApi)
{
$this->realCatApi = $realCatApi;
}
public function getRandomImage()
{
...
$url = $this->realCatApi->getRandomImage();
...
}
}
This allows us to create truly separate test cases for both classes:
class CachedCatApiTest extends \PHPUnit_Framework_TestCase
{
protected function tearDown()
{
@unlink(__DIR__ . '/../../cache/random');
}
/** @test */
public function it_caches_a_random_cat_gif_url_for_3_seconds()
{
$realCatApi = $this->getMock('RealCatApi');
$realCatApi
->expects($this->any())
->will($this->returnValue(
// the real API returns a random URl every time
'https://cat-api/random-image/' . uniqid()
);
$cachedCatApi = new CachedCatApi($realCatApi);
$firstUrl = $cachedCatApi->getRandomImage();
sleep(2);
$secondUrl = $cachedCatApi->getRandomImage();
sleep(2);
$thirdUrl = $cachedCatApi->getRandomImage();
$this->assertSame($firstUrl, $secondUrl);
$this->assertNotSame($secondUrl, $thirdUrl);
}
}
class RealCatApiTest extends \PHPUnit_Framework_TestCase
{
/** @test */
public function it_fetches_a_random_url_of_a_cat_gif()
{
$catApi = new RealCatApi();
$url = $catApi->getRandomImage();
$this->assertTrue(filter_var($url, FILTER_VALIDATE_URL) !== false);
}
}
Conclusion
What have we gained so far? We can now test the caching behavior separately from the "real behavior". This means they can evolve separately, maybe even in completely different directions, as long as the classes implementing these behaviors keep to their contract.
What remains to be done? Well, a lot actually. We still can't test RealCatApi
without making actual HTTP calls (which makes the current test quite unreliable and undeserving of the name "unit test"). The same goes for CachedCatApi
: it will try to write to the filesystem, something we don't want in a "unit" test since it's slow and it influences global state.
After first refactoring step you broke logic of code. You started cache url even if main api is down which is not supposed in original code. It is minor part at first glance. But make refactoring harder if you try preserve original behaviour. And in many cases such logic modification could be not acceptable.
Yeah, I kind of agree here. I was a bit surprised by the order in which things were done here. I suppose that is because I like to stay in the green as much as possible, specially when re-factoring.
The concepts behind the re-factor exercise are sound though, and makes it a very good read :)
Hi Matthias,
Won't your test fail every time or am I missing something ?
Here your mock is designed to send everytime the same URL 'http://cat-api/random-image'. But your test asserts that the second call will have to be different from the first call on the mock.
By the way, I do really like your blog ;-)
Well spot! Yes, it will fail. I've modified the example to introduce actual randomness to the URL returned by the stub. So now it will actually test the caching mechanism.
> It turned out, creating a video tutorial isn't working well for me. I really like writing, and speaking in public, but I'm not very happy about recording videos.
I am the same way.
As a consumer: reading works better (faster) for me anyway.
:)
Very nice! I love refactoring article series.
In last example, CachedCatApi is missing RealCatApi passed in costructor.
Fixed it! Thanks.
Great!