Local and remote code coverage for Behat

Posted by Matthias Noback

Why code coverage for Behat?

PHPUnit has several built-in options for generating code coverage data and reports. Behat doesn't. As Konstantin Kudryashov (@everzet) points out in an issue asking for code coverage options in Behat:

Code coverage is controversial idea and code coverage for StoryBDD framework is just nonsense. If you're doing code testing with StoryBDD - you're doing it wrong.

He's right, I guess. The main issue is that StoryBDD isn't about code, so it doesn't make sense to calculate code coverage for it. Furthermore, the danger of collecting code coverage data and generating coverage reports is that people will start using it as a metric for code quality. And maybe they'll even set management targets based on coverage percentage. Anyway, that's not what this article is about...

In my Advanced Application Testing workshop I just wanted to collect code coverage data to show how different types of tests (system, acceptance, integration and unit) touch different parts of the code, and how this evolves over time when we replace parts of the system under test, or switch test types to achieve shorter feedback loops.

The main issue is that, when it comes to running our code, Behat does two things: it executes code in the same project, and/or (and this complicates the situation a bit) it remotely executes code when it's using Mink to talk to a web application running in another process.

This means that if you want to have Behat coverage, you'll need to do two things:

  1. Collect local code coverage data.
  2. Instruct the remote web application to collect code coverage data itself, then fetch it.

Behat extensions for code coverage

For the workshop, I created two Behat extensions that do exactly this:

  1. LocalCodeCoverageExtension
  2. RemoteCodeCoverageExtension

The second one makes use of an adapted version of the LiveCodeCoverage tool I published earlier.

You have to enable these extensions in your behat.yml file:

default:
    extensions:
        Behat\MinkExtension:
            # ...
        BehatLocalCodeCoverage\LocalCodeCoverageExtension:
            target_directory: '%paths.base%/var/coverage'
        BehatRemoteCodeCoverage\RemoteCodeCoverageExtension:
            target_directory: '%paths.base%/var/coverage'
    suites:
        acceptance:
            # ...
            local_coverage_enabled: true
        system:
            mink_session: default
            # ...
            remote_coverage_enabled: true

Local coverage doesn't require any changes to the production code, but remote coverage does: you need to run a tool called RemoteCodeCoverage, and let it wrap your application/kernel in your web application's front controller (e.g. index.php):

use LiveCodeCoverage\RemoteCodeCoverage;

$shutDownCodeCoverage = RemoteCodeCoverage::bootstrap(
    (bool)getenv('CODE_COVERAGE_ENABLED'),
    sys_get_temp_dir(),
    __DIR__ . '/../phpunit.xml.dist'
);

// Run your web application now...

// This will save and store collected coverage data:
$shutDownCodeCoverage();

From now on, a Behat run will generate a coverage file (.cov) in ./var/coverage for every suite that has coverage enabled (the name of the file is the name of the suite).

The arguments passed to RemoteCodeCoverage::bootstrap() allow for some fine-tuning of its behavior:

  1. Provide your own logic to determine if code coverage should be enabled in the first place (this example uses an environment variable for that). This is important for security reasons. It helps you make sure that the production server won't expose any collected coverage data.
  2. Provide your own directory for storing the coverage data files.
  3. Provide the path to your own phpunit.xml(.dist) file. This file is used for its code coverage filter configuration. You can exclude vendor/ code for example.
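For the first argument, here is a minimal sketch of what "your own logic" could look like. The isCoverageEnabled() function is a hypothetical helper, not part of LiveCodeCoverage. It exists because a plain (bool) cast treats the string "false" as true, which is easy to get wrong when setting environment variables by hand.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of LiveCodeCoverage): interpret the raw value
 * of an environment variable as an on/off flag.
 *
 * @param string|false $rawValue The return value of getenv()
 */
function isCoverageEnabled($rawValue): bool
{
    if ($rawValue === false) {
        // The variable is not set at all: keep coverage off
        return false;
    }

    // "1", "true", "on" and "yes" count as on; everything else counts as off
    return filter_var($rawValue, FILTER_VALIDATE_BOOLEAN);
}
```

You could then pass isCoverageEnabled(getenv('CODE_COVERAGE_ENABLED')) as the first argument to RemoteCodeCoverage::bootstrap(), so coverage stays off unless it is switched on explicitly.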

Combining coverage data from different tools and suites

If you also instruct PHPUnit to generate "PHP" coverage files in the same directory, you will end up with several .cov files for every test run, one for every suite/type of test.

vendor/bin/phpunit --testsuite unit --coverage-php var/coverage/unit.cov
vendor/bin/phpunit --testsuite integration --coverage-php var/coverage/integration.cov
vendor/bin/behat --suite acceptance
vendor/bin/behat --suite system

# These files will be created (and overwritten during subsequent runs):
var/coverage/unit.cov
var/coverage/integration.cov
var/coverage/acceptance.cov
var/coverage/system.cov

Finally, you can merge these files using phpcov:

phpcov merge --html=./var/coverage/html ./var/coverage

You'll get a nice HTML report which shows, for every line, which unit test or feature file "touched" it:

[Image: Merged code coverage report]

I hope these extensions prove to be useful; please let me know if they are. For now, the biggest downside is slowness: running code with Xdebug enabled is already slow, and collecting code coverage data along the way makes it even slower. Still, the benefits may be big enough to justify this.


Call to conference organisers: pay your workshop instructors

Posted by Matthias Noback

A little background: speakers don't get paid

Speakers like myself don't get paid for doing a talk at a tech conference. That's why I call this work "open source". People will get a video or audio recording of the talk, including separately viewable or downloadable slides for free. The idea is, a conference costs a lot of money to organise. It is quite expensive to fly in all those speakers. So there's no money to pay the speakers for all their work (for me personally it's about 80 hours of preparation, plus time spent travelling, usually half a day before and after the conference). Speakers get their travel costs reimbursed, they often get two nights at a hotel, and a ticket to the conference. Plus, they get advertising for their personal brand (increasing their reputation as an expert, a funny person, or just a person with more Google results for their name).

Workshops: also not paid

Ever since I realized that creating workshops is something I like a lot, I started submitting them to conferences as well. This, to me, is a whole different story. There are many hours of educational experience, preparation, and again travel going into this. And delivering a workshop always costs a lot more energy than a single talk does. Often you won't get paid for all that. Nevertheless, it earns the conference organisers a lot more money than a talk does.

Workshops are often planned in the days before the main conference. Ticket prices range from 200 to 800 euros. It often happens that I'm delivering my workshop in front of 20-25 people. This means that the conference organisers receive 4,000 (20 × 200) to 20,000 (25 × 800) euros per day per workshop. Surely, there will be costs involved. But even after subtracting those, a workshop instructor will often generate thousands of euros in revenue.

To be fair

I don't think it would be fair to pay workshop instructors the entire amount either (after subtraction of the costs). Having a conference organise your workshop has value too:

  • They have reach: hundreds of people will know about your workshop.
  • They deal with all the administrative work (payments, registration, refunds, etc.).
  • They take care of the well-being of the attendees (parking directions, food, drinks, etc.).

Call to conference organisers

Still, I wanted to point out how wrong the situation is for tutorial/workshop instructors at many PHP conferences I know of. I want to ask you all, conference organisers: next time you organise workshop days for your attendees, make sure to pay your instructors.

Some useful suggestions, which I've learnt from conference organisers I spoke with:

  1. Pay instructors for their work day: €1000,- (plus the usual reimbursement of travel costs, and a hotel night). This isn't quite enough to cover preparation time, but it's a reasonable amount to me.
  2. Let instructors share in the revenue, after subtraction of the costs, e.g. give them 50%. This makes up for the day, and the required preparation. It will also make instructors happy workers.

In fact, Laracon EU is a conference where they do this. Shawn McCool, one of its organisers, said to me:

Paying people for their work is the right thing to do, for both ethics and sustainability.

I totally agree with him. Now, do the right thing and make that change!


Reducing call sites with dependency injection and context passing

Posted by Matthias Noback

This article continues where "Unary call sites and intention-revealing interfaces" ended.

While reading David West's excellent book "Object Thinking", I stumbled across an interesting quote from David Parnas on the programming method that most of us use by default:

The easiest way to describe the programming method used in most projects today was given to me by a teacher who was explaining how he teaches programming. "Think like a computer," he said. He instructed his students to begin by thinking about what the computer had to do first and to write that down. They would then think about what the computer had to do next and continue in that way until they had described the last thing the computer would do... [...]

It may seem like a very logical thing to do. And it's what I've seen myself and many other programmers do: "How do we implement this feature?" "Well, first we need this piece of data. Then we need to have a little algorithm manipulating it. Then we need to save it somewhere. And then, and then, etc." The code that is the result of this approach is sequential, imperative in nature.

Parnas continues:

Any attempt to design these programs by thinking things through in the order that the computer will execute them leads to confusion and results in systems that nobody can understand completely.

Parnas, David Lorge. "Software Aspects of Strategic Defense Systems." American Scientist 73 (1985). pp. 432–440.

The book describes ways to counter this method using a different style of thinking called "object thinking". When I know more about it, I'll share it with you. For now, I just wanted to note that this "computer thinking" leads to many issues that make the overall design of our application worse. I described one situation in a previous article about read models, where I realized that we often try to answer many different questions by querying one and the same model (which is also the write model). By splitting write from read, we end up with a more flexible design.

The same goes for introducing interfaces, to achieve dependency inversion. We do a little extra work, but we thereby allow ourselves to get rid of a bit of "computer think".

Singletons, service locators, registries...

In the past, we've invented some useful utilities to help us with our computer thinking. "We need to send an email for this feature." "Hmm, how do we get the mailer?" "I know! We get it from the service locator." So there we go:

$mailer = $this->container->get('mailer');

"What else do we need?" "The current Request object, so we can find out if the client has added a particular header." "Okay, let's add the request to the service locator as well." "No, wait, the request is part of the context." "No problem!"

$request = sfContext::getInstance()->getRequest();

"Wait, the context hasn't been created yet? Let's tell the computer to do it for us."

$request = sfContext::createInstance()->getRequest();

"Wait, the context has sometimes been created and now creating another instance leads to subtle bugs and decreased performance?"

if (sfContext::hasInstance()) {
    $request = sfContext::getInstance()->getRequest();
} else {
    $request = sfContext::createInstance()->getRequest();
}

Okay, this is where we should stop the silliness. This story is over the top. But we do this kind of thing every day. We write code that will break. Not now, but once it's used in slightly different ways (in the not-so-distant future).

We're lucky, since we can prevent this situation from happening. All the answers are there. They are old and famous answers.

Dependency injection

One of the answers is: dependency injection. To prevent us from worrying about "how to get something", we have to assume we'll get it. In other words, when constructing an object, dependencies (like the mailer from the previous example) should be injected. There's no need to fetch the mailer: when the code gets executed, it will be there:

final class SomeService
{
    /** @var Mailer */
    private $mailer;

    public function __construct(Mailer $mailer)
    {
        $this->mailer = $mailer;
    }

    public function doSomething()
    {
        $request = ...
    }
}

Of course we need a little setup work outside SomeService to make sure the mailer gets injected. But this is completely out of sight.
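As a rough sketch of that setup work, here is what wiring things together by hand could look like. Everything in it is hypothetical: the Mailer interface with its send() signature, the InMemoryMailer stand-in, and the body of doSomething() are only there to make the example run. In a real project a dependency injection container would usually do this wiring for you.

```php
<?php
declare(strict_types=1);

// Hypothetical interface; a real project would use its framework's mailer
interface Mailer
{
    public function send(string $to, string $subject): void;
}

// Stand-in implementation that only records what it was asked to send
final class InMemoryMailer implements Mailer
{
    /** @var string[] */
    public $sent = [];

    public function send(string $to, string $subject): void
    {
        $this->sent[] = $to . ': ' . $subject;
    }
}

final class SomeService
{
    /** @var Mailer */
    private $mailer;

    public function __construct(Mailer $mailer)
    {
        $this->mailer = $mailer;
    }

    public function doSomething(): void
    {
        // No fetching: the mailer is simply there
        $this->mailer->send('user@example.com', 'Hello');
    }
}

// The setup work: the only place that knows which mailer gets used
$mailer = new InMemoryMailer();
$service = new SomeService($mailer);
$service->doSomething();
```

Note that SomeService never asks where the mailer comes from. Swapping in a different implementation only touches the last three lines.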

Context passing

Another answer is: context passing (not sure if this is an official term or anything, just posing it here). Consider all the objects in your application to be eternal. For instance, that mailer service we used; it could run forever. We could let it send any number of emails by calling its send() method with different arguments. However, the current request is a completely different object. It's impossible to model it as some eternal service, since it's temporary in nature. It comes and goes. In fact, asking for the current request will give you different answers all the time.

If you need an object like the current request (or the session, the user ID, the tenant ID, etc.), don't inject it in the constructor of a service - pass it along as an argument when calling the method:

final class SomeService
{
    /** @var Mailer */
    private $mailer;

    public function __construct(Mailer $mailer)
    {
        $this->mailer = $mailer;
    }

    public function doSomething(Request $request)
    {
        ...
    }
}

By the way, context passing doesn't necessarily mean that you should be passing a Context object. In several projects I've worked on, we introduced a Context object at some point, to hold things like the ID of the user who owns the current session, a tenant or customer ID, and miscellaneous things like the client's IP address, or the request time. This may seem very convenient, but such a Context object itself becomes very unfocused. We are tempted to add more and more to it. Maybe user settings, maybe entire entities (just so we don't need to make a call to the repository anymore). In conclusion, passing around a Context object is often the first step in separating dependencies from context, but there's always another step after that: passing only the values from the context that a given method needs.
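To illustrate that last step, here is a small sketch. The Context and AuditLog classes and all their members are made up for this example. The point is only that recordLogin() asks for the user ID it needs, and the caller takes that single value out of the context.

```php
<?php
declare(strict_types=1);

// Hypothetical Context object, like the one described above
final class Context
{
    public $userId;
    public $tenantId;
    public $clientIp;

    public function __construct(string $userId, string $tenantId, string $clientIp)
    {
        $this->userId = $userId;
        $this->tenantId = $tenantId;
        $this->clientIp = $clientIp;
    }
}

// Hypothetical service: it receives one value, not the whole Context
final class AuditLog
{
    /** @var string[] */
    private $entries = [];

    public function recordLogin(string $userId): void
    {
        $this->entries[] = 'login:' . $userId;
    }

    /** @return string[] */
    public function entries(): array
    {
        return $this->entries;
    }
}

$context = new Context('user-1', 'tenant-7', '203.0.113.5');

$auditLog = new AuditLog();
// Pass along only the value this method needs
$auditLog->recordLogin($context->userId);
```

Because AuditLog doesn't know about Context at all, adding more things to Context later won't affect it.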

Dependency injection & Context passing: effective means for reducing call sites

In projects that don't apply dependency injection and context passing (everywhere), you will find that there are issues with the number of call sites for certain methods. In particular: ServiceLocator::get() or things like Context::getRequest(). If we apply "computer think", we use these static methods to fetch all the things we need, instead of receiving them as constructor or method arguments. If we use "object thinking", we get rid of these static calls, and so we drastically reduce the number of call sites for them. This in turn allows us to prepare the project for the future. Because today's ServiceLocator is yesterday's façade, last week's Zend_Registry, and last month's sfContext. Instead of using whatever convenient utility your favorite framework uses for fetching things, just make sure you always inject your dependencies and pass along your context. If you always do this, you can easily migrate your project once the framework you use isn't maintained anymore (or when you just want to try out something better).
