Book review: Fifty quick ideas to improve your tests - Part 2

Posted by Matthias Noback

This article is part 2 of my review of the book "Fifty quick ideas to improve your tests". I'll continue to share some of my personal highlights with you.

Replace multiple steps with a higher-level step

If a test executes multiple tasks in sequence that form a higher-level action, often the language and the concepts used in the test explain the mechanics of test execution rather than the purpose of the test, and in this case the entire block can often be replaced with a single higher-level concept.

When writing a unit test for some complicated object, you may be collecting some data first, then putting that data into value objects, maybe wrapping them in other value objects, before you can finally pass them to the constructor of the object you're testing. At that point you may have to call several methods on the object before it's in the right state. The same goes for acceptance tests. You may need quite a number of steps before you get to the point where you can finally write a 'When' clause.

I find that often the steps leading up to the 'When' clause (or to the 'Act' part of a unit test) can be summarized as one thing. This is what's meant by the "single higher-level concept". So instead of enumerating everything that has happened to the system, you look for a single step that describes what your starting point is. For instance, instead of:

Given the user logged in
  And they have created a purchase order
  And they added product A to it, with a quantity of 10
  And they placed the purchase order
 When the user cancels the purchase order
 Then ... 

You could summarize the 'Given' steps as follows:

Given a placed purchase order for product A, with a quantity of 10
 When the user cancels the purchase order
 Then ...

Once you start looking for ways to make things smaller and simpler, easier to follow and read, you'll find that many of the details you previously added to the smaller steps are completely irrelevant for the higher-level step. In our case, the product and the quantity may be completely irrelevant for the scenario that deals with cancelling the purchase order. So the final version may well be:

Given a placed purchase order
 When the user cancels the purchase order
 Then ...

When implementing the step definition for this single, simplified step, you still need to provide sensible default data. But it happens behind the scenes. This technique hides the details that are irrelevant, and only shows the relevant ones to the person reading the scenario. It will be a lot easier to understand what's being tested with this particular scenario.
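
To make this a bit more tangible, here's a minimal sketch of what such a step definition could look like with Behat; the PurchaseOrder class and its methods are made up for the sake of the example:

use Behat\Behat\Context\Context;

final class PurchaseOrderContext implements Context
{
    private PurchaseOrder $purchaseOrder;

    /**
     * @Given a placed purchase order
     */
    public function aPlacedPurchaseOrder(): void
    {
        // Sensible default data, kept behind the scenes because it's
        // irrelevant to the cancellation scenario
        $this->purchaseOrder = PurchaseOrder::create();
        $this->purchaseOrder->addLine('product-a', 10);
        $this->purchaseOrder->place();
    }

    /**
     * @When the user cancels the purchase order
     */
    public function theUserCancelsThePurchaseOrder(): void
    {
        $this->purchaseOrder->cancel();
    }
}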

I find that this approach works really well with unit tests too. One test case may show how a complicated object can be constructed; another test case won't just repeat all those steps, but will summarize them properly (like we did for the above scenario). This often requires the introduction of a private method in the test class, which wraps the individual steps. The method is the abstraction, its name the higher-level concept.
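
A sketch of what that could look like with PHPUnit (again with a made-up PurchaseOrder API); the private method is the higher-level step, and the test method itself keeps the familiar Arrange-Act-Assert rhythm:

use PHPUnit\Framework\TestCase;

final class PurchaseOrderTest extends TestCase
{
    /**
     * @test
     */
    public function it_can_be_cancelled_after_it_was_placed(): void
    {
        // Arrange
        $purchaseOrder = $this->aPlacedPurchaseOrder();

        // Act
        $purchaseOrder->cancel();

        // Assert
        self::assertTrue($purchaseOrder->isCancelled());
    }

    /**
     * The higher-level concept: a private method that hides the
     * construction details that are irrelevant to this test case.
     */
    private function aPlacedPurchaseOrder(): PurchaseOrder
    {
        $purchaseOrder = PurchaseOrder::create();
        $purchaseOrder->addLine('product-a', 10);
        $purchaseOrder->place();

        return $purchaseOrder;
    }
}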

Specification, acceptance tests, living documentation

When creating your artifacts, remember the three roles they must serve at different times: now as a specification, soon as acceptance tests, and later as living documentation. Critique the artifact from each perspective. How well does it serve each distinct role? Is it over-optimised for one role to the detriment of others?

In workshops I often point out that testing is a crucial part of the development workflow. You need it to verify that the unit of code you've been working on works as expected. To improve this perspective on writing tests, you could aim not just for verifying the correctness of a unit, but for specifying it. This implies some kind of test-first approach, where you specify expected behavior. The next level would be to consider the future. What happens when you're done writing this code? Could someone else still understand what's going on? Do you only have technical specifications, or also domain-level specifications?

For unit tests, the danger is that you'll be doing only technical verifications. You may be testing methods, arguments and return values instead of higher-level behaviors. It happens when you're just repeating the logic of the class inside its unit test. You find yourself looking into the box (white box testing), instead of treating the box as an object with proper boundaries.

The best thing you can do is to think hard about the method names of a unit test. They are your chance to describe the behaviors at a higher level. It's all we do really. We start with some higher-level expected behavior, and implement it in terms of lower-level, concrete code steps. A method is what we often use to hide the irrelevant information, so at a higher level you can have meaningful conversations without drowning in details. That's why a method name is good not when it tells you what happens inside the method, but when it gives the higher-level term for what happens. E.g. a repository's save() method describes what it does, but not how it does it.

I find that once you have the proper level of abstraction in your specifications, the actual acceptance criteria will be present there too. And then, those specifications, besides serving as acceptance tests, can be considered living documentation. They are documentation in the sense that they document the acceptance criteria (any business rule that's involved), and they are living, because when something changes on either side, this will have an immediate effect on the other side. This keeps the documentation automatically up-to-date, and the code in sync with the documentation.

Checking whether a test performs its three roles well - being specification, acceptance test and living documentation - is like a test where the test itself is the subject under test.

Business-oriented versus technical-oriented testing

Both the technical and the business aspects are important, and they both need to be tested, but teams will often get a lot more value out of two separate sets of tests rather than one mixed-role test.

I think this is correct. But it also turns out that you can't make an easy distinction between these different roles. At least, trying to do so by separating the tests by type (unit, acceptance, integration, system) doesn't work well. There doesn't have to be a strict distinction between these different types of tests at all; at least not in the sense that one type is business-oriented and the other technology-oriented.

A [...] common way of confusing coverage and purpose is thinking that acceptance tests need to be executed at a service or API layer.

To avoid this pitfall, make the effort to consider an area of coverage separately from the purpose of a test. Then you’re free to combine them. For example, you can have business-oriented unit tests, or technical end-to-end checks.

When you have a layered architecture (as described in "Layers, ports & adapters - Part 2, Layers"), you'll have an application layer which doesn't directly depend on any real infrastructure (so no database, no UI). This means you can write your acceptance tests against this application layer (a.k.a. the "API layer"). It results in fast acceptance tests, which naturally don't talk about URLs or HTML elements, or about "what's in the database", which is great. I think this idea originated from a 2015 article, "The Forgotten Layer of the Test Automation Pyramid".
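
To give a rough impression: the essential difference with an end-to-end test is in how the Behat context gets bootstrapped, not in the scenario itself. A hypothetical sketch (all class names here are invented):

use Behat\Behat\Context\Context;

final class ApplicationLayerContext implements Context
{
    private PurchaseOrderService $service;

    public function __construct()
    {
        // Bootstrap only the application layer: in-memory adapters instead
        // of a real database, and no web server or UI involved at all
        $this->service = new PurchaseOrderService(
            new InMemoryPurchaseOrderRepository()
        );
    }

    // The step definitions then call $this->service directly, e.g.
    // $this->service->cancel($purchaseOrderId), instead of clicking
    // links or submitting forms.
}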

However, it's not the complete truth. You can mix and match however you like. It's possible to write unit tests which are more like acceptance tests. You can write acceptance tests which are more like system tests and run against the actual running application, including the UI, the database, even the web server if you like (preferably, I'd say); see "Introducing Modelling by Example" for an approach that mixes both. I realize now that I've been doing things like that, but was still teaching the stricter separation of unit and acceptance tests. I find that at least the unit tests for your domain code can be mainly business-oriented.

Another interesting test for tests

I found another interesting idea in the book. The authors encourage us to reflect on what we do when a test fails. They talk about a company that uses the following rule:

If people changed the test to make it pass, it was marked as bad.

I'm not sure if this approach is really feasible, but there is something intriguing about the rule. The idea is that the test specifies behavior, embodying acceptance criteria. If nothing has changed about those, the test shouldn't need to be modified. In theory it's only the production code implementing the behavior that should be updated. A change to the test means that either the specification itself was incorrect (in which case that would have been the starting point of making the change, not the production code itself), or that the test was bad in the sense that it wasn't a good (enough) specification after all...

Tags: PHP, book review, BDD, testing

Book review: Fifty quick ideas to improve your tests - Part 1

Posted by Matthias Noback

Review

After reading "Discovery - Explore behaviour using examples" by Gáspár Nagy and Seb Rose, I picked up another book, which I bought a long time ago: "Fifty Quick Ideas to Improve Your Tests" by Gojko Adzic, David Evans, Tom Roden and Nikola Korac. Like with so many books, I find there's often a "right" time for them. When I tried to read this book for the first time, I was totally not interested and soon stopped trying to read it. But ever since Julien Janvier asked me if I knew any good resources on how to write good acceptance test scenarios, I kept looking around for more valuable pointers, and so I revisited this book too. After all, one of the author's of this book - Gojko Adzic - also wrote "Bridging the communication gap - Specification by example and agile acceptance testing", which made a lasting impression on me. If I remember correctly, the latter doesn't have too much practical advice on writing goods tests (or scenarios), and it was my hope that "Fifty quick ideas" would.

First, a few comments on the book, before I highlight some parts. I thought it was quite an interesting book, covering several underrepresented areas of testing (including finding out what to test, and how to write good scenarios). The book has relevant suggestions for acceptance testing which are equally applicable to unit testing. I found this quite surprising, since testing books generally offer suggestions for only a specific type of test, leaving the developer/reader (including myself) with the idea that acceptance testing is very different from unit testing, and that it requires both a different approach and a different testing tool. This isn't true at all, and I like how the authors make a point of not making that distinction in this book.

The need to describe why

As a test author you may often feel like you're in a hurry. You've written some code, and now you need to quickly verify that the code you've written actually works. So you create a test file which exercises the production code. Green light proves that everything is okay. Most likely you've tested some methods, verified return values or calls to collaborating objects. Writing tests like that verifies that your code is executable, that it does what you expect from it, but it doesn't describe why the outcomes are the way they are.

In line with our habit of testing things by verifying outputs based on inputs, the PhpStorm IDE offers to generate test methods for selected methods of the Subject-Under-Test (SUT).

[Screenshot: "Generate test methods"]

I always cry a bit when I see this, because it implies three things:

  1. I've determined the API (and probably wrote most of the code already) before worrying about testing.
  2. I'm supposed to test my code method-by-method.
  3. A generated name following the template test{NameOfMethodOnSUT}() is fine.

At the risk of shaming you into adopting a test-first approach, please consider how you'll test the code before writing it. Also, aim for the lowest possible number of methods on a class. Make sure the remaining methods are conceptually related. The SOLID principles will give you plenty of advice on this topic.

Anyway, if we're talking about writing tests using some XUnit framework (e.g. JUnit, PHPUnit), don't auto-generate test methods. Instead:

Name the test class {NameOfConcept}Test, and add public methods which, combined, completely describe or specify the concept or thing. I find it very helpful to start these method names with "it", to refer to the thing I'm testing. But this is by no means a strict rule. What's important is that you end up with methods you can read out loud to anyone interested (not just other developers, but product owners, domain experts, etc.). You can easily check whether you've been successful at it by running PHPUnit with the --testdox flag, which produces a human-readable description of your unit.
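
For example (the Meeting class is just an invented subject here):

use PHPUnit\Framework\TestCase;

final class MeetingTest extends TestCase
{
    /**
     * @test
     */
    public function it_cannot_be_scheduled_in_the_past(): void
    {
        // ...
    }

    /**
     * @test
     */
    public function it_rejects_an_attendee_who_already_has_a_meeting_at_that_time(): void
    {
        // ...
    }
}

Running vendor/bin/phpunit --testdox then prints something like:

Meeting
 [x] It cannot be scheduled in the past
 [x] It rejects an attendee who already has a meeting at that time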

Rules and examples

There's much more to it than following this simple recipe though. As the book proposes, every scenario (or unit test case), should first describe some rule, or acceptance criterion. The steps of the test itself should then provide a clear example of the consequences of this rule.

Rules are generic, abstract. The examples are specific, concrete. For instance, a rule could be: "You can't schedule a meeting with someone who already has a meeting at the same time." An example of this rule could be: "Given Judy attends a meeting that's scheduled from 2PM until 4PM, when Mike schedules a 1-hour meeting at 3PM, then he won't be able to invite Judy for it." Rules are often known as "acceptance criteria". Hence, if you write a scenario for an acceptance test, you start the scenario by describing the rule, and you then show an example that demonstrates the rule in the context of the application you're building, e.g.

Scenario: You can't schedule a meeting with someone who already has a meeting at the same time.

  Given Judy attends a meeting that's scheduled from 2PM until 4PM
   When Mike schedules a 1-hour meeting at 3PM
   Then he won't be able to invite Judy for it

I find that very often the original developers of the code know the rule that's involved in a scenario, but don't write it down. They assume the reader will be able to extract the rule from the example by reading the scenario. This may work for some obvious scenarios, but in other cases it won't. It may be obvious to you now, but it won't be to someone else next year. So, as a developer, make sure to write and automate scenarios like the one above, but also document what you've learned about your application and its domain, by writing down all the rules that are involved.

The opposite problem may also occur by the way. It happens when the tests only specify the rules, and the examples just repeat the rules. For instance:

Scenario: You can't schedule a meeting with someone who already has a meeting at the same time.

  Given someone attends a meeting
   When someone else schedules a meeting at the same time
   Then they can't invite them to this meeting

Besides the fact that the automation code behind these steps will be horrible, such an example adds nothing to the rule it's supposed to illustrate. As the book puts it:

Unless examples are making things more concrete, they will mislead people into a false assumption of completeness, so they need to be restated.

Decoupling how something will be tested from what is being tested

A big issue with most "acceptance test suites" I've seen is that they mix what will be tested with how it's tested. If you want to separate these two things in software, the answer is usually: abstraction. And that abstraction is missing if you're using built-in step definitions for navigating a website by clicking links, looking for HTML elements and analyzing their contents. The biggest issue is that these scenarios don't belong in acceptance tests. There's nothing about acceptance criteria in there. These scenarios are more like exploratory tests whose actions have been recorded and automated.

From the perspective of maximizing the quality of your tests, the issue isn't that running end-to-end tests exercising the actual UI of the application is slow and fragile. Even though that is a bad thing in the longer term, the issue today is that these automated scenarios break for many reasons that are unrelated to the acceptance criteria themselves. The need to update the scenario when a URL or the structure of a page's HTML changes (or even when the CSS or JavaScript that's used changes) is annoying. It slows developers down, and soon they will feel that the tests don't add much value, because they break more often than they are helpful.

Decoupling how something will be tested from what is being tested significantly reduces future test maintenance costs.

The true power of scenarios comes from being able to describe behavior in a technology-agnostic way. This allows us to work together with stakeholders and domain experts on coming up with good examples for the rules they apply, and the acceptance criteria they have defined for the application. It helps with defining counterexamples, and with refining those rules. I've found that it even helps with showing the incorrectness of certain rules, or with rephrasing them using more appropriate terms.
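
One way to introduce that abstraction in the automation code is to let the step definitions talk to a small interface that captures what the test needs, and to keep the "how" (driving the real UI, calling the application layer directly, etc.) inside interchangeable implementations of that interface. A hypothetical sketch, reusing the meeting example from above:

use Behat\Behat\Context\Context;
use PHPUnit\Framework\Assert;

interface MeetingSchedulingDriver
{
    public function attendsAMeeting(string $attendee, string $from, string $until): void;

    public function scheduleMeeting(string $organizer, string $duration, string $at): void;

    public function couldInvite(string $attendee): bool;
}

final class SchedulingContext implements Context
{
    public function __construct(private MeetingSchedulingDriver $driver)
    {
    }

    /**
     * @Given Judy attends a meeting that's scheduled from 2PM until 4PM
     */
    public function judyAttendsAMeeting(): void
    {
        $this->driver->attendsAMeeting('Judy', '2PM', '4PM');
    }

    /**
     * @When Mike schedules a 1-hour meeting at 3PM
     */
    public function mikeSchedulesAMeeting(): void
    {
        $this->driver->scheduleMeeting('Mike', '1 hour', '3PM');
    }

    /**
     * @Then he won't be able to invite Judy for it
     */
    public function heWontBeAbleToInviteJudyForIt(): void
    {
        Assert::assertFalse($this->driver->couldInvite('Judy'));
    }
}

Which implementation of MeetingSchedulingDriver gets used is then a suite configuration concern, and the scenario itself doesn't need to change when a URL or the structure of a page changes.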

How to write good scenarios? Given-When-Then

Writing good scenarios requires good writing. Unfortunately, writing itself often invokes a sense of self-awareness, which easily leads to getting stuck. I'm getting stuck writing this post as we speak, so there you go. Anyway, the general idea of writing a scenario is that you follow the template Given-When-Then. This should lead you in the right direction. However, it often leads people astray. You may end up asking yourself if a certain step is part of 'Given' or 'When'. You may write just one scenario which shows every aspect of a feature, and wonder if you need to write any more scenarios. You may be writing lots and lots of scenarios, each copy-pasted from the one before it, with just some minor variations. Anyway, it's clear that we all need some proper advice about writing those scenarios!

I think the following quotes from the book give some really helpful advice. The first one is about grammar:

A good trick that prevents most accidental misuses of Given-When-Then is to use past tense for 'Given' clauses, present tense for 'When' and future tense for 'Then'. This makes it clear that 'Given' statements are preconditions and parameters, and that 'Then' statements are postconditions and expectations.

Besides using the present tense only for the 'When' clause, also limit the use of the active voice to the 'When' clause.

Make 'Given' and 'Then' passive - they should describe values rather than actions. Make sure 'When' is active - it should describe the action under test.

You can have several 'Given's or 'Then's. Keep them small and compact though, and avoid long lists of steps. Having many 'Given' steps will confuse the reader, who will wonder: which preconditions are essential? Are they somehow related? Having many 'Then' steps likely means that the scenario tries to test multiple things in one go. That calls for a rewrite, splitting the scenario into two or more scenarios, each with a clear purpose. About that:

Each scenario should ideally only have one 'When' clause that clearly points to the purpose of the test.

Conclusion

There's more to write about the advice given by this nice "Fifty quick ideas" book. I'm saving that for another blog post.

Tags: PHP, book review, BDD, testing

Book review: Discovery - Explore behaviour using examples

Posted by Matthias Noback

I've just finished reading "Discovery - Explore behaviour using examples" by Gáspár Nagy and Seb Rose. It's the first in a series of books about BDD (Behavior-Driven Development). The next parts are yet to be written/published. Part of the reason to pick up this book was that I'd seen it on Twitter (that alone would not be a sufficient reason of course). The biggest reason was that after delivering a testing and aggregate design workshop, I noticed that my acceptance test skills aren't what they should be. After several years of not working as a developer on a project for a client, I realized again that (a quote from the book):

You can write good software only for problems that you understand.

BDD book 1 - "Discovery" - revolves around this fact of the developer's life. How often have you produced a piece of code based on some vague issue in Jira, only to realize after showing it to stakeholders that almost everything about it was wrong? Often you don't know enough, so you mix in some of your own assumptions. And very often, you don't know how the thing you wrote is actually going to be used. In short, you don't know if what you created is adequate. You can write any amount of clean code, with lots of automated tests and no observable bugs; if the software doesn't fit the intended use, it's useless.

You have to get out of the code, move away from your desk, and talk to the people who want you to build the software. Don't let them hand you a requirements document and leave you alone afterwards. Don't lock yourself up with your project manager for a day-long sprint planning. Make sure you get together more often for a quick session to come up with good examples of what kind of information will go into the system, what rules there are for processing that information, and what the expected outcomes are. Make sure you talk about realistic scenarios, and document all of the decisions you make.

This seems easy, but for many of us, myself included, it's far outside the comfort zone. I'm happy if I can do some coding alone. But eventually you want clients and users to be happy with what you produce, and that only happens when what you build is useful to them. It should match their way of living, their business processes, their domain expertise.

BDD: not about tools, not about testing

It's my habit to mock development subcultures, like DDD, but also BDD. Even though what they bring to the table is very useful, there are always blanket statements that people will start using, to identify themselves with that subculture.

The process of learning a culture— enculturation— is partly explicit but mostly implicit. The explicit part can be put into books and taught in seminars or classrooms. Most of culture is acquired by a process of absorption— by living and practicing the culture with those who already share it.
West, David. Object Thinking (Developer Reference) (Kindle Locations 244-246). Pearson Education. Kindle Edition.

The same is actually true for myself, quoting David West and all.

I was actually anticipating reading the BDD subculture's favorite thing to say: "BDD is not about the tools". So I was happy to see it here too:

One typical mistake is to see BDD as a tool-thing. BDD is primarily about collaboration and domain discovery; any “BDD tool” can be only useful in supporting this process. You have to start by investing in collaborative discussions and the creation of a shared vocabulary. Just going after automation (using Cucumber or SpecFlow) does not work.

No hard feelings of course, because this is simply true. I've seen it happen too. PHP's Cucumber is called Behat. Many projects that use Behat end up having many, many scenarios. That may sound like a good thing, but although these scenarios are all written in plain English, which may give the illusion that they are business-oriented, they don't do anything a simple PHPUnit test case couldn't do.

The power of Gherkin (the syntax for these scenarios) lies in being able to capture usage examples, together with business rules. The only way to come up with good examples and to find out the business rules involved is to collaborate with the people who actually know about them. That's why BDD isn't about the tools. Just by writing test scenarios in your natural language, you won't be doing BDD.

Examples, rules, and scenarios

I've used several words whose meaning may not be self-evident. The way I understood it from the "Discovery" book:

Examples are descriptions of how a user would actually use a specific part of the application. They are concrete; they use meaningful data. They describe different stages ("steps") of the user interacting with the application.

Rules are policies; they describe possible paths, things that are allowed, invariants that should be protected, etc.

Examples are illustrations of those rules.

After you've determined usage examples, meaningful data, and business rules (in a requirements workshop - read more about this in the book), you can start writing scenarios. If you're using Behat or something like it, a scenario has the following template:

Scenario: [description of the rule]
  Given ...
  When ...
  Then ...

The famous Given-When-Then way of writing scenarios reflects the Arrange-Act-Assert setup for unit tests you may already be familiar with. The authors call the act of rewriting examples and rules as scenarios "formulation". The way you formulate a scenario influences the way you can later write code "behind" these steps. That's why the authors write that

Formulation is usually undertaken by a pair, often a developer and a tester.

I think this is an important quote. Often I've heard people say that Behat is not for them, because the stakeholders are not at all interested in the scenarios they write. So why write them? First of all, you need to distinguish between:

  1. Collecting requirements in the form of examples and rules.
  2. Writing and automating scenarios.

Activity 1 really needs good collaboration with stakeholders. But activity 2 is something you can do on your own (though preferably in a pair, to end up with something better). My advice has always been: if you want to make a culture change, just make your work a good example (no pun intended). The authors agree:

You could try following a BDD approach without business involvement, as a first step – you’ll get all the benefits outlined above and, in time, your business colleagues will come to appreciate the value of expressing the acceptance criteria as plain English scenarios.

Formulation

The act of formulating scenarios is something you need to do as a team. One developer can write them, but the result should be reviewed. There's a lot to it; some scenarios are better than others. Although the book does contain some suggestions about writing good scenarios (e.g. don't illustrate multiple rules in one scenario; write interesting scenarios, not lots of boring ones), the authors have announced a second book dedicated to the topic of formulation.

Living documentation

If, besides writing unit tests for your code units, you also formulate scenarios containing realistic examples describing actual business rules, and then execute these scenarios by means of a test automation tool like Behat, you'll end up with the most amazing thing: living documentation (the best reference for this topic is always Cyrille Martraire's book Living documentation). The scenarios document the expected behavior of the application. This will automatically be "living" documentation, since it can never become outdated: if it does, executing the scenarios will simply fail.

Align tasks to scenarios

One "trick" you can already start applying, is to align tasks by scenarios (even if you don't write scenarios!). The opposite would be aligning them by technical implementation detail. Looking back on some of my past projects, I know that we'd been working on it for months, without being able to show even one working, useful scenario to the stakeholders. I remember myself reporting things like "We're still working on a good foundation." (i.e. we were working on the messaging infrastructure, the database design, etc.).

The team will focus on the expected behaviour during the implementation. This can help avoid “gold-plating” or implementing infrastructures that are not needed. Starting from the scenario helps follow an outside-in development approach. You can get feedback about the story implementation earlier. Imagine you have a half-done story, where all the database tables are created, but nothing is visible for the users. Now compare it with a half-done story, where 3 of the 7 scenarios are already implemented end-to-end. The latter can be shown to the product owner to get feedback on what has been implemented so far.

Conclusion

In conclusion - go ahead and read "The BDD Books - Discovery"!

Tags: PHP, book review, BDD, testing, Behat