Preparing a Leanpub book for print-on-demand

Posted on by Matthias Noback

During the last few days I've been finishing my latest book, Microservices for everyone. I've used the Leanpub self-publishing platform again. I wrote about the process before and still like it very much. Every now and then Leanpub adds a useful feature to their platform too, one of which is an export functionality for downloading a print-ready PDF.

Let Leanpub generate a print-ready PDF for you

Previously I had to cut up the normal book PDF manually, so this tool saved me a lot of work. Though it's a relatively smart tool, the resulting PDF isn't completely print-ready for all circumstances (to be honest, that would be a bit too much to ask from any tool of course!). For example, I wanted to use this PDF file to publish the book using Amazon's print-on-demand self-publishing service CreateSpace, but I also wanted to order some copies at a local print shop (preferably using the same source files). In this post I'd like to share some of the details of making the print-ready PDF even more print-ready, for whoever may be interested in this.

Preparing the inside of the book

I found that several things needed to be fixed before all layout issues were gone:

  • Some lines which contained inline, pre-formatted code, like AVeryLongClassName, would not auto-wrap to the next line. Where that happened, it triggered a warning in CreateSpace's automatic review process: the margin becomes too small for print. I fixed these issues by cutting the long strings into multiple parts, adding soft hyphens (\-) between them (see the example below this list).
  • Some images appeared to be too large. This was because Leanpub renders all images at 100% width, so vertically oriented images appear much larger than horizontally oriented ones. I added some whitespace to the images' source files to force a "horizontal" rendering, but I later found out that you can also specify image positioning options, like width, float, etc.
  • Some images had a resolution that's too low for printing. Once I realized this, I started adding images and illustrations with higher resolutions than required. Unfortunately I had to redraw some of the illustrations manually in order to get higher-resolution versions... Something to keep in mind from the beginning of the writing process!
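
To illustrate the first fix (the class name below is made up): a long inline identifier can be split in the manuscript with soft hyphens, which should only turn into hyphens and line breaks where the text actually needs to wrap. So

    AVeryLongClassNameThatRefusesToWrap

becomes

    AVery\-Long\-Class\-Name\-That\-Refuses\-To\-Wrap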

The result of Leanpub's print-ready PDF export is a PDF containing colored code snippets and images. In order to turn it into a grayscale PDF document, I googled a bit and found a good solution. I now use Ghostscript to do the PDF conversion, using the following options:

gs  \
    -o /print/book.pdf \
    -sDEVICE=pdfwrite \
    -dHaveTransparency=false \
    -dProcessColorModel=/DeviceGray \
    -dEmbedAllFonts=true \
    -dSubsetFonts=false \
    -sColorConversionStrategy=Gray \
    -r300 \
    /preprint/book.pdf

This takes the book.pdf document from the /preprint directory, removes transparency, embeds all used fonts, converts it to grayscale, and stores the images at a 300 DPI resolution (which is excellent for print). It then saves the resulting PDF file in /print.

Preparing the cover

I designed the cover image using Gimp. The size and layout of the cover image are based on the number of pages, the thickness of the paper I wanted to use, the size of the PDF ("Technical", i.e. 7 x 9.1 inches) and the cut margin for the book printer. I put all this information in one spreadsheet (allowing me to use constants, variables, and simple derivations):

The book's dimensions in one spreadsheet
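
To give an idea of the kind of derivation in that sheet, here is a simplified version of the calculation (the page count is made up, and the per-page thickness and bleed values are the ones I believe CreateSpace documents for white paper, so verify them for your own book):

    spine width  = page count x paper thickness per page
                 = 250 x 0.002252 in             ≈ 0.56 in
    cover width  = bleed + back cover + spine + front cover + bleed
                 = 0.125 + 7 + 0.56 + 7 + 0.125  ≈ 14.81 in
    cover height = bleed + trim height + bleed
                 = 0.125 + 9.1 + 0.125           = 9.35 in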

Using the information in this sheet I could then create an image file of the right size, and put visual guidelines at the right places:

Guidelines for the cover image

I always miss Photoshop when I'm working with Gimp. It can do most of what I want, except store CMYK images... That's very unfortunate and frustrating. I've been trying to overcome this issue in various ways (uploading an RGB image to a website to have it converted to CMYK, using Ghostscript, etc.). The final and automated solution came from ImageMagick's convert tool. The only problem was that you need to feed it color profiles. I have absolutely no clue what these are, but I downloaded some from the Adobe website and was able to use them in the following command:

convert /preprint/cover-rgb.png \
    +profile icm \
    -profile RGB.icc \
    -profile CMYK.icc \
    /print/cover-cmyk.pdf

The options mean: strip any color profile currently embedded in the image, then treat it as an RGB image (using the RGB profile) and convert it to CMYK (using the CMYK profile).

The conversion process more or less keeps the look and feel of the original RGB/screen-based cover image intact. I'm curious what a real print will look like. When CreateSpace's review process is finished, I'll be sure to order a sample copy for one last proof-reading session.

Conclusion

I don't yet know when the final version of the book will be released. When it is, I'll blog about it here.

Book Leanpub Markdown CreateSpace

Layers, ports & adapters - Part 3, Ports & Adapters

Posted on by Matthias Noback

In the previous article we discussed a sensible layer system, consisting of three layers:

  • Domain
  • Application
  • Infrastructure

Infrastructure

The infrastructure layer, containing everything that connects the application's use cases to "the world outside" (like users, hardware, other applications), can become quite large. As I already remarked, a lot of our software consists of infrastructure code, since that's the realm of things complicated and prone to break. Infrastructure code connects our precious clean code to:

  • The filesystem
  • The network
  • Users
  • The ORM
  • The web framework
  • Third party web APIs
  • ...

Ports

The layering system already offers a very useful way of separating concerns. But we can improve the situation by further analyzing the different ways in which the application is connected to the world. Alistair Cockburn calls these connection points the "ports" of an application in his article "Hexagonal architecture". A port is an abstract thing; it will not have any representation in the code base (except as a namespace/directory, see below). It can be something like:

  • UserInterface
  • API
  • TestRunner
  • Persistence
  • Notifications

In other words: there is a port for every way in which the use cases of an application can be invoked (through the UserInterface, through an API, through a TestRunner, etc.) as well as for all the ways in which data leaves the application (to be persisted, to notify other systems, etc.). Cockburn calls these primary and secondary ports. I often use the words input and output ports.

What exactly a port is and isn’t is largely a matter of taste. At the one extreme, every use case could be given its own port, producing hundreds of ports for many applications.

— Alistair Cockburn

Adapters

For each of these abstract ports we need some code to make the connection really work. We need code for dealing with HTTP messages to allow users to talk to our application through the web. We need code for talking with a database (possibly speaking SQL while doing so), in order for our data to be stored in a persistent way. The code to make each port actually work is called "adapter code". We write at least one adapter for every port of our application.

Adapters, which are very concrete and contain low-level code, are by definition decoupled from their ports, which are very abstract, and in essence just concepts. Since adapter code is code related to connecting an application to the world outside, adapter code is infrastructure code and should therefore reside in the infrastructure layer. And this is where ports & adapters and layered architecture play well together.

If you remember the dependency rule from my previous article, you know that code in each layer can only depend on code in the same layer or in deeper layers. Of course the application layer can use code from the infrastructure layer at runtime, since it gets everything injected as constructor arguments. However, the classes themselves will only depend on things more abstract, i.e. interfaces defined in their own layer or a deeper one. This is what applying the dependency inversion principle entails.

When you apply the principle everywhere, you can write alternative adapters for your application's ports. You could run an experiment with a MongoDB adapter side by side with a MySQL adapter. You can also make the tests that exercise application layer code a lot faster by replacing the real adapter with something faster (for example, an adapter that doesn't make network or filesystem calls, but simply stores things in memory).
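
As a sketch of what that looks like in code (all names here are hypothetical, and I assume Member and MemberId classes exist in the domain model, with Member::id() returning a MemberId that can be cast to string): the Persistence port boils down to an interface in a deeper layer, plus one adapter class per technology in the infrastructure layer. A real adapter, say a DoctrineMemberRepository in Infrastructure/Persistence/Doctrine, would implement the same interface using the ORM.

    // Domain layer (e.g. Domain/Model/Member): the abstraction behind the Persistence port
    interface MemberRepository
    {
        public function save(Member $member): void;

        public function byId(MemberId $memberId): Member;
    }

    // Infrastructure/Persistence/InMemory: a fast adapter for application-layer tests
    final class InMemoryMemberRepository implements MemberRepository
    {
        /** @var Member[] */
        private $members = [];

        public function save(Member $member): void
        {
            $this->members[(string) $member->id()] = $member;
        }

        public function byId(MemberId $memberId): Member
        {
            return $this->members[(string) $memberId];
        }
    }

Application layer code only ever type-hints against MemberRepository, so switching adapters is a matter of injecting a different implementation.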

Directory structure

Knowing which ports and adapters your application has or should have, I recommend reflecting them in the project's directory/namespace structure as well:

src/
    <BoundedContext>/
        Domain/
            Model/
        Application/
        Infrastructure/
            <Port>/
                <Adapter>/
                <Adapter>/
                ...
            <Port>/
                <Adapter>/
                <Adapter>/
                ...
            ...
    <BoundedContext>/
        ...

Testing

Having specialized adapters for running tests is the main reason why Cockburn proposed the ports & adapters architectural style in the first place. Having a ports & adapters/hexagonal architecture increases your application's testability in general.

At the same time, when we start replacing real dependencies with fake ones, we should not forget to test the real thing. This kind of test is what Freeman and Pryce call an integration test: it thoroughly tests one adapter, which means it tests infrastructure code, limited to one port. While doing so, it uses and calls as many "real" things as possible, i.e. it calls a real external web API, creates real files, and uses a real database (not a faster SQLite replacement, but the real deal - how would you know the persistence adapter for MySQL works if you test against SQLite instead?).
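
A minimal sketch of such a test, assuming the hypothetical DoctrineMemberRepository adapter for the MemberRepository port from before, and assuming some TestDatabase helper that sets up a Doctrine EntityManager against a real MySQL test database:

    use PHPUnit\Framework\TestCase;

    final class DoctrineMemberRepositoryTest extends TestCase
    {
        public function testItSavesAndRetrievesAMember(): void
        {
            // an EntityManager connected to a real MySQL test database, not an SQLite stand-in
            $entityManager = TestDatabase::createEntityManager();
            $repository = new DoctrineMemberRepository($entityManager);

            $memberId = MemberId::fromString('member-1');
            $repository->save(new Member($memberId, 'Matthias'));

            // clear the identity map, so the member is really reloaded from the database
            $entityManager->clear();

            self::assertEquals($memberId, $repository->byId($memberId)->id());
        }
    }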

Integrating Bounded Contexts

Now, for the Domain-Driven Design fans: when integrating bounded contexts, I find that it makes sense to designate a port for each context integration point too. You can read a full example using a REST API in chapter 13, "Integrating Bounded Contexts", of Vaughn Vernon's book "Implementing Domain-Driven Design". The summary is: there's the Identity & Access context, which keeps track of active user accounts and their assigned roles, and there is a Collaboration context, which distinguishes different types of collaborators: authors, creators, moderators, etc. To remain consistent with Identity & Access, the Collaboration context will always directly ask Identity & Access whether a user with a certain role exists in that context. To verify this, it makes an HTTP call to the REST API of Identity & Access.

In terms of ports & adapters, the integration relation between these two contexts can be modelled as an "IdentityAndAccess" port in the Collaboration context, together with an adapter for that port which you could call something like "Http", after the technical protocol used for communication through this port. The directory/namespace structure would become something like this:

src/
    IdentityAndAccess/
        Domain/
        Application/
        Infrastructure/
            Api/
                Http/ # Serving a RESTful HTTP API
    Collaboration/
        Domain/
        Application/
        Infrastructure/
            IdentityAndAccess/
                Http/ # HTTP client for I & A's REST API

You could even use a "faux" port adapter if you like. This adapter would not make a network call but secretly reach into Identity & Access's code base and/or database to get the answers it needs. This could be a pragmatic and stable solution, as long as you're aware of the dangers of not making a bounded context actually bounded. After all, bounded contexts were meant to prevent a big ball of mud, where the boundaries of a domain model aren't clear.
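
In code, the port could again be a small interface in Collaboration's application (or domain) layer, with the "Http" adapter from the directory tree above next to an optional "faux" adapter. All names are made up; I'm assuming Guzzle as the HTTP client, a made-up URL path and response shape for the REST API, and a hypothetical GetUserRoles application service inside Identity & Access.

    // Collaboration: the IdentityAndAccess port, represented as an interface
    interface IdentityAndAccess
    {
        public function userHasRole(string $userId, string $role): bool;
    }

    // Collaboration\Infrastructure\IdentityAndAccess\Http: calls I & A's REST API
    final class HttpIdentityAndAccess implements IdentityAndAccess
    {
        private $httpClient;

        public function __construct(\GuzzleHttp\ClientInterface $httpClient)
        {
            $this->httpClient = $httpClient;
        }

        public function userHasRole(string $userId, string $role): bool
        {
            $response = $this->httpClient->request('GET', '/users/' . $userId . '/roles');
            $data = json_decode((string) $response->getBody(), true);

            return in_array($role, $data['roles'], true);
        }
    }

    // The "faux" adapter: no network call, it reaches directly into the other context
    final class LocalIdentityAndAccess implements IdentityAndAccess
    {
        private $userRoles;

        public function __construct(\IdentityAndAccess\Application\GetUserRoles $userRoles)
        {
            $this->userRoles = $userRoles;
        }

        public function userHasRole(string $userId, string $role): bool
        {
            return in_array($role, $this->userRoles->forUser($userId), true);
        }
    }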

Conclusion

This concludes my "Layers, ports and adapters" series. I hope it gives you some useful suggestions for your next project - or inspires you to apply them to (parts of) your current project. I'd be happy to hear about your experiences in the field. If you have anything to share, please do so in the comment section below this post.

Also, I would be stupid not to mention that I offer in-house training on these topics as well, in case you want to experience layered/hexagonal architecture hands-on.

PHP architecture design

Layers, ports & adapters - Part 2, Layers

Posted on by Matthias Noback

The first key concept of what I think is a very simple, or at the very least "clean", architecture is the concept of a layer. A layer itself is actually nothing, if you think about it. It's simply determined by how it's used. Let's stay a bit philosophical, before we dive into some concrete architectural advice.

Qualities of layers

A layer in software serves the following (more or less abstract) purposes:

  • A layer helps protect what's beneath it. A layer acts like some kind of barrier: data passing through it will be checked for basic structural issues before it gets passed along to a deeper layer. It will be transformed or translated to data that can be understood and processed by deeper layers. A layer also determines which data and behavior from a deeper layer will be allowed to be used in higher layers.
  • A layer comes with rules for which classes belong to it. If (as a team) you agree on which layers your software will have, you will know, for every class you're wandering around with, in which layer to put it.
  • A system of layers can be used to modify the build order of a project. You could in fact build layer upon layer if you like. You can start at the outside, working inward, or at the inside, working towards the "world outside".
  • Being able to change the build order is an important tool for software architects. With layers you can build a big part of the application without deciding on which framework, ORM, database, messaging system, etc. to use.
  • Most legacy software consists of code without layers, which can be characterised as spaghetti code: everything can use or call anything else inside such an application. With a system of layers in place, good rules for what they mean, and clarity about which things belong in which layer, you will have true separation of concerns. If you document the rules and enforce them in code reviews, I'm sure you will start producing code that is less likely to end up being considered "legacy code".
  • In order to do so, you need to write tests of course. Having a good system of layers in place will certainly make that easier. A different type of test will be suitable for each layer, and the purpose of each test suddenly becomes clearer. The test suite as a whole will become much more stable, and it will run faster too.

A warning from Twitter:

I've never seen lasagna code to be honest, but I have seen a lot of spaghetti code. And I've written code that I thought was properly layered, but in hindsight the layers were not well chosen. In this article I describe a better set of layers, largely based on how Vaughn Vernon describes them in his book "Implementing Domain-Driven Design". Please note that layers are not specific to DDD, although they do make way for a clean domain model and, at the least, a proper amount of attention paid to it by the developer.

Directory layout & namespaces

Directly beneath my src/ directory I have a directory for every Bounded Context that I distinguish in my application. This directory is also the root of the namespace of the classes in each of these contexts.

Inside each Bounded Context directory I add three directories, one for every layer I'd like to distinguish:

  • Domain
  • Application
  • Infrastructure

I will briefly describe these layers now.

Layer 1 (core): Domain

The domain layer contains classes of any of the familiar DDD types/design patterns:

  • Entities
  • Value objects
  • Domain events
  • Repositories
  • Domain services
  • Factories
  • ...

Within Domain I create a subdirectory Model, then within Model another directory for each of the aggregates that I model. An aggregate directory contains all the things related to that aggregate (value objects, domain events, a repository interface, etc.).
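
For a hypothetical Meeting aggregate in a Collaboration context, that could look like this (the file names are only an example):

    src/
        Collaboration/
            Domain/
                Model/
                    Meeting/
                        Meeting.php             # the aggregate root, an entity
                        MeetingId.php           # a value object
                        MeetingWasScheduled.php # a domain event
                        MeetingRepository.php   # a repository interface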

Domain model code is, as I like to call it, ethereal. It has no touching points with the real world, and if it were not for the tests, no one would call this code yet (that happens in the higher layers). Tests for domain model code can be purely unit tests, as all they do is execute code in memory. There is no need for domain model code to reach out to the world outside (like accessing the filesystem, making a network call, generating a random number, or getting the current time). This makes its tests very stable and fast.

Layer 2 (wrapping Domain): Application

The application layer contains classes called commands and command handlers. A command represents something that has to be done. It's a simple Data Transfer Object, containing only primitive type values and simple lists of those. There's always a command handler that knows how to process a particular command. Usually the command handler (which is also known as an application service) performs any orchestration needed. It uses the data from the command object to create an aggregate, or fetch one from the repository, and perform some action on it. It then often persists the aggregate.
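
A sketch of such a pair, reusing the hypothetical Meeting aggregate from above (all names and method signatures are made up for the example):

    // A command: a simple DTO carrying only primitive-type values
    final class ScheduleMeeting
    {
        public $meetingId;    // string
        public $title;        // string
        public $scheduledFor; // string, e.g. '2017-08-01 10:00'
    }

    // The corresponding command handler, a.k.a. application service
    final class ScheduleMeetingHandler
    {
        private $meetingRepository;

        public function __construct(MeetingRepository $meetingRepository)
        {
            $this->meetingRepository = $meetingRepository;
        }

        public function handle(ScheduleMeeting $command): void
        {
            // orchestration only: create the aggregate and persist it
            $meeting = Meeting::schedule(
                MeetingId::fromString($command->meetingId),
                $command->title,
                new \DateTimeImmutable($command->scheduledFor)
            );

            $this->meetingRepository->save($meeting);
        }
    }

Note that the handler only depends on the MeetingRepository interface from the domain layer; which adapter implements it is decided in the infrastructure layer.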

Code in this layer could be unit tested, but having an application layer is also a good starting point for writing acceptance tests as Gherkin scenarios (and running them with a tool like Behat). An interesting article to start with on this topic is Modelling by Example by Konstantin Kudryashov.

Layer 3 (wrapping Application): Infrastructure

Again, if it weren't for the tests, code in the application layer wouldn't be executed by anyone. Only when you add the infrastructure (or "infra" for short) layer will the application actually become usable.

The infrastructure layer contains any code that is needed to expose the use cases to the world and make the application communicate with real users and external services. Think of anything that gives your domain model and your application services "hands and feet" and actually makes the use cases of your application "usable". This layer contains the code for:

  • Processing HTTP requests, producing a response for an incoming request
  • Making (HTTP) requests to other servers
  • Storing things in a database
  • Sending emails
  • Publishing messages
  • Getting the current timestamp
  • Generating random numbers

This kind of code requires integration testing (in the terminology of Freeman and Pryce). You test all the "real things": the real database, the real vendor code, the real external services involved. This allows you to verify all the assumptions your infrastructure code makes about things that are beyond your control.

Frameworks and libraries

Any framework or library that is related to "the world outside" (e.g. networking, file systems, time, randomness) will be used or called in the infrastructure layer. Of course, code in the domain and application layers needs the functionality offered by ORMs, HTTP client libraries, etc., but it can only use that functionality through more abstract dependencies, as dictated by the dependency rule.

The Dependency Rule

The dependency rule (based on the one posed by Robert C. Martin in The Clean Architecture) states that you should only depend on things that are in the same or in a deeper layer. That means, domain code can only depend on itself, application code can only depend on domain code and its own code, and infrastructure code can depend on anything. According to the dependency rule it's not allowed for domain code to depend on infrastructure code. This should already make sense, but the rule formalizes our intuitions here.

Obeying a rule blindly isn't a good idea. So why should you use the dependency rule? Well, it guarantees that you don't couple the code in the domain and application layer to something as "messy" as infrastructure code. When you apply the dependency rule, you can replace anything in the infrastructure layer without touching and/or breaking code in any of the deeper layers.

This style of decoupling has for a long time been known as the Dependency Inversion Principle - the "D" in SOLID, as formulated by Robert C. Martin: "Depend on abstractions, not on concretions." A practical implementation in most object-oriented programming languages means defining an interface for the thing you want to depend on (which will be the abstraction), then providing a class implementing that interface. This class contains all the low-level details that you've stripped away from the interface; hence, it is the concretion this design principle talks about.
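
A minimal sketch, using time as the example dependency (the names are made up): the abstraction lives in the application (or domain) layer, the concretion in the infrastructure layer.

    // The abstraction: this is what deeper layers depend on
    interface Clock
    {
        public function currentTime(): \DateTimeImmutable;
    }

    // The concretion: infrastructure code, tied to the world outside
    final class SystemClock implements Clock
    {
        public function currentTime(): \DateTimeImmutable
        {
            return new \DateTimeImmutable('now');
        }
    }

In a test you could inject a fixed-time implementation of Clock instead, without touching any code in the deeper layers.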

Extending "infrastructure" to everything that's needed to connect your application to users and external services, including code written by us or by any (hardware) vendor we rely on, we should humbly conclude that by far the biggest part of an application is concerned with simply connecting our tiny bit of custom (yet precious) domain and application layer code to the "world outside".

Architecture: deferring technological decisions

Applying the proposed set of layers as well as the dependency rule gives you a lot of options:

  • You can develop many use cases before making decisions like "which database am I going to use?". You can easily use different databases for different use cases as well.
  • You can even decide later on which (web) framework you're going to use. This prevents your application from becoming "a Symfony application" or "a Laravel project".
  • Frameworks and libraries will be kept at a safe distance from domain and application layer code. This helps with upgrading to newer (major) versions of those frameworks and libraries. It also prevents you from having to rewrite the system if you ever want to use, say, Symfony 3 instead of Zend Framework 1.

This, to me, is a very attractive idea: I want to keep my options open, and I want to make the right technological decisions; not at the beginning of a project, but only when I know, based on what the use cases of my application are starting to look like, which solutions will be the best ones for the situation at hand.

Having seen a lot of legacy code in my career, I also believe that applying correct layering and enforcing the dependency rule help prevent you from producing legacy code. At least, they help you avoid making framework and library calls all over the code base. After all, replacing those calls with something more up-to-date proves to be one of the biggest challenges of working with legacy code. If you keep those calls in one layer, and if you always apply the dependency inversion principle, it'll be much easier to do so.

Conclusion

As I mentioned in my previous post, with this nice set of layers, we now know that there is a time and place for your beloved framework too. It's not all over the place, but in a restricted zone called "the infrastructure layer". In fact, it's more like the domain and application layers are the restricted zones, since the dependency rule only has consequences for these two layers.

Some may find that the proposed layer system results in "too many layers" (I don't know about 3 layers being too many, but anyway, if it hurts, maybe you shouldn't do it). If you want, you could leave out the application layer. You won't be able to write acceptance tests against the application layer anymore (they will be more like system tests, which tend to be slow and brittle). And you won't be able to expose the same use case to, say, a web UI and a web API without duplicating some code. But it should be doable.

At least, make sure that the biggest improvement of your application's design comes from the fact that you separate domain (or core) code from infrastructure code. Optimize it for your application's use cases, apply anything you've learned from the discipline of Domain-Driven Design, and bend ORMs and web frameworks to obey your will.

We still need to look at infrastructure code in more detail. This will bring us to the topic of hexagonal architecture, a.k.a. "ports & adapters", to be covered in another article.

Further reading

You may also check out Deptrac, a tool that helps to enforce rules about layers and dependencies.

PHP architecture design