On several occasions I have tried to explain my opinion about "optional dependencies" (also known as "suggested dependencies" or "dev requirements") and I'm doing it again:
> There's no such thing as an optional dependency.
I'm talking about PHP packages here, specifically those defined by a composer.json file.
What is a dependency?
First let's make sure we all agree about what a dependency is. Take a look at the following piece of code:
namespace Gaufrette\Adapter;

use Gaufrette\Adapter;
use \MongoGridFS;

class GridFS implements Adapter
{
    private $gridFS;

    public function __construct(MongoGridFS $gridFS)
    {
        $this->gridFS = $gridFS;
    }

    public function read($key)
    {
        $file = $this->find($key);

        return ($file) ? $file->getBytes() : false;
    }
}
This GridFS class is part of the Gaufrette filesystem abstraction library, though I heavily modified it.
To determine all the dependencies of this code we can ask the following question:
> What is needed to run this code?
You need to think of several things:
Which PHP version is needed to run the code without getting a syntax error? Maybe you even need a specific patch version (like 5.3.6) because of a bug in older 5.3 versions that could interfere with your code.
Which PHP extensions should be installed?
Which PEAR libraries should be installed?
Which other packages should be installed?
In the case of the GridFS class, the PHP version should be at least 5.3, because of the use of namespace. Also, the \MongoGridFS class should be available. This class is part of the mongo PECL extension for PHP. The \MongoGridFS class is only available since version 0.9.0 of that extension, so we have to mention this version constraint explicitly. Finally, it appears that no other packages are needed to use the GridFS class. So if we were to create a composer.json file for a package that contains the GridFS class, it would look like this:
{
    ...,
    "require": {
        "php": ">=5.3",
        "ext-mongo": ">=0.9.0"
    }
    ...
}
Now this is an exhaustive list of the dependencies of the package that contains the GridFS class: when these dependencies are installed, nothing stands in the way of using this class in your application.
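To make the question "what is needed to run this code?" concrete, here is a minimal sketch of checking those two platform requirements at runtime. The function name findUnmetRequirements() and the shape of its input array are made up for this illustration; Composer does its own platform checks and does not use this code. The sketch only relies on the built-in phpversion() and version_compare() functions.

```php
<?php

/**
 * Hypothetical helper (not part of Composer or Gaufrette): returns the
 * platform requirements that the current PHP installation does not meet.
 *
 * $required maps "php" or "ext-<name>" to a minimum version string.
 */
function findUnmetRequirements(array $required)
{
    $unmet = array();

    foreach ($required as $name => $minimumVersion) {
        if ($name === 'php') {
            $installedVersion = PHP_VERSION;
        } else {
            // Strip the "ext-" prefix to get the extension name.
            $extension = substr($name, 4);
            // phpversion() returns false when the extension is not loaded.
            $installedVersion = phpversion($extension);
        }

        if ($installedVersion === false
            || version_compare($installedVersion, $minimumVersion, '<')
        ) {
            $unmet[] = $name . ' >= ' . $minimumVersion;
        }
    }

    return $unmet;
}

// The two requirements identified for the GridFS class.
$unmet = findUnmetRequirements(array(
    'php' => '5.3.0',
    'ext-mongo' => '0.9.0',
));

foreach ($unmet as $requirement) {
    echo 'Missing requirement: ' . $requirement . PHP_EOL;
}
```

If this prints nothing, both requirements are met on the machine running it. Declaring them under "require" lets Composer perform the equivalent check at install time instead.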
The actual list of dependencies of knplabs/gaufrette
As I already mentioned, the GridFS class is part of the Gaufrette library, which provides a filesystem abstraction layer so you can store files on different types of filesystems without worrying about the details of those filesystems. Let's take a look at the composer.json file of this library:
{
    "name": "knplabs/gaufrette",
    "require": {
        "php": ">=5.3.2"
    },
    "require-dev": {
        ...
    },
    "suggest": {
        ...
        "amazonwebservices/aws-sdk-for-php": "to use the legacy Amazon S3 adapters",
        "phpseclib/phpseclib": "to use the SFTP",
        "doctrine/dbal": "to use the Doctrine DBAL adapter",
        "microsoft/windowsazure": "to use Microsoft Azure Blob Storage adapter",
        "ext-zip": "to use the Zip adapter",
        "ext-apc": "to use the APC adapter",
        "ext-curl": "*",
        "ext-mbstring": "*",
        "ext-mongo": "*",
        "ext-fileinfo": "*"
    },
    ...
}
After what we've discussed above, this is quite a surprise: the library says it has only one actual dependency: a PHP version that is at least 5.3.2. Everything else is either a "dev" requirement or a "suggested" requirement.
Of course, people who have used Composer and Packagist for some time (like myself) have become quite used to this way of advertising the dependencies of a package. But it is just wrong. As we concluded earlier, ext-mongo is a true dependency of the GridFS class, yet according to the composer.json file it is only a suggested dependency.
This means that if I want to use the class in my project, it is not sufficient to require just the knplabs/gaufrette package. I also have to add ext-mongo as a requirement to my own project. This is semantically wrong: it is not my project that needs the mongo extension, it is the knplabs/gaufrette package that actually needs it. Besides, how do I know which version of ext-mongo I have to choose? Dependencies listed under the suggest key in composer.json don't come with version constraints, so I have to figure them out myself.
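As a sketch of what this workaround looks like in practice, the composer.json of the consuming project ends up along these lines. The project name acme/my-project and the "~0.1" constraint for Gaufrette are assumptions for illustration only; the ext-mongo constraint is the one worked out earlier by reading the code of the GridFS class.

```json
{
    "name": "acme/my-project",
    "require": {
        "knplabs/gaufrette": "~0.1",
        "ext-mongo": ">=0.9.0"
    }
}
```

The second entry is the one that does not belong here: it describes a need of the library, but it is the application that has to discover it and keep it up to date.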
Not just this package
knplabs/gaufrette is not the only package out there that advertises actual, required dependencies as "suggested" dependencies. It is a convenient way for package maintainers to put a lot of different classes in a package that may or may not be needed by users. Since using those classes is optional, their dependencies are made optional too. But package maintainers forget that dependencies are never optional. They are always required, since the code would not be executable without them.
The solution
What package maintainers should do is split their packages. In the case of knplabs/gaufrette this means there should be a knplabs/gaufrette package containing all the generic code for filesystem abstraction. Then each specific adapter, like the GridFS class, should live in its own package, e.g. knplabs/gaufrette-mongo-gridfs. This package itself has the following dependencies:
{
    ...,
    "require": {
        "php": ">=5.3",
        "knplabs/gaufrette": "~0.1",
        "ext-mongo": ">=0.9.0"
    }
}
No hidden dependencies there: everything is truly required for using the code in this package.
On the other hand, the knplabs/gaufrette package has almost no dependencies anymore, and the "adapter" packages are listed under the suggest key:
{
    "require": {
        "php": ">=5.3.2"
    },
    "suggest": {
        "knplabs/gaufrette-mongo-gridfs": "For storing files using Mongo GridFS",
        ...
    }
}
This approach has many advantages:
The main package will be very stable. There are almost no reasons for it to change anymore, since all the moving parts are inside the "adapter" packages.
The adapter packages can have different specialists as maintainers; for instance, knplabs/gaufrette-mongo-gridfs can be maintained by someone who knows all about MongoDB.
Users don't have to keep track of available updates for parts of the library they don't use.
Users don't have to manually add extra dependencies to their projects (which means they don't have to worry about version constraints for them).
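To illustrate that last advantage: with the split in place, a project that wants GridFS storage would only need a single requirement. This is a sketch under assumptions: knplabs/gaufrette-mongo-gridfs is the hypothetical adapter package proposed in this article, and both the project name acme/my-project and the "~0.1" constraint are made up.

```json
{
    "name": "acme/my-project",
    "require": {
        "knplabs/gaufrette-mongo-gridfs": "~0.1"
    }
}
```

Composer would then resolve knplabs/gaufrette and check for ext-mongo on its own, because the adapter package lists both as true requirements.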
So keep in mind, next time you are tempted to add a suggested dependency to your package: is it an actual dependency of (part of) the code in this package? Then split the package and reinstate that dependency as a true requirement. If all the code in the package works perfectly well without that suggested dependency, then you are indeed allowed to advertise it as a suggested dependency.
Want to know more?
I'm working on a book about package design principles, based on the work of Robert Martin. You may register yourself as an "interested reader" and receive a considerable discount when the first part of the book becomes available next week.
You may also want to read some of the articles about package coupling by Paul Jones (Frameworks are good, components are awesome!, Symfony components: sometimes decoupled, sometimes not). He maintains the Aura framework and components and does a great job when it comes to package coupling.
The idea itself is great, but what about the development requirements? Where should I put the require-dev dependencies? I would not bother with those.
I agree with you 100%. Basically we should do it for each adapter in Gaufrette, and each adapter should have its own maintainer... but currently this can be difficult to achieve :)
Thanks for your opinion about the matter (especially since you are one of the core maintainers of Gaufrette!). I understand there can be practical difficulties with this :)
You are absolutely right that, for the user of a library, requiring one package for one feature should install all of the required packages, and only those.
The general pattern for packages for composer is to have 1 repository = 1 library.
If we want to create a separate package for each sub-requirement (adapter), this means that we must create many new Git repositories and split the code. That would be more work to maintain.
We could find a solution to have many libraries, with different requirements, pointing to the same Git repository.
That's an interesting point. Lukas Smith mentioned something like that too: maintaining many repositories gives some overhead.
Well, first of all, I don't share that experience. The overhead of having another repository is really small. You only need to set up some things once: a composer.json file, a license file, a readme file (could be really simple), a .gitignore file (always the same) and a test bootstrap file. You also need to create the repository on GitHub, register the package on Packagist, set up continuous integration on Travis, etc. This is quite a hassle. However, it will take no more than half an hour to do all this. Besides, everything mentioned here could easily be automated. I didn't actually search for such tools, but I'm sure they exist or could easily be created. Maybe I'll do it myself :)
After the initial "slow" start for setting up a new package, things will become really easy for the maintainer(s) of those packages. I sure hope that more people will take this approach.
The overhead of creating a new repo is not the problem since this is a one time effort. However there is daily stuff: Coordinating new releases, cross cutting changes (these should ideally be very few when doing decoupling right), constantly filled up travis-ci queue, tickets and PRs scattered across many repos etc. This can quickly add up.
Well, working with just one repository and splitting all subrepositories with a good post-commit and post-tag hook, this nightmare becomes just a very simple workflow.
I really like what Matthias proposes, but I can easily understand the complexity of maintaining many packages without splitting process.
Thanks for pointing this out Lukas! I can imagine with these big projects such overhead really starts to add up. Do you think there are quick wins when it comes to the major pain points?
Hmm not sure about quick wins. We are not using subtree splits but this seems to be a popular option by several projects. Building CLI tools to help is another approach (gush, Fabien has his own thing etc). We also build our own dashboards to aggregate some key bits of info (http://ghag.dantleech.com/#, http://cmf.davidbu.ch) though there are also some generic tools (http://williamdurand.fr/Tra..., https://waffle.io). Paying travis ci to get a dedicated queue is probably a bit out of reach for most.
a question also i want to ask, should Gush persist the composer.lock on the repo? why yes or no? in your opinion
It depends. For an application (including phar-only) it's best to have your versions locked so that everything works as expected when installing (especially if you're using a dev version).
For a library it does not make sense to keep the lock file, but could be handy to figure out what versions have changed that break your tests (this is something Johannes (schmittjoh) once mentioned).
We have done so on Gush now. Thanks @sstok!
i think one of the unexploited features of composer is the provider key. I tried to use this in Gush but it just did not work. It does work on cmf but i think you can almost ignore it since it is not relevant for the use of composer. I honestly think it should be removed from composer.
Gush has adapters https://github.com/gushphp/... but i removed the dependencies back because they often cause problems when installing a package. So i guess if someone wants to install Github Adapter one should clone the adapter and it will pull the main Gush package?
See the problem?
Now if we say that we want to install Gush with support for github adapter then we should probably create a deploy or build package that pulls these other two then.
Ultimately I was thinking pulling all the adapters into the Gush repository since the idea is to switch for any kind of support, however that would make Gush a fat beast. It is already a beast :), i mean including Github, enterprise, bitbucket, gitlab, etc, even a Git abstraction.
I have also seen how Behat phar extensions are coupled but more needs to be said about how to approach the clustering of packages and its dependencies when integrating.
Good point - it is quite difficult to accomplish what I mention here when the result of building your project should be a .phar file ;) Such a project would definitely benefit from a composer.lock file (but it seems you have settled that question already!). The same goes for Composer itself, which people will upgrade whatever compatibility feature is included (like SVN support, when they only use Git anyway).
But even though people will only install Gush as a .phar file, it would still make sense for you to have separate repositories/packages. It will probably be easier to maintain them separately (different people can maintain the different adapters). You will have one "gush/gush-core" package with files shared by all the adapters, upon which the adapter packages can depend. Then you could create a new "gush/boxed-gush" package, which requires all the available adapter packages and contains the tools needed to build the .phar file. What do you think?
Great post, and glad to see others taking up the banner on this. Thanks for the link-backs; for one more related to package decoupling, may I suggest http://paul-m-jones.com/arc... (“On Decoupling and Dependencies”)?
Thanks Paul. As I was diving deeper in the archive of your blog, I saw that one too. It's certainly ironic: I actually remember reading that article a while ago when I hadn't given so much thought on the subject. At the time I didn't agree with you at all!
Now that I read it again, there are so many interesting things going on in that debate, and it's mainly people-things. We'll be in touch I guess!