The "builder pattern"
As a programmer you may know the Gang-of-Four Builder design pattern. The Docker builder pattern has nothing to do with that. The builder pattern describes the setup that many people use to build a container. It involves two Docker images:
- a "build" image with all the build tools installed, capable of creating production-ready application files.
- a "service" image capable of running the application.
Sharing files between containers
At some point the production-ready application files need to be copied from the build container to the host machine. There are several options for that.
1. Using docker cp
This is what the Dockerfile for the build image roughly looks like:
# in Dockerfile.build
# take a useful base image
FROM ...
# install build tools
RUN install build-tools
# create a /target directory for the executable
RUN mkdir /target
# copy the source code from the build context to the working directory
COPY source/ .
# build the executable
RUN build --from source/ --to /target/executable
To build the executables, simply build the image:
# tag the image as "build", use Dockerfile.build,
# and use the current directory as the build context
docker build \
    -t build \
    -f Dockerfile.build \
    .
In order to be able to reach in and grab the executable, you should first create a container (not a running one) based on the given image:
# create (but don't start) a container named "build",
# based on the "build" image
docker create \
    --name build \
    build
You can now copy the executable file to your host machine using docker cp:
docker cp build:/target/executable ./executable
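Once the file has been copied, the intermediate container can be removed again (this also frees up the name "build" for the next run):
docker rm build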
2. Using bind-mount volumes
I don't think making the compile step part of the image's build process is good design. I like container images to be reusable. In the previous example, whenever the source files are modified, you need to rebuild the build image itself, whereas I'd rather just run the same build image again.
This means that the compile step should instead be moved to the ENTRYPOINT or CMD instruction, and that the source/ files shouldn't be part of the build context, but mounted as a bind-mount volume inside the running build container:
# in Dockerfile.build
FROM ...
RUN install build-tools
ENTRYPOINT build --from /project/source/ --to /project/target/executable
This way, we should first build the build image, then run it:
# same build process
docker build \
-t build \
-f Dockerfile.build \
.
# now we *run* the container
# --rm removes the container after running it,
# -v bind-mounts the entire project directory
docker run \
    --name build \
    --rm \
    -v `pwd`:/project \
    build
Every time you run the build container it will compile the files in /project/source/ and produce a new executable in /project/target/. Since /project is a bind-mount volume, the executable file is automatically available on the host machine in target/ - there's no need to explicitly copy it from the container.
Once the application files are on the host machine, it will be easy to copy them to the service image, since that is done using a regular COPY instruction.
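For completeness, here's a minimal sketch of what the service image and the coordinating script could look like - the base image is a placeholder just like in the examples above, and the file name Dockerfile.service is only an assumption:
# in Dockerfile.service
# take a minimal base image capable of running the application
FROM ...
# copy the executable that the build container produced in target/
COPY target/executable .
# run it
CMD ["./executable"]
Tying everything together is then a matter of a small (Bash) script along these lines:
#!/bin/bash
# build the build image, run it to compile the project,
# then bake the resulting executable into the service image
docker build -t build -f Dockerfile.build .
docker run --rm -v `pwd`:/project build
docker build -t service -f Dockerfile.service .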
The "multi-stage build"
A feature that has just been introduced to Docker is the "multi-stage build". It aims to solve the issue that the above build process requires two Dockerfiles, as well as a (Bash) script to coordinate the build and get the files where they need to be, with a short detour via the host filesystem.
With a multi-stage build (see Alex Ellis's introductory article on this feature), you can describe the build process in one file:
# in Dockerfile
# these are still the same instructions as before
FROM ...
RUN install build-tools
RUN mkdir /target
# copy the source code into the image, then build the executable
COPY source/ /source/
RUN build --from /source/ --to /target/executable
# another FROM; this defines the actual service container
FROM ...
COPY --from=0 /target/executable .
CMD ["./executable"]
There is only one image to be built. The resulting image will be the one defined last. It will contain the executable copied from the first, intermediate "build" image (which will be disposed of afterwards).
Note that this requires the source files to be inside the build context. Also note that the build image itself is not reusable; you can't run it again and again after you've made changes to the code; you have to build the image again. Since Docker will cache previously built image layers, this should still be fast, but it's something to be aware of.
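Building and running the final image then only takes the usual commands; a quick sketch, assuming you want to tag the resulting service image as "service":
# build the multi-stage Dockerfile; only the last stage ends up in the final image
docker build -t service .
# run the service image
docker run --rm service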
Pipes & filters
I recently saw this question passing by on Twitter:
Learning Docker so dumb q probably - if a docker image generates binary output to a file, how do I copy to host?
— Raymond Camden (@raymondcamden) April 1, 2017
People suggested using bind-mount volumes, as described above. Nobody suggested docker cp. But the question prompted me to think of another solution for getting generated files out of a container: why not stream the file to stdout? It has several major advantages:
- The data doesn't have to end up in a file anymore, only later to be moved/deleted anyway - it can stay in memory (which offers fast access).
- Using stdout allows you to send the output directly to some other process, using the pipe operator (|). Other processes may modify the output, then do the same thing, or store the final result in a file (inside the service image for example); see the sketch right after this list.
- The exact location of files becomes irrelevant. There's no coupling through the filesystem if you only use stdin and stdout. The build container wouldn't have to put its files in /target, and the build script wouldn't have to look in /target either. They just pass along data.
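For a single file you don't even need an archive. As a minimal sketch - assuming a hypothetical variant of the build image, tagged build-single here, whose entrypoint simply writes the finished executable to stdout - getting the file onto the host is a matter of redirecting the output:
# stream the executable straight into a file on the host
# (build-single is a hypothetical image that cats the executable to stdout)
docker run --rm build-single > ./executable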
In case you want to stream multiple files between containers, I think good old tar is a very good option.
Take the following Dockerfile for example, which creates an "executable", then wraps it in a tar file which it streams to stdout:
FROM ubuntu
RUN mkdir /target
RUN echo "I am an executable" > /target/executable
RUN echo "I am a supporting file" > /target/supporting-file
ENTRYPOINT tar --create /target
To build this image, run:
docker build -t build -f docker/build/Dockerfile ./
Now run a container using the build image:
docker run --rm -v `pwd`:/project --name build build
The archive generated by tar will be sent to stdout. It can then be piped into another process, like tar itself, to extract the files again:
docker run --rm -v `pwd`:/project --name build build \
| tar --extract --verbose
If you want another container to accept an archive, pipe it in through stdin (create the container in interactive mode):
docker run --rm -v `pwd`:/project --name build build \
| docker run -i [...]
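As a purely illustrative sketch - assuming nothing more than the build image from above and a stock ubuntu image on the receiving side - the second container could for instance list the contents of the archive it receives on stdin:
# pipe the tar stream from the build container into a second container,
# which reads the archive from stdin (--file -) and lists its contents
docker run --rm build \
    | docker run -i --rm ubuntu tar --list --verbose --file -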
Conclusion
We discussed several patterns for building Docker images. I prefer separate build files (instead of a multi-stage build with one Dockerfile). And as an alternative to writing files to a bind-mount volume, I really like the option of making the build image stream a tar archive.
I hope there was something useful for you in here. If you find anything that can be improved/added, please let me know!
Comments
The stdout pipe trick is great! Although I prefer not to put anything in the ENTRYPOINT or CMD directive of the Dockerfile. That way, you don't have half of the command in the Dockerfile and half in a script or some documentation:
docker run --rm -v `pwd`:/project --name build build tar --create /target | tar --extract --verbose
I think something like this will be more efficient:
FROM ubuntu
RUN mkdir /target
RUN echo "I am an executable" > /target/executable
RUN echo "I am a supporting file" > /target/supporting-file
RUN tar --create /target > /target.tar
ENTRYPOINT ["/bin/cat", "/target.tar"]
What do you think? Because this way you don't need to run tar every time you invoke docker run.
Hmm, but tar without compression is basically the same as cat, only with the capability of multiplexing many files. So this method may be useful if you build on a remote machine and just want the output.