TL;DR:
- Use one RUN to prepare, configure, make, install and clean up.
- Clean up with:
  apt-get remove --purge -y $BUILD_PACKAGES $(apt-mark showauto) && rm -rf /var/lib/apt/lists/*
I've been packaging the nghttp2 HTTP/2 proxy and client by Tatsuhiro Tsujikawa both for Debian and with Docker, and noticed that it takes some time to get the build dependencies (C++, cough) as well as to do the build.
In the Debian packaging case it's easy to keep the build dependencies minimal, thanks to pbuilder building in a clean chroot, and to ensure the binary package contains only the right files. See the debian nghttp2 package.
For Docker, since you work with containers it's harder to see what changed, but you still really want the images as small as possible, since you have to download them to run the app, and they take up disk space. While doing this I kept seeing huge images (480 MB), way larger than the base image I was using (123 MB), and it didn't make sense since I was just packaging a few binaries with some small files, plus their dependencies. My estimate was that the delta should be well under 100 MB.
I pored over multiple blog posts about Docker images and how to make them small. I even looked at some of the squashing commands like docker-squash that involved import and export, but those seemed not quite the right approach.
It took me a while to really understand that each Dockerfile command creates a new layer (committed from an intermediate container) holding just the changes it made. So when you see all those layers downloaded in a docker pull of an image, it is sometimes a lot of data which is mostly unused.
So if you want to make it small, you need to make each Dockerfile command touch the smallest number of files, and use a standard base image so most people do not have to download your custom l33t base. It doesn't matter if you rm -rf the files in a later command; they continue to exist in an earlier layer.
So: prepare, configure, build, make install and clean up in one RUN command if you can. If the line gets too long, put the steps in separate scripts and call them.
Lots of Docker images are based on Debian images because they are a small and practical base. The debian:jessie image is smaller than the Ubuntu (and CentOS) images. I haven't checked out the fancy 'cloud' images much: Ubuntu Cloud Images, Snappy Ubuntu Core, Project Atomic, ...
In a Dockerfile building from some downloaded package, you generally need wget or curl, and maybe git. When you install, for example, curl and ca-certificates (to get TLS/SSL certificates), it pulls in a lot of extra packages, such as openssl in the standard Debian curl build.
You are pretty unlikely to need curl or git after the build stage of your package, so if you don't need them you could, and should, remove them. That's one of the tricky parts.
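One way to see how much an install drags in before committing to it is apt-get's simulate mode (-s), which prints what it would do without installing anything; each package it would unpack shows up on a line starting with "Inst". A minimal sketch of a filter for that output (the function name would_install is made up for illustration):

```shell
# Print the name of each package that "apt-get install -s ..." would unpack.
# Reads the simulate output on stdin; only "Inst" lines are kept, and any
# ":arch" suffix on the package name is dropped.
would_install() {
  awk '$1 == "Inst" { name = $2; sub(/:.*/, "", name); print name }'
}
```

For example:
$ apt-get install -s curl ca-certificates | would_install
lists everything the curl example would pull in, and piping that into wc -l gives the count. The simulation changes nothing, but it still needs package lists from apt-get update.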
If $BUILD_PACKAGES contains the list of build dependency packages, such as libxml2-dev and so on, you would think that this would get you back to the start state:
$ apt-get install -y $BUILD_PACKAGES
$ apt-get remove -y $BUILD_PACKAGES
However, this isn't enough: it leaves behind the dependencies that were installed automatically, and their dependencies in turn.
You could try
but that also doesn't catch them all, and it's not clear to me why. What you actually need is to remove all the automatically added packages, which you can list with apt-mark showauto.
So what you really need is:
$ AUTO_ADDED_PACKAGES=$(apt-mark showauto)
$ apt-get remove --purge -y $BUILD_PACKAGES $AUTO_ADDED_PACKAGES
I added --purge too, since we don't need any config files in /etc for build packages we aren't using.
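To check whether a cleanup really gets you back to the start state, one option is to snapshot the installed package names before installing the build dependencies and again after the purge, then compare the two lists. This is a diagnostic sketch, not part of the build itself; it assumes dpkg-query and comm (both in a stock Debian image), and the function names are made up for illustration:

```shell
# Keep only fully installed packages from dpkg-query output on stdin.
# Packages removed without --purge stay in the "deinstall ok config-files"
# state, so only lines whose third word is "installed" count.
parse_installed() {
  awk '$3 == "installed" { print $4 }' | sort -u
}

# Snapshot: installed package names, one per line, sorted.
installed_packages() {
  dpkg-query -W -f='${Status} ${Package}\n' | parse_installed
}

# Compare two snapshot files written by installed_packages:
#   +name  only in the second snapshot (left behind by the cleanup)
#   -name  only in the first snapshot (removed, e.g. from the base image)
compare_snapshots() {
  comm -13 "$1" "$2" | sed 's/^/+/'
  comm -23 "$1" "$2" | sed 's/^/-/'
}
```

Used around a build, roughly:
$ installed_packages > /tmp/before
$ apt-get install -y $BUILD_PACKAGES
$ apt-get remove --purge -y $BUILD_PACKAGES $(apt-mark showauto)
$ installed_packages > /tmp/after
$ compare_snapshots /tmp/before /tmp/after
Any "+" lines are leftovers; any "-" lines mean the purge took something the base image had.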
Having done that, you might have removed some runtime package dependencies of something you built. Those are harder to find automatically, so you'll have to list and install them by hand:
$ RUNTIME_PACKAGES="...."
$ apt-get install -y $RUNTIME_PACKAGES
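One way to get a starting point for that list, before the purge runs, is to ask ldd which shared libraries your installed binaries load, then ask dpkg -S which packages own those files. A minimal sketch; the function names are hypothetical, and libraries that your own make install put under /usr/local are not owned by any package, so they simply drop out:

```shell
# Print the resolved shared-library paths from ldd output on stdin.
# Lines look like "libfoo.so.1 => /usr/lib/x86_64-linux-gnu/libfoo.so.1 (0x...)";
# entries without "=> /path" (linux-vdso, the loader, "not found") are skipped.
library_paths() {
  awk '$2 == "=>" && $3 ~ /^\// { print $3 }' | sort -u
}

# Map library paths (stdin) to the Debian packages that own them.
# dpkg -S prints "package[:arch]: /path"; paths no package owns only
# produce an error on stderr, which is discarded here.
owning_packages() {
  xargs -r dpkg -S 2>/dev/null | grep -v '^diversion ' | cut -d: -f1 | sort -u
}
```

For example, assuming the default /usr/local prefix:
$ ldd /usr/local/bin/nghttpx | library_paths | owning_packages
The result still needs a human check, but it is a reasonable first draft of $RUNTIME_PACKAGES.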
Finally, you need to clean up after apt with rm -rf /var/lib/apt/lists/*, which removes all the index files that apt-get update downloaded. This step appears in many best-practice documents and example Dockerfiles.
You could add apt-get clean, which removes any cached downloaded packages, but that's not needed in the official Docker debian and ubuntu images, since their cached package archive feature is disabled.
Last, don't forget to delete your build tree, and do it in the same RUN as the compile, so the tree never gets committed to a layer. This might not make sense for some languages where you work from inside the extracted tree, but why not delete the src dirs? Definitely delete the tarball!
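Putting those steps together, here is a minimal sketch of a build script meant to be called from a single RUN line, assuming an autotools-style project built with ./configure && make. BUILD_PACKAGES, RUNTIME_PACKAGES, SRC_URL and SRC_DIR are hypothetical placeholders (not the lists used for nghttp2), as are the run argument and the DRY_RUN switch:

```shell
#!/bin/sh
# build-all.sh: hypothetical "everything in one RUN" build script.
# Placeholder values below; override them from the environment.
set -eu

BUILD_PACKAGES="${BUILD_PACKAGES:-build-essential curl ca-certificates}"
RUNTIME_PACKAGES="${RUNTIME_PACKAGES:-}"
SRC_URL="${SRC_URL:-https://example.org/project-1.0.tar.gz}"
SRC_DIR="${SRC_DIR:-/build/src}"

# Execute one step in this shell, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$1"
  else
    eval "$1" || { printf 'step failed: %s\n' "$1" >&2; exit 1; }
  fi
}

prepare() {
  run 'apt-get update'
  run 'apt-get install -y $BUILD_PACKAGES'
  # Stream the tarball straight into tar, so there is no tarball to delete.
  run 'mkdir -p "$SRC_DIR" && curl -fsSL "$SRC_URL" | tar -xz -C "$SRC_DIR" --strip-components=1'
}

build() {
  run 'cd "$SRC_DIR" && ./configure && make && make install'
}

cleanup() {
  # Purge the build deps plus everything apt marked as auto-installed.
  run 'apt-get remove --purge -y $BUILD_PACKAGES $(apt-mark showauto)'
  # Put back runtime dependencies the purge took with it (listed by hand).
  run 'if [ -n "$RUNTIME_PACKAGES" ]; then apt-get install -y $RUNTIME_PACKAGES; fi'
  # Drop the package index files fetched by apt-get update.
  run 'rm -rf /var/lib/apt/lists/*'
  # Delete the source tree in the same RUN so it never reaches a layer.
  run 'cd / && rm -rf "$SRC_DIR"'
}

build_all() {
  prepare
  build
  cleanup
}

# Do nothing unless asked, e.g. from a Dockerfile: RUN /build/build-all.sh run
if [ "${1:-}" = run ]; then
  build_all
fi
```

Running DRY_RUN=1 sh build-all.sh run prints the eight steps instead of executing them, which is a cheap way to review the order: the apt-mark showauto purge has to come after make install, and the package lists have to survive until the runtime packages are reinstalled.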
These are the image sizes for what I was working on with dajobe/nghttpx:
- 479.7 MB: separate prepare, build and cleanup (3 RUNs)
- 186.8 MB: prepare, build and cleanup in one RUN
- 149.8 MB: the same, after also using apt-mark showauto in cleanup
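For a sense of scale, the reductions in that list can be worked out with a one-line awk helper (percent_saved is a made-up name; both sizes must be in the same unit):

```shell
# Percentage saved going from size $1 down to size $2, to one decimal place.
percent_saved() {
  awk -v from="$1" -v to="$2" 'BEGIN { printf "%.1f\n", (from - to) / from * 100 }'
}
```

percent_saved 479.7 186.8 gives 61.1 (one RUN), percent_saved 186.8 149.8 gives 19.8 (apt-mark showauto on top of that), and percent_saved 479.7 149.8 gives 68.8 overall.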
You can use docker history IMAGE to see the detailed horror (edited for width):
... /bin/sh -c /build/cleanup-nghttp2.sh && rm -r 7.595 MB
... /bin/sh -c cd /build/nghttp2 && make install 76.92 MB
... /bin/sh -c /build/prepare-nghttp2.sh 272.4 MB
and the smallest version:
... /bin/sh -c /build/prepare-nghttp2.sh && 27.05 MB
The massive difference is the source tree and the 232 MB of build dependencies that apt-get pulls in. If you don't clean all that up before the end of the RUN, you end up with a huge intermediate layer that stays in the image.
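When docker history output gets long, sorting the layers by size points straight at the offenders. A small sketch, assuming a Docker version whose docker history supports --format (with the .Size and .CreatedBy placeholders) and --human=false for raw byte sizes; the helper name largest_layers is made up:

```shell
# Sort layers largest-first from lines of "<bytes><TAB><command>" and print
# sizes in MB (decimal, 1 MB = 1000000 bytes, matching Docker's own output).
largest_layers() {
  sort -t "$(printf '\t')" -k1,1nr |
    awk -F '\t' '{ printf "%9.2f MB  %s\n", $1 / 1000000, $2 }'
}
```

For example:
$ docker history --human=false --no-trunc --format '{{.Size}}\t{{.CreatedBy}}' IMAGE | largest_layers
--no-trunc keeps the full command that created each layer, so there is no need to edit for width by hand.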
The final size of 149.8 MB, compared to the 122.8 MB debian:jessie base image, is a delta of 27 MB, which for a few servers, a client and their libraries sounds great! I could probably get it down a little more if I installed only the binaries. The runtime libraries I use are 5.9 MB.
You can see my work on GitHub and in the Docker Hub ... and of course this HTTP/2 setup is used on this blog!
References