How to Accelerate Docker Builds and Optimize Caching With "COPY --link"
COPY --link
is a new BuildKit feature which could substantially accelerate your Docker image builds. It works by copying files into independent image layers that don't rely on the presence of their predecessors. You can add new content to images without the base image even existing on your system.
This capability was added as part of Buildx v0.8 in March 2022. It's included in version 20.10.14 of the Docker CLI so you should already have access if you're running the latest release.
In this article, we'll show what --link
does and explain how it works. We'll also look at some of the situations in which it shouldn't be used.
What Is "--link"?
--link
is a new optional argument for the existing Dockerfile COPY
instruction. It changes the way copies work by creating a new snapshot layer each time you use it.
Regular COPY
statements add files to the layer that precedes them in the Dockerfile. The contents of that layer need to exist on your disk so the new content can be merged in:
FROM alpine
COPY my-file /my-file
COPY another-file /another-file
The Dockerfile above copies my-file
into the layer produced by the previous command. After the FROM
instruction, the image consists of Alpine's content:
bin/
dev/
etc/
...
The first COPY
instruction produces an image that includes everything from Alpine, as well as the my-file
file:
my-file
bin/
dev/
etc/
...
And the second COPY
instruction adds another-file
on top of this image:
another-file
my-file
bin/
dev/
etc/
...
The layer produced by each instruction includes everything that came before it, as well as anything newly added. At the end of the build, Docker uses a diffing process to work out the changes within each layer. The final image's layer blobs contain just the files that were added at each snapshot stage, but the build itself still has to assemble every full snapshot along the way.
Introducing "--link"
"--link" modifies COPY
to create a new standalone filesystem each time it's used. Instead of copying the new files on top of the previous layer, they're sent to a completely different location to become an independent layer. Layers are subsequently linked together to produce the final image.
Let's change the example Dockerfile to use --link
:
FROM alpine
COPY --link my-file /my-file
COPY --link another-file /another-file
The result of the FROM
instruction is unchanged - it yields the Alpine layer, with all that image's content:
bin/
dev/
etc/
...
The first COPY
instruction has a noticeably different effect. This time another independent layer is created. It's a new filesystem containing only my-file
:
my-file
Then the second COPY
instruction creates another new snapshot with only another-file
:
another-file
When the build completes, Docker stores these independent snapshots as new layer archives (tarballs). The tarballs are linked back into the chain of preceding layers, building up the final image. This consists of all three snapshots merged together, resulting in a filesystem that matches the original one when containers are created:
my-file
another-file
bin/
dev/
etc/
...
This image from the BuildKit project illustrates the differences between the two approaches.
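The difference between the two approaches can also be sketched as a toy model in Python. This is only an illustration, not how BuildKit is implemented: a filesystem is assumed to be a plain dictionary of paths, and the names ALPINE, regular_copy, linked_copy and merge are hypothetical helpers made up for this sketch.

```python
# Toy model (an assumption for illustration): a filesystem is a dict of
# path -> content. This is not BuildKit's real data structure.
ALPINE = {"bin/": "", "dev/": "", "etc/": ""}  # stand-in for the base image


def regular_copy(parent_snapshot, path, content):
    """Regular COPY: the new snapshot is the parent plus the new file,
    so the parent's content has to be available to build it."""
    snapshot = dict(parent_snapshot)
    snapshot[path] = content
    return snapshot


def linked_copy(path, content):
    """COPY --link: the new layer holds only the copied file and never
    reads the parent."""
    return {path: content}


def merge(layers):
    """Stack layers in order; later layers win on conflicting paths."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Regular COPY: each step builds on the full previous snapshot.
step1 = regular_copy(ALPINE, "my-file", "a")
step2 = regular_copy(step1, "another-file", "b")

# COPY --link: each step yields a standalone layer, linked together at the end.
link1 = linked_copy("my-file", "a")
link2 = linked_copy("another-file", "b")
final = merge([ALPINE, link1, link2])

print(sorted(link1))   # ['my-file']
print(final == step2)  # True
```

In this model the linked layers never touch ALPINE until the final merge, yet the merged result is identical to the one produced by the regular copies.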
Adding "COPY --link" to Your Builds
COPY --link
is only available when you're using BuildKit to build your images. Either run your build with docker buildx build
or use docker build
with the DOCKER_BUILDKIT=1
environment variable set.
You must also opt in to the Dockerfile v1.4 syntax using a comment at the top of your file:
# syntax=docker/dockerfile:1.4
FROM alpine:latest
COPY --link my-file /my-file
COPY --link another-file /another-file
Now you can build your image with support for linked copies:
DOCKER_BUILDKIT=1 docker build -t my-image:latest .
Images built from Dockerfiles using COPY --link
can be used like any other. You can start containers from them with docker run
and push them straight to registries. The --link
flag only affects how content is added to the image layers during the build.
Why Linked Copies Matter
Using the --link
flag allows build caches to be reused even when the content you COPY
in changes. In addition, builds may be able to complete without their base image even existing on your machine.
Returning to the example from above, standard COPY
behavior requires the alpine
image to exist on your Docker host before the new content can be added. The image will be downloaded automatically during the build if you've not previously pulled it.
With linked copies, Docker doesn't need the alpine
image's content. It pulls the alpine
manifest, creates new independent layers for the copied files, then creates a revised manifest that links the layers into those provided by alpine
. The content of the alpine
image - its layer blobs - will only be downloaded if you start a container from your new image or export it to a tar archive. When you push the image to a registry, that registry will store its new layers and remotely acquire the alpine
ones.
This functionality facilitates efficient image rebases too. Perhaps you're currently distributing a Docker image using the latest Ubuntu 20.04 LTS release:
FROM golang AS build
...
RUN go build -o /app .
FROM ubuntu:20.04
COPY --link --from=build /app /bin/app
ENTRYPOINT ["/bin/app"]
You can build the image with caching enabled using BuildKit's --cache-to
flag. The inline
cache stores build cache data inside the output image, where it can be reused in subsequent builds:
docker buildx build --cache-to type=inline -t example-image:20.04 .
Now let's say you'd like to provide an image that's based on the next LTS after its release, Ubuntu 22.04:
FROM golang AS build
...
RUN go build -o /app .
FROM ubuntu:22.04
COPY --link --from=build /app /bin/app
ENTRYPOINT ["/bin/app"]
Rebuild the image using the cache data embedded in the original version:
docker buildx build --cache-from example-image:20.04 -t example-image:22.04 .
The build will complete almost instantly. Using the cached data from the existing image, Docker can verify the files needed to build /app
haven't changed. This means the cache for the independent layer created by the COPY
instruction remains valid. As this layer doesn't depend on any other, the ubuntu:22.04
image won't be pulled either. Docker merely links the snapshot layer containing /bin/app
into a new manifest within the ubuntu:22.04
layer chain. The snapshot layer is effectively "rebased" onto a new parent image, without any filesystem operations occurring.
The model also optimizes multi-stage builds where changes can occur between any of the stages:
FROM golang AS build
RUN go build -o /app .
FROM config-builder AS config
RUN generate-config --out /config.yaml
FROM ubuntu:latest
COPY --link --from=config /config.yaml build.conf
COPY --link --from=build /app /bin/app
Without --link
, any change to the generated config.yaml
causes ubuntu:latest
to be pulled and the file to be copied in. The binary then has to be copied in again, as the cache for its COPY layer is invalidated by the change to the layer beneath it. With linked copies, a change to config.yaml
allows the build to continue without pulling ubuntu:latest
or copying the binary again. The snapshot layer with build.conf
inside is simply replaced by a new version that's independent of all the other layers.
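Both the rebase and the multi-stage case come down to the same idea: a linked layer's identity depends only on its own content, not on whatever sits beneath it. The sketch below models that in Python under loud assumptions: digest, regular_copy_key and linked_copy_key are hypothetical helpers, and BuildKit's real cache keys are computed differently.

```python
import hashlib


def digest(files):
    """Toy content digest: a hash over a layer's sorted (path, content) pairs."""
    h = hashlib.sha256()
    for path, content in sorted(files.items()):
        h.update(path.encode())
        h.update(b"\0")
        h.update(content.encode())
        h.update(b"\0")
    return h.hexdigest()


def regular_copy_key(parent_digest, files):
    """Model of a regular COPY: the result depends on the parent it lands on."""
    return hashlib.sha256((parent_digest + digest(files)).encode()).hexdigest()


def linked_copy_key(parent_digest, files):
    """Model of COPY --link: the parent is ignored entirely."""
    return digest(files)


# Stand-ins for the two base images; the contents are made up.
ubuntu_2004 = digest({"etc/os-release": "20.04"})
ubuntu_2204 = digest({"etc/os-release": "22.04"})
app_layer = {"bin/app": "compiled binary"}

# Swapping the base changes the regular key, so its cache entry is lost.
regular_changed = (
    regular_copy_key(ubuntu_2004, app_layer)
    != regular_copy_key(ubuntu_2204, app_layer)
)

# The linked key is identical, so the same layer can be reused as-is.
image_2004 = [ubuntu_2004, linked_copy_key(ubuntu_2004, app_layer)]
image_2204 = [ubuntu_2204, linked_copy_key(ubuntu_2204, app_layer)]

print(regular_changed)                   # True
print(image_2004[-1] == image_2204[-1])  # True
```

The two image lists play the role of manifests here: only the base entry differs, while the entry for the copied binary is shared between them.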
When Not To Use It
There are some situations where the --link
flag won't work correctly. Because it copies files into a new layer, instead of adding them on top of the previous one, you can't use ambiguous references as your destination path:
COPY --link my-file /data
With a regular COPY
instruction, my-file
will be copied to /data/my-file
if /data
already exists as a directory in the image. With --link
, the target layer's filesystem will always be empty, so my-file
gets written to /data
.
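The ambiguity can be sketched as a small Python model of destination resolution for a single copied file. It is a simplification, not Docker's actual logic: resolve_dest is a hypothetical helper, and the trailing-slash case assumes the documented Dockerfile rule that a destination ending in a slash is treated as a directory.

```python
import posixpath


def resolve_dest(src, dest, existing_dirs):
    """Return the path a single copied file ends up at.

    existing_dirs is the set of directories visible to the COPY step.
    With --link that set is modeled as empty, because the copy never
    sees the previous layers.
    """
    if dest.endswith("/") or dest in existing_dirs:
        return posixpath.join(dest, posixpath.basename(src))
    return dest


image_dirs = {"/data"}  # /data already exists as a directory in the image

regular = resolve_dest("my-file", "/data", image_dirs)  # regular COPY
linked = resolve_dest("my-file", "/data", set())        # COPY --link
explicit = resolve_dest("my-file", "/data/", set())     # trailing slash

print(regular)   # /data/my-file
print(linked)    # /data
print(explicit)  # /data/my-file
```

Under these assumptions, writing the destination with a trailing slash removes the ambiguity, since the result no longer depends on what already exists in the image.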
The same consideration applies to symlink resolution. Standard COPY
automatically resolves destination paths that are symlinks in the image. When you're using --link
, this behavior isn't supported as the symlink won't exist in the copy's independent layer.
It's recommended you start using --link
wherever these limitations don't apply. Adopting this feature will speed up your builds and make caching more powerful. If you can't immediately remove ambiguous or symlinked destination paths, you can keep using the existing COPY
instruction. These backwards-incompatible changes are why --link
is an optional flag, instead of the new default.
Summary
BuildKit's COPY --link
is a new Dockerfile feature which can make builds quicker and more efficient. Images using linked copies don't need to pull previous layers just so files can be copied into them. Docker creates a new independent layer for each COPY
instead, then links those layers back into the chain.
You can start using linked copies now if you're building images with BuildKit and the latest version of the Buildx or Docker CLI. Adopting --link is a new best practice for Docker builds, provided you're not affected by the changes to destination path resolution that it necessitates.