5x Faster Rust Docker Builds with cargo-chef
cargo-chef is a new
cargo sub-command to build just the dependencies of your Rust project based on a JSON description file, a recipe.
cargo-chef can be used to fully leverage Docker layer caching, therefore massively speeding up Docker builds for Rust projects.
On our commercial codebase (~14k lines of code, ~500 dependencies) we measured a 5x speed-up: we cut Docker build times from ~10 minutes to ~2 minutes [1].
Subscribe to the newsletter to be notified when a new article is published on this blog.
Rust shines at runtime, consistently delivering great performance, but it comes at a cost: compilation times.
They have been consistently among the top answers in the Rust annual survey when it comes to the biggest challenges or problems for the Rust project.
Optimised builds (--release), in particular, can be gruesome: up to 15-20 minutes on medium projects with several dependencies. This is quite common in web development projects pulling in many foundational crates from the async ecosystem.
How does this impact your day-to-day if you are working on a Rust project?
Docker containers are a mainstream technology to deploy software in production environments - most organisations have a Continuous Integration/Continuous Deployment pipeline that clones a project from a version control system (e.g. GitHub) and builds a Docker image to be deployed on top of a container orchestrator (e.g. Kubernetes, Nomad or other commercial solutions).
In a Docker build you will always be using the optimised build profile (
--release) to get top performance in your production environment. More often than not, CI/CD pipelines do not run on very beefy machines: a couple of cores, maybe, not much more than that.
This combination is deadly.
It can take more than 30 minutes to go from a merged PR to rolling out the new version to your end-users.
Such a long delay can have nefarious knock-on effects.
Slow Docker builds will kill you
Slow Docker builds will, over time, reduce your deployment frequency.
Accelerate has taught you to optimise for small changes deployed multiple times a day. But if each build takes ages it is very unlikely that your engineers will be willing to go through the ordeal often.
They will start batching up changes, making it much more likely that one of those deployments will result in an outage.
Slow Docker builds will bite you again during those outages: the speed of your CI pipeline puts a hard limit on how fast you can roll out an emergency patch to mitigate an incident ("fix forward").
If it takes 20 minutes to build a Docker container then the incident will be at least 20 minutes long (assuming you find the right fix as soon as the incident happens - unlikely).
In other words - do not neglect your CI/CD pipeline. It impacts your bottom line.
Ok, ok, you convinced me! I want fast Docker builds! How?
Docker layer caching
Let's have a look at your typical Rust Dockerfile:
    FROM rust as builder
    WORKDIR app
    COPY . .
    # This works with the dummy project generated by `cargo new app --bin`
    RUN cargo build --release --bin app

    FROM rust as runtime
    WORKDIR app
    COPY --from=builder /app/target/release/app /usr/local/bin
    ENTRYPOINT ["./usr/local/bin/app"]
It is a multi-stage build: we create an intermediate Docker image (
builder) to compile our binary and then we copy that binary over to the final Docker image (
runtime) where we actually run it.
The builder stage often requires more dependencies (e.g. OS packages) than the runtime stage, which can be kept fairly slim, leading to smaller images with a minimal attack surface.
If you run
docker build -t dummy-image . you will see something like this in your terminal:
    Sending build context to Docker daemon 41.47kB
    Step 1/8 : FROM rust as builder
     ---> f5fde092a1cd
    Step 2/8 : WORKDIR app
     ---> Using cache
     ---> 53c89dd8e048
    Step 3/8 : COPY . .
     ---> 39bc69ee400b
    Step 4/8 : RUN cargo build --release --bin app
     ---> Running in 9dc66ef72185
       Compiling app v0.1.0 (/app)
        Finished release [optimized] target(s) in 0.73s
    Removing intermediate container 9dc66ef72185
     ---> 13b22cf28e60
    Step 5/8 : FROM rust as runtime
     ---> f5fde092a1cd
    Step 6/8 : WORKDIR app
     ---> Using cache
     ---> 53c89dd8e048
    Step 7/8 : COPY --from=builder /app/target/release/app /usr/local/bin
     ---> f1e7055edd75
    Step 8/8 : ENTRYPOINT ["./usr/local/bin/app"]
     ---> Running in dc4a9dcc7cd5
    Removing intermediate container dc4a9dcc7cd5
     ---> e127c4129b2f
    Successfully built e127c4129b2f
    Successfully tagged dummy-image:latest
Each RUN, COPY and ADD instruction creates a layer: a diff between the previous state (the layer above) and the current state after having executed the specified command.
Layers are cached.
If the starting point of an operation has not changed (e.g. the base image) and the command itself has not changed (e.g. the checksum of the files copied by
COPY) Docker does not perform any computation and directly retrieves a copy of the result from the local cache.
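The caching rule can be sketched with a toy model. This is not Docker's actual algorithm: the Step type, the layer_key function and the use of DefaultHasher are assumptions made for illustration. The only point is that each layer's key depends on its parent, on the instruction text and on the content being copied in.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// One Dockerfile step in this toy model (hypothetical type, not a Docker API).
struct Step<'a> {
    instruction: &'a str,
    /// Contents of the files the step copies in; empty for steps such as RUN.
    copied_files: &'a str,
}

/// Derive a cache key from the parent layer's key, the instruction text and
/// the copied file contents, so that any change propagates to later layers.
fn layer_key(parent: u64, step: &Step) -> u64 {
    let mut hasher = DefaultHasher::new();
    parent.hash(&mut hasher);
    step.instruction.hash(&mut hasher);
    step.copied_files.hash(&mut hasher);
    hasher.finish()
}

/// Run a "build": for each step, report whether it was served from the cache.
fn build(steps: &[Step], cache: &mut HashSet<u64>) -> Vec<bool> {
    let mut parent = 0;
    let mut hits = Vec::new();
    for step in steps {
        let key = layer_key(parent, step);
        // `insert` returns false when the key was already present: a cache hit.
        hits.push(!cache.insert(key));
        parent = key;
    }
    hits
}

fn main() {
    let mut cache = HashSet::new();
    let steps = [
        Step { instruction: "FROM rust as builder", copied_files: "" },
        Step { instruction: "COPY . .", copied_files: "fn main() {}" },
        Step { instruction: "RUN cargo build --release --bin app", copied_files: "" },
    ];
    println!("first build:  {:?}", build(&steps, &mut cache));
    println!("second build: {:?}", build(&steps, &mut cache));
}
```

In this model the first build misses on every step and the second build hits on every step. Editing the copied file contents changes the key of the COPY step and, through the parent key, of every step after it.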
We can see it in action by running again
docker build -t dummy-image .:
    Sending build context to Docker daemon 41.47kB
    Step 1/8 : FROM rust as builder
     ---> f5fde092a1cd
    Step 2/8 : WORKDIR app
     ---> Using cache
     ---> 53c89dd8e048
    Step 3/8 : COPY . .
     ---> Using cache
     ---> 39bc69ee400b
    Step 4/8 : RUN cargo build --release --bin app
     ---> Using cache
     ---> 13b22cf28e60
    Step 5/8 : FROM rust as runtime
     ---> f5fde092a1cd
    Step 6/8 : WORKDIR app
     ---> Using cache
     ---> 53c89dd8e048
    Step 7/8 : COPY --from=builder /app/target/release/app /usr/local/bin
     ---> Using cache
     ---> f1e7055edd75
    Step 8/8 : ENTRYPOINT ["./usr/local/bin/app"]
     ---> Using cache
     ---> e127c4129b2f
    Successfully built e127c4129b2f
    Successfully tagged dummy-image:latest
Notice the Using cache log after every single step. No output at all from
cargo build - execution has been skipped entirely.
Docker layer caching is fast and can be leveraged to massively speed up Docker builds.
The trick is optimising the order of operations in your Dockerfile: anything that refers to files that change often (e.g. your source code) should appear as late as possible, therefore maximising the likelihood of the previous step being unchanged and allowing Docker to retrieve the result straight from the cache.
The expensive step is usually compilation.
Most programming languages follow the same playbook: you
COPY a lock-file of some kind first, build your dependencies,
COPY over the rest of your source code and then build your project.
This guarantees that most of the work is cached as long as your dependency tree does not change between one build and the next.
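To make the effect of ordering concrete, here is a small sketch that compares two layouts. The Step type, the input names ("manifests", "source") and the placeholder instructions in angle brackets are all hypothetical. The model assumes that a step is re-executed if it, or any step before it, reads an input that changed.

```rust
/// A Dockerfile step and the build inputs it reads (hypothetical toy model).
struct Step {
    instruction: &'static str,
    reads: &'static [&'static str],
}

/// Steps that must be re-executed when `changed` is modified: the first step
/// that reads it and every step after it, since each layer builds on the
/// previous one.
fn rerun(steps: &[Step], changed: &str) -> Vec<&'static str> {
    match steps.iter().position(|s| s.reads.iter().any(|r| *r == changed)) {
        Some(first) => steps[first..].iter().map(|s| s.instruction).collect(),
        None => Vec::new(),
    }
}

fn main() {
    // Everything is copied up front: any source change invalidates the build step.
    let naive = [
        Step { instruction: "COPY . .", reads: &["manifests", "source"] },
        Step { instruction: "RUN <build everything>", reads: &[] },
    ];
    // Lock-file first: the dependency build sits above the source copy.
    let deps_first = [
        Step { instruction: "COPY <lock-file>", reads: &["manifests"] },
        Step { instruction: "RUN <build dependencies>", reads: &[] },
        Step { instruction: "COPY . .", reads: &["manifests", "source"] },
        Step { instruction: "RUN <build project>", reads: &[] },
    ];
    println!("naive:      {:?}", rerun(&naive, "source"));
    println!("deps first: {:?}", rerun(&deps_first, "source"));
}
```

With the second layout a source-only change leaves the dependency build untouched, which is where most of the compilation time usually goes.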
In a Python project, for example, you might have something along these lines:
    FROM python:3
    COPY requirements.txt .
    RUN pip install -r requirements.txt
    COPY src/ /app
    WORKDIR /app
    CMD ["python", "app"]
What about Rust?
Caching Rust builds
cargo, as of today, does not provide a mechanism to build your project dependencies starting from its
Cargo.lock file (e.g.
cargo build --only-deps).
Therefore Rust projects have always struggled to leverage Docker layer caching properly.
If you search for "Rust Docker cache" on Google you will bump into many articles proposing a variety of workarounds.
The blessed answer on StackOverflow, like many other blog posts, suggests the following steps to unlock Docker layer caching for a simple project: copy the lock file, create a dummy
main.rs file, build the project, delete the dummy file, copy over your source code, build again.
    FROM rust
    WORKDIR /var/www/app
    COPY dummy.rs .
    COPY Cargo.toml .
    RUN sed -i 's#src/main.rs#dummy.rs#' Cargo.toml
    RUN cargo build --release
    RUN sed -i 's#dummy.rs#src/main.rs#' Cargo.toml
    COPY . .
    RUN cargo build --release
    CMD ["target/release/app"]
It is a bit cumbersome but you can live with it for a simple single-binary project. You just need to keep your Dockerfile up to date every time you restructure your project layout (and book some time to explain to your colleagues what the hell you are doing).
As soon as your project grows in complexity (e.g. a workspace with a few crates) this "workaround" leads to an entangled mess that is painful to watch (and maintain).
I am currently finalising the chapter of Zero To Production In Rust on deployment best practices for Rust projects - I have no intention of teaching this workaround as the best way to get fast Docker builds with Rust.
I set out to build something nicer and less error-prone.
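cargo-chef is a new cargo sub-command to build just the dependencies of your Rust project based on a JSON description file, a recipe.
You can install it from crates.io with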
cargo install cargo-chef
cargo-chef exposes two commands:
cargo chef --help
    cargo-chef

    USAGE:
        cargo chef <SUBCOMMAND>

    SUBCOMMANDS:
        cook       Re-hydrate the minimum project skeleton identified by `cargo chef prepare` and build it to cache dependencies
        prepare    Analyze the current project to determine the minimum subset of files (Cargo.lock and Cargo.toml manifests) required to build it and cache dependencies
prepare examines your project and builds a recipe that captures the set of information required to build your dependencies.
cargo chef prepare --recipe-path recipe.json
Nothing too mysterious going on here, you can examine the
recipe.json file: it contains the skeleton of your project (e.g. all the
Cargo.toml files with their relative path, the
Cargo.lock file if available) plus a few additional pieces of information.
In particular it makes sure that all libraries and binaries are explicitly declared in their respective
Cargo.toml files even if they can be found at the canonical default location (
src/main.rs for a binary,
src/lib.rs for a library).
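The idea of a project skeleton can be sketched as follows. This is not cargo-chef's implementation: the Target enum, the skeleton function, the one-target-per-crate simplification and the empty placeholder sources are all assumptions made for illustration. It only shows that manifests plus placeholder targets are enough to describe a project's shape without any of its real source code.

```rust
use std::collections::BTreeMap;

/// The kind of target a crate exposes (simplified: one target per crate).
enum Target {
    Bin,
    Lib,
}

/// Build an in-memory skeleton: every manifest is kept verbatim and every
/// target gets a placeholder at its canonical default location.
fn skeleton(crates: &[(&str, &str, Target)]) -> BTreeMap<String, String> {
    let mut files = BTreeMap::new();
    for (dir, manifest, target) in crates {
        files.insert(format!("{}/Cargo.toml", dir), manifest.to_string());
        let (path, body) = match target {
            Target::Bin => ("src/main.rs", "fn main() {}\n"),
            Target::Lib => ("src/lib.rs", ""),
        };
        files.insert(format!("{}/{}", dir, path), body.to_string());
    }
    files
}

fn main() {
    // A hypothetical two-crate workspace; the manifests are shortened.
    let crates = [
        ("api", "[package]\nname = \"api\"\n", Target::Bin),
        ("core", "[package]\nname = \"core\"\n", Target::Lib),
    ];
    for (path, contents) in skeleton(&crates) {
        println!("{} ({} bytes)", path, contents.len());
    }
}
```

Nothing in the skeleton depends on the real source files, so it only changes when a manifest changes.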
recipe.json is the equivalent of the Python
requirements.txt file - it is the only input required for
cargo chef cook, the command that will build out our dependencies:
cargo chef cook --recipe-path recipe.json
If you want to build in --release mode:
cargo chef cook --release --recipe-path recipe.json
Let's see how that can be leveraged in a Dockerfile:
    FROM rust as planner
    WORKDIR app
    # We only pay the installation cost once,
    # it will be cached from the second build onwards
    # To ensure a reproducible build consider pinning
    # the cargo-chef version with `--version X.X.X`
    RUN cargo install cargo-chef
    COPY . .
    RUN cargo chef prepare --recipe-path recipe.json

    FROM rust as cacher
    WORKDIR app
    RUN cargo install cargo-chef
    COPY --from=planner /app/recipe.json recipe.json
    RUN cargo chef cook --release --recipe-path recipe.json

    FROM rust as builder
    WORKDIR app
    COPY . .
    # Copy over the cached dependencies
    COPY --from=cacher /app/target target
    COPY --from=cacher /usr/local/cargo /usr/local/cargo
    RUN cargo build --release --bin app

    FROM rust as runtime
    WORKDIR app
    COPY --from=builder /app/target/release/app /usr/local/bin
    ENTRYPOINT ["./usr/local/bin/app"]
We are using four stages: the first computes the recipe file, the second caches our dependencies, the third builds the binary and the fourth is our runtime environment.
As long as your dependencies do not change the
recipe.json file will stay the same, therefore the outcome of
cargo chef cook --release --recipe-path recipe.json will be cached, massively speeding up your builds (up to 5x measured on some commercial projects).
We are taking advantage of how Docker layer caching interacts with multi-stage builds: the
COPY . . statement in the
planner stage will invalidate the cache for the
planner container, but it will not invalidate the cache for the
cacher container, as long as the checksum of the
recipe.json returned by
cargo chef prepare does not change.
You can think of each stage as its own Docker image with its own caching - they only interact with each other when using the
COPY --from statement.
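The reason the cacher stage survives a source change can be illustrated with a stand-in for the recipe. The recipe_digest function below is hypothetical and is not how cargo chef prepare works internally. It simply digests the manifests and the lock file and ignores everything else, which is the property the multi-stage setup relies on.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the recipe: digest only the files that describe
/// the dependency tree, ignoring everything else in the project.
fn recipe_digest(project: &[(&str, &str)]) -> u64 {
    let mut relevant: Vec<&(&str, &str)> = project
        .iter()
        .filter(|(path, _)| path.ends_with("Cargo.toml") || path.ends_with("Cargo.lock"))
        .collect();
    // Sort so that the digest does not depend on the order files are listed in.
    relevant.sort();
    let mut hasher = DefaultHasher::new();
    relevant.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let before = [
        ("Cargo.toml", "[package]\nname = \"app\"\n"),
        ("Cargo.lock", "# lock v1"),
        ("src/main.rs", "fn main() {}"),
    ];
    // Same manifests and lock file, different source code.
    let after = [
        ("Cargo.toml", "[package]\nname = \"app\"\n"),
        ("Cargo.lock", "# lock v1"),
        ("src/main.rs", "fn main() { println!(\"hello\"); }"),
    ];
    println!("recipe unchanged: {}", recipe_digest(&before) == recipe_digest(&after));
}
```

When the digest is unchanged, the file copied into the cacher stage is byte-for-byte identical, so Docker can reuse the cached layer produced by the cook step.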
There is no rocket science at play here - you might argue that it is just an elaborate way to perform the dirty workaround we talked about before. You would be right, to an extent.
But ergonomics matters and
cargo-chef offers a much more streamlined experience if you are a newcomer looking for a quick and clean recipe to optimise your Docker build.
Caveats and limitations
I have tested cargo-chef on a few OpenSource projects and some of our commercial projects at TrueLayer. My testing has definitely not exhausted the range of possibilities when it comes to
cargo build customisations and I am sure that there are a few rough edges that will have to be smoothed out - please file issues on GitHub.
So far I have found the following limitations and caveats:
- cargo chef cook and cargo build must be executed from the same working directory. If you examine the *.d files under target/debug/deps for one of your projects using cat you will notice that they contain absolute paths referring to the project target directory. If moved around, cargo will not leverage them as cached dependencies;
- cargo build will build local dependencies (outside of the current project) from scratch, even if they are unchanged, due to the reliance of its fingerprinting logic on timestamps (see this long issue on cargo's repository);
- cargo-chef has not yet been tested extensively with projects leveraging build files.
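The first caveat can be checked with a small sketch. It assumes the dependency files use a Makefile-style "target: dependencies" layout and that paths contain no spaces. The sample contents and the targets_outside helper are made up for illustration and are not part of cargo or cargo-chef.

```rust
use std::path::Path;

/// Return the targets (left-hand side of `target: deps` lines) in a
/// Makefile-style dependency file that do not live under `target_dir`.
fn targets_outside<'a>(dep_info: &'a str, target_dir: &str) -> Vec<&'a str> {
    dep_info
        .lines()
        .filter_map(|line| line.split_once(": "))
        .map(|(target, _deps)| target)
        // `Path::starts_with` compares whole path components, not raw prefixes.
        .filter(|target| !Path::new(target).starts_with(target_dir))
        .collect()
}

fn main() {
    // Made-up contents for a project that was compiled under /app.
    let dep_info = "/app/target/debug/deps/app-1a2b.d: src/main.rs\n\
                    /app/target/debug/deps/app-1a2b: src/main.rs\n\
                    src/main.rs:\n";
    println!("built in /app:   {:?}", targets_outside(dep_info, "/app/target"));
    println!("moved to /build: {:?}", targets_outside(dep_info, "/build/target"));
}
```

An empty result means every recorded target still points inside the directory you are building from. Any entry in the result is an artifact that was produced under a different working directory.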
cargo-chef hopefully provides a streamlined workflow to help more people leverage Docker layer caching in their projects.
The project is as new as it gets: all feedback is appreciated and will inform future development directions.
If you end up using
cargo-chef on one of your projects, either OpenSource or commercial, I'd be delighted to hear about it!
You can reach out to me on Twitter via direct message.
[1] On CircleCI using the "Machine Linux Large" executor (4 CPU, 15 GB RAM) with Docker layer caching enabled.