Pavex, progress report #3: nested routes and borrow checking

April 20, 2023

👋 Hi!
It's Luca here, the author of "Zero to production in Rust".
This is a progress report about pavex, a new Rust web framework that I have been working on. It is currently in the early stages of development, working towards its first alpha release.

Check out the announcement post to learn more about the vision!

Personal updates

April was stressful.
I resigned from AWS and started planning my relocation from the UK to Italy, all while finalising the details of the renovations for the apartment we bought there. Pretty intense!

Nonetheless, I managed to squeeze in some time for pavex—let's talk about the progress!

You can comment on this update on r/rust.

What's new

Compile-time validation for route parameters

In March I added support for route parameters:

pub fn blueprint() -> Blueprint {
    let mut bp = Blueprint::new();
    // `home_id` is a route parameter! 
    // It will extract the corresponding segment out of incoming requests at runtime. 
    // E.g. `1` for `GET /home/1`
    bp.route(GET, "/home/:home_id", f!(crate::get_home));
    // [...]
}

pub fn get_home(
    // You can then retrieve (and automatically deserialize!) the extracted route parameters 
    // in your request handler using the `RouteParams` extractor.
    params: &RouteParams<HomeRouteParams>
) -> String {
    format!("Welcome to {}", params.0.home_id)
}

#[derive(serde::Deserialize)]
pub struct HomeRouteParams {
    pub home_id: u32,
}

What happens if you change the route template from /home/:home_id to /home/:id?
From a routing perspective, they're absolutely equivalent: a GET request to /home/1 will match both!

But the request handler will fail to extract route parameters from /home/:id if you forget to change the field name in HomeRouteParams from home_id to id. Even worse, the failure will happen at runtime—if there are no tests for this endpoint, you might end up shipping broken code in production. I don't like that.
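To make the failure mode concrete, here is a std-only sketch of the routing side. It is not pavex's router: match_template and extract_home_id are hypothetical helpers, and a real derived serde::Deserialize works through serde's data model rather than a plain map lookup. Both templates match /home/1, but only one of them yields the home_id parameter the handler expects.

```rust
use std::collections::HashMap;

/// Hypothetical router helper: matches `path` against a template such as
/// `/home/:home_id`, returning the extracted parameters (name -> raw segment).
fn match_template<'a>(template: &'a str, path: &'a str) -> Option<HashMap<&'a str, &'a str>> {
    let template_segments: Vec<&'a str> = template.split('/').collect();
    let path_segments: Vec<&'a str> = path.split('/').collect();
    if template_segments.len() != path_segments.len() {
        return None;
    }
    let mut params = HashMap::new();
    for (&t, &p) in template_segments.iter().zip(path_segments.iter()) {
        if let Some(name) = t.strip_prefix(':') {
            // A route parameter matches any segment and records it under its name.
            params.insert(name, p);
        } else if t != p {
            // Static segments must match exactly.
            return None;
        }
    }
    Some(params)
}

/// Stand-in for a derived deserializer with a single `home_id: u32` field:
/// it looks the field up by name, then parses it.
fn extract_home_id(params: &HashMap<&str, &str>) -> Result<u32, String> {
    let raw = params
        .get("home_id")
        .ok_or_else(|| "missing field `home_id`".to_string())?;
    raw.parse::<u32>().map_err(|e| e.to_string())
}

fn main() {
    for template in ["/home/:home_id", "/home/:id"] {
        // Both templates match `/home/1`...
        let params = match_template(template, "/home/1").expect("the template should match");
        // ...but only the first one yields a `home_id` parameter.
        println!("{}: {:?}", template, extract_home_id(&params));
    }
}
```

Nothing goes wrong until a request actually reaches the handler, which is why this kind of mismatch can slip through without tests.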

A new procedural macro comes to the rescue!
Instead of annotating HomeRouteParams with #[derive(serde::Deserialize)], you can use #[RouteParams]:

#[RouteParams]
pub struct HomeRouteParams {
    pub home_id: u32,
}

If you now change /home/:home_id to /home/:id, you'll be greeted by this error when you try to re-generate your application code:

ERROR:
  × `app::get_home` is trying to extract route parameters using `RouteParams<HomeRouteParams>`.
  │ Every struct field in `app::HomeRouteParams` must be named after one of the route parameters 
  │ that appear in `/home/:id`:
  │ - `id`
  │
  │ There is no route parameter named `home_id`, but there is a struct field named
  │ `home_id` in `app::HomeRouteParams`. This is going to cause a runtime error!
  │
  │     ╭─[src/lib.rs:43:1]
  │  43 │     ));
  │  44 │     bp.route(GET, "/home/:id", f!(crate::get_home));
  │     ·                                ───────────┬──────
  │     ·                                           ╰── The request handler asking for `RouteParams<app::HomeRouteParams>`
  │  45 │     
  │     ╰────
  │   help: Remove or rename the fields that do not map to a valid route parameter.

Quite cool, isn't it?
Let's unpack how it works under the hood!

serde::Deserialize is what ties together the route template (/home/:home_id) and the binding struct (HomeRouteParams). Generally speaking, we can't make any assumptions about the deserialization logic: a developer is free to provide their own exotic implementation of serde::Deserialize for HomeRouteParams (e.g. one that looks for a route segment named id and binds it to the home_id field).
If serde::Deserialize is derived, though, we can make assumptions: each field in the struct must be named after one of the route parameters defined in the route template. If that's not the case, deserialization is going to fail at runtime.

This is where #[RouteParams] comes into the picture. It does two things:

Derive serde::Deserialize for your type;
Implement pavex_runtime::serialization::StructuralDeserialize for your type.

StructuralDeserialize is a marker trait:

pub trait StructuralDeserialize {}

It provides no functionality on its own. It's a way for us to tag a type and say "its implementation of serde::Deserialize is derived"¹. The pavex compiler can then look it up!
When it processes the request handlers you registered, it looks at their input parameters: is there any RouteParams<T> in there? If there is one, pavex checks if T implements StructuralDeserialize:

if it does, pavex kicks off additional checks with respect to field naming;
if it doesn't, pavex assumes that you rolled your own implementation of serde::Deserialize and trusts that you know what you are doing.
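The decision above can be modelled in a few lines. This is a simplified sketch rather than pavex's code: the real compiler derives what it knows about T from your crate's type information, while here a hypothetical ParamsTypeInfo struct plays that role, with a hand-set is_structural flag standing in for the StructuralDeserialize lookup.

```rust
/// Hypothetical stand-in for what the compiler knows about `T` in `RouteParams<T>`.
struct ParamsTypeInfo {
    /// Fully qualified type name, e.g. `app::HomeRouteParams`.
    name: &'static str,
    /// Names of the struct fields.
    field_names: Vec<&'static str>,
    /// Whether `T` implements the `StructuralDeserialize` marker trait.
    is_structural: bool,
}

/// Extracts the route parameter names from a template such as `/home/:id`.
fn route_param_names(template: &str) -> Vec<&str> {
    template
        .split('/')
        .filter_map(|segment| segment.strip_prefix(':'))
        .collect()
}

/// `Ok` if every field maps to a route parameter (or if the check doesn't apply),
/// otherwise one message per offending field.
fn check_route_params(template: &str, info: &ParamsTypeInfo) -> Result<(), Vec<String>> {
    if !info.is_structural {
        // Hand-written `Deserialize`: trust the developer and skip the check.
        return Ok(());
    }
    let params = route_param_names(template);
    let errors: Vec<String> = info
        .field_names
        .iter()
        .filter(|&&field| !params.contains(&field))
        .map(|field| {
            format!(
                "no route parameter named `{}` in `{}`, but `{}` has a field with that name",
                field, template, info.name
            )
        })
        .collect();
    if errors.is_empty() {
        Ok(())
    } else {
        Err(errors)
    }
}

fn main() {
    let info = ParamsTypeInfo {
        name: "app::HomeRouteParams",
        field_names: vec!["home_id"],
        is_structural: true,
    };
    println!("{:?}", check_route_params("/home/:home_id", &info));
    println!("{:?}", check_route_params("/home/:id", &info));
}
```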

The technique is inspired by Rust's standard library: StructuralEq and StructuralPartialEq play the same role for identifying derived implementations of Eq and PartialEq.

Nesting and encapsulation

Everything starts simple, including APIs.
You can easily keep your entire router and state in a single function when you are exposing 4 or 5 endpoints. Things get really messy when, over time, the API surface grows to tens (if not hundreds!) of routes with an intricate network of dependencies and middlewares.

Our brains are limited—it's hard to keep too many different things in mind when working on a codebase². That's what modules are for!
Modules empower us to segment our domain into units that are small enough to be reasoned about, encapsulating complexity behind an interface that abstracts away the nitty-gritty details.

Last month, pavex had no mechanism for encapsulation. All routes, constructors and error handlers lived in a flat "namespace". That's optimal for a small microservice—you don't want to pay the cognitive price of abstractions you don't need.
But I want pavex to be able to support your project as it grows in complexity—it should be the ideal foundation for building large monoliths in Rust³.

That's why I've added support for nesting:

pub fn blueprint() -> Blueprint {
    let mut bp = Blueprint::new();
    bp.constructor(f!(crate::db_connection_pool), Lifecycle::Singleton);

    bp.nest_at("/admin", admin_blueprint());
    bp.nest_at("/api", api_blueprint());
    bp
}

pub fn admin_blueprint() -> Blueprint {
    let mut bp = Blueprint::new();
    bp.constructor(f!(crate::session_token), Lifecycle::RequestScoped);
    bp.route(GET, "/", f!(crate::admin_dashboard));
    // [...]
}

pub fn api_blueprint() -> Blueprint {
    // [...]
}

You can decompose your application into smaller Blueprints, each focused on a subset of routes and constructors.
A nested Blueprint inherits all the constructors registered against its parents: in our example, both /admin/* and /api/* request handlers can access the database connection pool returned by the top-level constructor.
The opposite, however, is forbidden: constructors registered against a nested blueprint are not visible to its parent(s) or to its siblings. Going back to the example above, /api/* request handlers cannot access the session token returned by the constructor registered in admin_blueprint.

This kind of encapsulation allows you to keep a close eye on the set of dependencies available to each part of your application.
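One way to picture these visibility rules is a tree of scopes where lookups only ever walk upwards, towards the root. The sketch below is a hypothetical model (Scope, resolve and the DbPool/SessionToken type names are made up for illustration), not how pavex represents blueprints internally.

```rust
use std::collections::HashMap;

/// Hypothetical model of a blueprint: its own constructors plus a link to its parent.
struct Scope {
    parent: Option<usize>,
    /// Output type name -> path of the constructor registered for it.
    constructors: HashMap<&'static str, &'static str>,
}

/// Looks up a constructor in `scope` first, then in each of its ancestors.
/// Children and siblings are never consulted.
fn resolve(scopes: &[Scope], mut scope: usize, type_name: &str) -> Option<&'static str> {
    loop {
        if let Some(ctor) = scopes[scope].constructors.get(type_name) {
            return Some(*ctor);
        }
        // Returns `None` once we walk past the root.
        scope = scopes[scope].parent?;
    }
}

/// The example from the text: a top-level blueprint with two nested ones.
fn example_scopes() -> Vec<Scope> {
    vec![
        // 0: top-level blueprint
        Scope { parent: None, constructors: HashMap::from([("DbPool", "crate::db_connection_pool")]) },
        // 1: admin_blueprint, nested under 0
        Scope { parent: Some(0), constructors: HashMap::from([("SessionToken", "crate::session_token")]) },
        // 2: api_blueprint, nested under 0
        Scope { parent: Some(0), constructors: HashMap::new() },
    ]
}

fn main() {
    let scopes = example_scopes();
    println!("{:?}", resolve(&scopes, 2, "DbPool"));
    println!("{:?}", resolve(&scopes, 2, "SessionToken"));
}
```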

nest_at has another side-effect: it adds a prefix to all the routes registered by the nested blueprint. crate::admin_dashboard will be invoked on GET /admin/ requests instead of GET /.
Decomposition, though, does not always map cleanly to path prefixes. That's why pavex provides another method, nest, which has identical behaviour with respect to state encapsulation but does not add any route prefix.
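For nest_at, the resulting path can be thought of as the prefix concatenated with the route's own path. A minimal sketch with a hypothetical nested_path helper; it does not try to reproduce how pavex handles edge cases such as trailing slashes.

```rust
/// Hypothetical helper modelling the path a nested route ends up with.
/// `Some(prefix)` stands for `nest_at(prefix, ..)`, `None` for `nest(..)`.
fn nested_path(prefix: Option<&str>, route: &str) -> String {
    match prefix {
        Some(prefix) => format!("{}{}", prefix, route),
        None => route.to_string(),
    }
}

fn main() {
    // `bp.route(GET, "/", ..)` inside admin_blueprint, nested via `nest_at("/admin", ..)`.
    println!("{}", nested_path(Some("/admin"), "/"));
    // The same route nested via `nest`: no prefix is added.
    println!("{}", nested_path(None, "/"));
}
```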

Dealing with ambiguity

Nesting and encapsulation are cool on paper, but the devil is in the details.
What happens if api_blueprint and admin_blueprint try to register different constructors for the same singleton type, a u64?
Singletons should be... well, singletons—built once and used for the entirety of the application lifetime. Which constructor should pavex use? The one provided by api_blueprint? Or the one provided by admin_blueprint?

The answer is neither! This edge case is accounted for and we return a dedicated error:

ERROR:
  × The constructor for a singleton must be registered once.
  │ You registered the same constructor for `u64` against 2 different nested
  │ blueprints.
  │ I don't know how to proceed: do you want to share the same singleton
  │ instance across all those nested blueprints, or do you want to create a
  │ new instance for each nested blueprint?
  │
  │     ╭─[src/lib.rs:10:1]
  │  10 │     let mut bp = Blueprint::new();
  │  11 │     bp.constructor(f!(crate::admin::singleton), Lifecycle::Singleton);
  │     ·                    ──────────┬───────────────
  │     ·                              ╰── A constructor was registered here
  │     ╰────
  │     ╭─[src/lib.rs:22:1]
  │  22 │     let mut bp = Blueprint::new();
  │  23 │     bp.constructor(f!(crate::api::singleton), Lifecycle::Singleton);
  │     ·                    ──────────┬─────────────
  │     ·                              ╰── A constructor was registered here
  │     ╰────
  │   help: If you want to share a single instance of `u64`, remove constructors
  │         for `u64` until there is only one left. It should be attached to a
  │         blueprint that is a parent of all the nested ones that need to use it.
  │        ☞
  │          ╭─[src/lib.rs:5:1]
  │        5 │ pub fn blueprint() -> Blueprint {
  │        6 │     let mut bp = Blueprint::new();
  │          ·                  ────────┬───────
  │          ·                          ╰── Register your constructor against this blueprint
  │          ╰────
  │   help: If you want different instances, consider creating separate newtypes
  │         that wrap a `u64`.

Similar reasoning applies if a nested blueprint tries to override the constructor registered by its parent for a singleton type.
Request-scoped and transient types are handled differently: nested blueprints can override the behaviour of their parent (e.g. register a different error handler for the same extractor).
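The singleton rule can be sketched as a check over every constructor registered anywhere in the blueprint tree, whether in siblings or in a parent and its child. In this hypothetical model (not pavex's implementation), registrations are keyed by output type via std::any::type_name: two singleton constructors for u64 are flagged, request-scoped ones are ignored, and the newtypes suggested by the error message (here made-up AdminCounter and ApiCounter) avoid the conflict because they are distinct types.

```rust
use std::any::type_name;
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq)]
enum Lifecycle {
    Singleton,
    RequestScoped,
}

/// Hypothetical record of a `bp.constructor(..)` call, wherever it happens in the tree.
struct Registration {
    output_type: &'static str,
    constructor: &'static str,
    lifecycle: Lifecycle,
}

/// Singleton types with more than one registered constructor, sorted by type name.
fn conflicting_singletons(regs: &[Registration]) -> Vec<(&'static str, Vec<&'static str>)> {
    let mut by_type: HashMap<&'static str, Vec<&'static str>> = HashMap::new();
    for r in regs.iter().filter(|r| r.lifecycle == Lifecycle::Singleton) {
        by_type.entry(r.output_type).or_default().push(r.constructor);
    }
    let mut conflicts: Vec<_> = by_type.into_iter().filter(|(_, ctors)| ctors.len() > 1).collect();
    conflicts.sort(); // deterministic order for reporting
    conflicts
}

// Newtypes: distinct types that both wrap a `u64`.
pub struct AdminCounter(pub u64);
pub struct ApiCounter(pub u64);

fn main() {
    let clash = [
        Registration { output_type: type_name::<u64>(), constructor: "crate::admin::singleton", lifecycle: Lifecycle::Singleton },
        Registration { output_type: type_name::<u64>(), constructor: "crate::api::singleton", lifecycle: Lifecycle::Singleton },
    ];
    println!("{:?}", conflicting_singletons(&clash));

    let newtypes = [
        Registration { output_type: type_name::<AdminCounter>(), constructor: "crate::admin::singleton", lifecycle: Lifecycle::Singleton },
        Registration { output_type: type_name::<ApiCounter>(), constructor: "crate::api::singleton", lifecycle: Lifecycle::Singleton },
    ];
    println!("{:?}", conflicting_singletons(&newtypes));
}
```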

Striking a balance between expressiveness and the principle of least surprise is tricky. I expect that I'll have to iterate further on this part of the API going forward, but I'm happy enough with this first version!

Borrow checking

pavex is a code generator—it takes as input a Blueprint that describes your application and spits out Rust code that can serve incoming requests.
There is a key detail here: the Rust code that we generate must compile successfully, which in turn implies that it must satisfy the Rust borrow checker!

That's trickier than it sounds: it might or might not be possible to generate code that makes the borrow checker happy, depending on the shape of your dependency graph. Let's look at an example involving four components: build_a, build_b, build_c and request_handler.

To invoke request_handler, we need to build an instance of B and an instance of C. But their respective constructors want to take A as input by value.
That can't be—the borrow checker would reject the resulting code.
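The same graph, written as plain Rust, makes the conflict easy to see. This is a sketch with made-up function bodies and a hypothetical request_handler(b: B, c: C) signature; the commented-out lines are what a naive generator would emit, and they fail with E0382. Changing build_c to borrow A (one of the remediations pavex suggests in the error below) is enough, as long as the borrow happens before the move.

```rust
pub struct A(u32);
pub struct B(u32);
pub struct C(u32);

pub fn build_a() -> A {
    A(1)
}

/// Consumes `A` by value, as in the original example.
pub fn build_b(a: A) -> B {
    B(a.0 + 1)
}

/// The original example has `build_c(a: A) -> C`; here it borrows `A` instead.
pub fn build_c(a: &A) -> C {
    C(a.0 + 2)
}

pub fn request_handler(b: B, c: C) -> String {
    format!("b={}, c={}", b.0, c.0)
}

/// Roughly the code a generator has to emit for this graph.
pub fn generated() -> String {
    let a = build_a();
    // With both constructors taking `A` by value, a naive emission is rejected:
    //     let b = build_b(a);
    //     let c = build_c(a); // error[E0382]: use of moved value: `a`
    // With `build_c` borrowing, placing the borrow before the move compiles:
    let c = build_c(&a);
    let b = build_b(a);
    request_handler(b, c)
}

fn main() {
    println!("{}", generated());
}
```

Ordering matters here: swapping the two constructor calls would move a before it is borrowed and fail to compile again.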

Last month, that's exactly what used to happen: pavex would happily accept your Blueprint and then emit code that didn't compile. Understanding why it didn't compile (and mapping it back to your registered constructors) was left as an exercise for the user.

That sucks, and I spent the better part of April fixing it.
If you try to pass a similar call graph to pavex today, it gets rejected with an error:

ERROR:
  × I can't generate code that will pass the borrow checker *and* match the
  │ instructions in your blueprint.
  │ There are 2 components that take `app::A` as an input parameter, consuming
  │ it by value. Since I'm not allowed to clone `app::A`, I can't resolve
  │ this conflict.
  │
  │   help: Allow me to clone `app::A` in order to satisfy the borrow checker.
  │         You can do so by invoking `.cloning(CloningStrategy::CloneIfNecessary)`
  │         on the type returned by `.constructor`.
  │        ☞
  │           ╭─[src/lib.rs:40:1]
  │        40 │     let mut bp = Blueprint::new();
  │        41 │     bp.constructor(f!(crate::build_a), Lifecycle::RequestScoped);
  │           ·                    ──────┬──────────
  │           ·                          ╰── The constructor was registered here
  │           ╰────
  │   help: Consider changing the signature of the components that consume
  │         `app::A` by value.
  │         Would a shared reference, `&app::A`, be enough?
  │        ☞
  │           ╭─[src/lib.rs:42:1]
  │        42 │     bp.constructor(f!(crate::build_b), Lifecycle::RequestScoped);
  │        43 │     bp.constructor(f!(crate::build_c), Lifecycle::RequestScoped);
  │           ·                    ──────┬──────────
  │           ·                          ╰── One of the consuming constructors
  │           ╰────
  │        ☞
  │           ╭─[src/lib.rs:41:1]
  │        41 │     bp.constructor(f!(crate::build_a), Lifecycle::RequestScoped);
  │        42 │     bp.constructor(f!(crate::build_b), Lifecycle::RequestScoped);
  │           ·                    ──────┬──────────
  │           ·                          ╰── One of the consuming constructors
  │           ╰────
  │   help: If `app::A` itself cannot implement `Clone`, consider wrapping it in
  │         an `std::rc::Rc` or `std::sync::Arc`.

The borrow checker is a tricky beast on its own, so I put a lot of effort into suggesting possible remediations.
The first is what I'd generally recommend: just Clone it!

By default, pavex doesn't inject .clone() invocations. You need to explicitly tell the framework that it's OK to clone a type if needed:

pub fn blueprint() -> Blueprint {
    let mut bp = Blueprint::new();
    bp.constructor(f!(crate::build_a), Lifecycle::RequestScoped)
        // 👇 This allows `pavex` to sprinkle in `.clone()` calls where helpful
        .cloning(CloningStrategy::CloneIfNecessary);
    // [...]
}

That change is enough to fix the previous error: the call graph now includes a Clone::clone() node for A.

pavex's code generation is then smart enough to process the Clone::clone() node before invoking build_b, therefore producing code that passes the borrow checker 🎉
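As a rough illustration, the generated code could look like the sketch below once A implements Clone and cloning is allowed. Names mirror the example, but the bodies are made up and this is not pavex's literal output.

```rust
/// `A` must implement `Clone` for the clone-based strategy to apply.
#[derive(Clone)]
pub struct A(String);
pub struct B(usize);
pub struct C(usize);

pub fn build_a() -> A {
    A("shared".to_string())
}

// Both constructors consume `A` by value, as in the original example.
pub fn build_b(a: A) -> B {
    B(a.0.len())
}

pub fn build_c(a: A) -> C {
    C(a.0.len() * 2)
}

pub fn request_handler(b: B, c: C) -> usize {
    b.0 + c.0
}

/// Roughly what the generated code has to do once cloning `A` is allowed.
pub fn generated() -> usize {
    let a = build_a();
    // The `Clone::clone()` node is processed before `build_b` moves `a`...
    let a_clone = a.clone();
    let b = build_b(a);
    // ...so `build_c` can consume the copy.
    let c = build_c(a_clone);
    request_handler(b, c)
}

fn main() {
    println!("{}", generated());
}
```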

Let's be clear: pavex does not yet catch all possible borrow-checking issues ahead of code generation, but it does a fairly good job of catching the most common violations (e.g. use after move) as well as some of the trickier ones (e.g. when control flow statements like match are involved).
Its main blind spots are "hidden" borrows: e.g. C depends on B<'a>, which stores a reference to A as one of its fields, therefore implying that C borrows from A. This is solvable and there is no hard blocker; it's just a matter of putting in the work, something I plan to tackle in the medium term.
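A hypothetical sketch of such a hidden borrow: B<'a> holds an &'a A, and C<'a> wraps a B<'a>, so a C keeps A borrowed even though build_c never mentions A. Any component that takes A by value (a made-up consume_a here) must run after the last use of C, which is the kind of constraint described above as not yet tracked.

```rust
pub struct A(String);

/// `B` stores a reference to `A`, so building it borrows `A`.
pub struct B<'a> {
    a: &'a A,
}

/// `C` holds a `B<'a>`: nothing in `build_c` mentions `A`, but `C` borrows it too.
pub struct C<'a> {
    b: B<'a>,
}

pub fn build_b(a: &A) -> B<'_> {
    B { a }
}

pub fn build_c(b: B<'_>) -> C<'_> {
    C { b }
}

/// Hypothetical component that takes `A` by value.
pub fn consume_a(a: A) -> usize {
    a.0.len()
}

pub fn generated() -> (usize, usize) {
    let a = A("hello".to_string());
    let c = build_c(build_b(&a));
    // Last use of `c`, and therefore of the hidden borrow of `a`.
    let from_c = c.b.a.0.len();
    // Only now can `a` be moved. Calling `consume_a(a)` before the line above
    // would fail with error[E0505]: cannot move out of `a` because it is borrowed.
    let from_a = consume_a(a);
    (from_c, from_a)
}

fn main() {
    println!("{:?}", generated());
}
```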

Circular dependencies

Last but not least, I've done some bug squashing!
pavex doesn't like circular dependencies, such as build_a needing a B, build_b needing a C and build_c needing an A.

It used to handle circular dependencies very poorly: it would hang indefinitely, stuck in an infinite loop. I have introduced an intermediate analysis step (called DependencyGraph) to detect circular dependencies before they become an existential problem, removing the infinite loop and emitting a nice error instead:

ERROR:
  × The dependency graph cannot contain cycles, but I just found one!
  │ If I tried to build your dependencies, I would end up in an infinite loop.
  │
  │ The cycle looks like this:
  │
  │ - `build_b` depends on `app::C`, which is built by `build_c`
  │ - `build_c` depends on `app::A`, which is built by `build_a`
  │ - `build_a` depends on `app::B`, which is built by `build_b`
  │
  │   help: Break the cycle! Remove one of the 'depends-on' relationships by
  │         changing the signature of one of the components in the cycle.

What's next?

First and foremost, some rest! I'll be off the grid for a few days, taking a little break.

Speaking of pavex, there is one key feature that I've yet to implement: middlewares.
But they'll have to wait a bit longer. I am eager to kick the tires on pavex—i.e. try to build a small project to see how it feels to develop with pavex.

I'll probably be implementing the RealWorld API spec: I've done it in the past using actix-web and it should give me a pretty solid measure of what needs to be done next for pavex.
As a bonus, it'll help me to validate the design sketches for the middleware API. I have plenty of crazy-man notes spread around the house, full of boxes and arrows.

See you next month!

You can comment on this update on r/rust.

Subscribe to the newsletter if you don't want to miss the next update!
You can also follow the development of pavex on GitHub.

¹ As it happens, I found out a couple of days ago that there might be a way to determine if you derived serde::Deserialize without having to introduce a marker trait. I'll investigate it further in the near future.

² If the intersection of neuroscience and developer experience fascinates you, I strongly recommend checking out The Programmer's Brain by Felienne Hermans.

³ Monoliths have a bad reputation, but they can be surprisingly effective in the right circumstances. As an industry, we often think in absolutes ("Monolith? A gigantic spaghetti mess deployed on one big box"), but reality is more nuanced. Powered by the right framework, it should be easy enough to deploy a monolithic application as a set of serverless functions, one for each endpoint. As long as they don't call into each other, you retain most of the benefits of a "traditional" monolith without many of its scalability/billing downsides. Food for thought: hybrid deployment strategies are definitely top of mind for me when thinking about pavex's future directions.
