A taste of pavex, an upcoming Rust web framework

December 24, 2022

TL;DR

Earlier this year, I started working on a new web framework for Rust: pavex.

The goal is simple: great ergonomics and high performance - no sacrifices.
As easy to use as tide, Rails or ASP.NET Core.
As fast as a handwritten solution built directly on top of raw hyper.

I've been working on it for a few months now, enough to prove feasibility. It is not yet ready for user testing, but the design has solidified enough to start talking about it.
This post is an opportunity to open my workshop and share the vision: I want to understand if it resonates with the broader Rust web community.

The state of Rust for the web

actix-web, rocket, axum, tide, warp - we have plenty of web frameworks in the Rust ecosystem, even limiting the list to the most popular ones. Some would say we have too many - and here I am, working on making that list even longer.

Why pavex? Why would you go and build yet another web framework?

To broaden the design space!
I believe there is an under-explored opportunity to significantly improve the developer experience of Rust web developers by raising the level of abstraction of their tools.

The current generation of Rust web frameworks is trying to walk a tightrope.
On one side, they strive to provide ergonomic APIs, lowering the bar for more and more people to get started building APIs in Rust.
On the other side, they want to provide high-performance and (wherever possible) misuse-resistant interfaces with compile-time guarantees of correctness.

There is tension between those two objectives.
High performance and compile-time safety drive frameworks to lean heavily on the expressiveness of Rust's type system, trying to encode invariants for compile-time verification as well as limiting the overhead of the framework itself.
This all works just fine on the happy path, but it can lead to obscure compiler errors on the unhappy path - often too obscure for beginners trying to make sense of what is happening.

Let's look at axum's "Hello world" example as a case study:

use axum::{response::Html, routing::get, Router};
use std::net::SocketAddr;

#[tokio::main]
async fn main() {
    let app = Router::new().route("/", get(handler));
    axum::Server::bind(&SocketAddr::from(([127, 0, 0, 1], 3000)))
        .serve(app.into_make_service())
        .await
        .unwrap();
}

async fn handler() -> Html<&'static str> {
    Html("<h1>Hello, World!</h1>")
}

axum requires all handler functions to be asynchronous. What happens if you forget?

// [...]
// No longer `async`!
fn handler() -> Html<&'static str> { /* */ }

The compiler greets us with this error message:

error[E0277]: the trait bound `fn() -> Html<&'static str> {handler}: Handler<_, _, _>` is not satisfied
   --> hello-world/src/main.rs:12:44
    |
12  |     let app = Router::new().route("/", get(handler));
    |                                        --- ^^^^^^^ 
                                             |   the trait `Handler<_, _, _>` is not implemented 
                                             |   for `fn() -> Html<&'static str> {handler}`
    |                                        |
    |                                        required by a bound introduced by this call
    |
    = help: the trait `Handler<T, S, B>` is implemented for `Layered<L, H, T, S, B>`
note: required by a bound in `axum::routing::get`
   --> /Users/luca/code/axum/axum/src/routing/method_routing.rs:400:1
    |
400 | top_level_handler_fn!(get, GET);
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ required by this bound in `axum::routing::get`
    = note: this error originates in the macro `top_level_handler_fn`

Good luck figuring that out!
Especially if you are at the beginning of your journey with Rust.

Is the situation hopeless? Are we forced to choose between performance, compile-time safety and ergonomics?

No, we are not.
There are ongoing efforts inside the Rust project to empower crate authors to "suggest" to the compiler appropriate error messages in specific situations (check out this PR in bevy!), thus improving the quality of diagnostics.

At the same time, crate authors are trying to step in with the tools currently available in Rust's latest stable release: metaprogramming.
axum provides a #[debug_handler] procedural macro that can be used to annotate request handlers. It has no effect on runtime behaviour, but it allows axum to preempt the compiler and emit its own compiler errors when it detects certain patterns of incorrect behaviour.

Let's use the "Hello world" example again to try it out:

// [...]
// Now annotated with #[debug_handler], same sync signature otherwise
#[axum::debug_handler]
fn handler() -> Html<&'static str> { /* */ }

The error message is now much better:

error: handlers must be async functions
  --> main.rs:xx:1
   |
xx | fn handler() -> &'static str {
   | ^^

This is amazing, because it is speaking at the right level of abstraction.
It is talking about handlers, a concept that we understand as API developers. A concept that we had just tried to use.
No type noise, no need to look at the open guts of axum's inner abstractions.

This is where the idea for pavex comes from.
What if we took this metaprogramming approach to the next level?
Let's get rid of most of the user-facing complexity - heavily generic APIs, intricate trait bounds, nested type chains. We will use straightforward Rust to specify what your application should do.
We will then feed our app specification to pavex_cli, a transpiler designed specifically for web applications.
It will generate the source code for our web server, using (once again) straightforward Rust. If something goes wrong, it will return meaningful errors that speak the language of web applications.

You might be wondering - is it even feasible? You'd be right to doubt!
There are significant technical challenges in building a tool that lives up to the vision laid out above.

An overview of `pavex`

Before diving into the technical details, let's get a feeling for the type of API that pavex will expose.

use pavex_builder::{f, AppBlueprint, Lifecycle};

/// Return the blueprint for our application.
pub fn blueprint() -> AppBlueprint {
    let mut bp = AppBlueprint::new();
    
    bp.constructor(f!(crate::http_client), Lifecycle::Singleton);
    bp.constructor(f!(crate::extract_path), Lifecycle::RequestScoped);
    bp.constructor(f!(crate::logger), Lifecycle::Transient);
    
    let vault = bp.route("/vault");
    vault.get(f!(crate::stream_file));
    
    bp
}

What is going on here?

`AppBlueprint`, the compile-time representation

We have an HTTP router, which should look familiar:

let vault = bp.route("/vault");
// Use `stream_file` to handle `GET` requests to `/vault`.
vault.get(f!(crate::stream_file));

We are also registering constructors:

bp.constructor(f!(crate::http_client), Lifecycle::Singleton);
bp.constructor(f!(crate::extract_path), Lifecycle::RequestScoped);
bp.constructor(f!(crate::logger), Lifecycle::Transient);

This pattern is not very common in the Rust ecosystem, therefore it might not be as familiar: pavex performs (compile-time) dependency injection.
Instead of providing the framework with an instance of the types you want to use, you provide it with a function that builds those types. The framework will then call your function to build the instances it needs and pass them to your handlers and middlewares, whenever they are needed.

Our example has a single request handler, stream_file. This is its signature:

pub async fn stream_file(
    request: Request<Body>, 
    filepath: PathBuf, 
    logger: Logger, 
    http_client: HttpClient
) -> Response { 
    /* */ 
}

Whenever a GET /vault request is received, the framework must invoke stream_file.
pavex looks at stream_file's signature (we'll see later how) and figures out that it must provide the handler with an instance of PathBuf, Request<Body>, Logger and HttpClient - the dependencies of our request handler.

Request<Body> is the incoming HTTP request: it will be injected by pavex. For the other three dependencies, pavex looks for their constructors:

pub async fn extract_path(request: &Request<Body>, logger: Logger) -> PathBuf { /* */ }

pub fn logger() -> Logger { /* */ }

pub fn http_client(configuration: Config) -> HttpClient { /* */ }

Constructors themselves can have dependencies, which are resolved in the same way.
pavex, though, needs to know more than just the constructor for a type: it also needs to know how often it should be invoked. This is what we call lifecycle:

| Lifecycle | Description | Example |
| --- | --- | --- |
| `Singleton` | The instance is created once, when the application starts. It is then reused for all incoming requests. Singletons are your application state. | Database connection pools, HTTP connection pools, configuration. |
| `RequestScoped` | The instance is created once per request. It is then reused for all processing within the same request. | Path parameters, auth information, parsed request body. |
| `Transient` | The instance is created every time it is needed. | Database connections, logger instances. |

Routes, constructors and their lifecycles are combined into a single AppBlueprint instance: this is a full specification of your application.
This representation only exists at compile-time. It is never used at runtime.
pavex_cli transpiles this blueprint into a runtime crate - a runnable web server.

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // The blueprint we defined above.
    let bp = blueprint();
    pavex_cli::generate(bp)
        // The folder where the generated crate will live.
        .output_path("runtime")
        .execute()?;
    Ok(())
}

The runtime code

The entrypoint

The main entrypoint of the generated crate is the run function:

pub async fn run(
    server_builder: hyper::server::Builder<AddrIncoming>,
    application_state: ApplicationState,
) -> Result<(), pavex_runtime::Error> { /* */ }

run launches the web server so that you can start accepting requests. It takes as input the HTTP server configuration (hyper::server::Builder) and an instance of the application state.

Application state

pub struct ApplicationState {
    s0: app::HttpClient,
}

impl ApplicationState {
    pub fn new(v0: app::Config) -> ApplicationState {
        let v1 = app::http_client(v0);
        ApplicationState { s0: v1 }
    }
}

You can build an instance of ApplicationState by calling ApplicationState::new.
The only singleton needed at runtime is HttpClient, but that's not what ApplicationState::new asks for as input: it wants an app::Config instance. Why? It's due to the constructor we registered for HttpClient:

pub fn http_client(configuration: Config) -> HttpClient { /* */ }

It wants a Config instance as input, but we didn't register any constructor for Config in our blueprint. Therefore pavex generates an ApplicationState::new function that takes Config as input and builds the rest of the state for us.
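To make that rule concrete, here is a minimal sketch of how a tool could work out the inputs of ApplicationState::new. It is not pavex's actual algorithm: the `Constructors` map, the `unresolved_inputs` function and the use of plain strings as type names are all assumptions made for illustration.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Toy registry (hypothetical): output type name -> input type names of its constructor.
pub type Constructors = BTreeMap<&'static str, Vec<&'static str>>;

/// The registry for the singleton in our example: `HttpClient` is built from `Config`.
pub fn example_registry() -> Constructors {
    let mut constructors = Constructors::new();
    constructors.insert("HttpClient", vec!["Config"]);
    constructors
}

/// Walk the dependencies of `roots` and collect every type that has no
/// registered constructor: those must be provided by the caller.
pub fn unresolved_inputs(
    constructors: &Constructors,
    roots: &[&'static str],
) -> BTreeSet<&'static str> {
    let mut missing = BTreeSet::new();
    let mut visited = BTreeSet::new();
    let mut stack: Vec<&'static str> = roots.to_vec();
    while let Some(ty) = stack.pop() {
        if !visited.insert(ty) {
            continue; // this type was already handled
        }
        match constructors.get(ty) {
            // A constructor exists: its inputs become new dependencies to resolve.
            Some(inputs) => stack.extend(inputs.iter().copied()),
            // No constructor: the type has to be an input of the generated function.
            None => {
                missing.insert(ty);
            }
        }
    }
    missing
}
```

Calling `unresolved_inputs(&example_registry(), &["HttpClient"])` yields a set containing only `"Config"`, which mirrors the signature of the generated ApplicationState::new.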

Request handlers

pavex generates a function for each request handler in our blueprint.
Here is the function generated for the GET /vault handler:

pub async fn get_vault(v0: app::HttpClient, v1: Request<Body>) -> Response {
    let v3 = app::extract_path(&v1, app::logger()).await;
    app::stream_file(v1, v3, app::logger(), v0).await
}

The handler takes as input the HttpClient singleton (out of ApplicationState) and the incoming HTTP request.
We can see how pavex walked the dependency graph of our request handler, honoring the lifecycle of each type:

HttpClient is a singleton, therefore it comes from the ApplicationState and it is passed as input to the handler;
Request<Body> is request-scoped, the same instance is passed to both extract_path (as a reference) and stream_file (by value);
Logger is transient, a new instance is created every time it is needed as input.
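The three lifecycles can be made observable by counting how often each constructor runs. The sketch below is a hand-written analogue of the generated handler, not pavex output: the `Calls` counters, the stub types and the `simulate` function are hypothetical, the handler is synchronous, and the singleton is passed by reference to keep things short.

```rust
use std::cell::Cell;
use std::path::PathBuf;

/// Counts how often each constructor runs (illustration only).
#[derive(Default)]
pub struct Calls {
    pub http_client: Cell<u32>,
    pub extract_path: Cell<u32>,
    pub logger: Cell<u32>,
}

fn bump(counter: &Cell<u32>) {
    counter.set(counter.get() + 1);
}

// Hypothetical stand-ins for the types used in the post.
pub struct HttpClient;
pub struct Logger;
pub struct Request {
    pub path: String,
}
pub struct ApplicationState {
    pub http_client: HttpClient,
}

fn http_client(calls: &Calls) -> HttpClient {
    bump(&calls.http_client);
    HttpClient
}

fn logger(calls: &Calls) -> Logger {
    bump(&calls.logger);
    Logger
}

fn extract_path(calls: &Calls, request: &Request, _logger: Logger) -> PathBuf {
    bump(&calls.extract_path);
    PathBuf::from(&request.path)
}

fn stream_file(_request: Request, filepath: PathBuf, _logger: Logger, _client: &HttpClient) -> String {
    format!("streaming {}", filepath.display())
}

/// Hand-written analogue of the generated handler.
pub fn get_vault(calls: &Calls, state: &ApplicationState, request: Request) -> String {
    // Request-scoped: the path is built once per request.
    // Transient: a fresh `Logger` is built for every use.
    let path = extract_path(calls, &request, logger(calls));
    stream_file(request, path, logger(calls), &state.http_client)
}

/// Serve `n` fake requests and report (singleton, request-scoped, transient) call counts.
pub fn simulate(n: u32) -> (u32, u32, u32) {
    let calls = Calls::default();
    // Singleton: built once, before any request is served.
    let state = ApplicationState { http_client: http_client(&calls) };
    for i in 0..n {
        let request = Request { path: format!("/vault/{}", i) };
        get_vault(&calls, &state, request);
    }
    (calls.http_client.get(), calls.extract_path.get(), calls.logger.get())
}
```

With three requests, `simulate(3)` returns `(1, 3, 6)`: one singleton, one path per request, two loggers per request.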

A zero-cost abstraction

The generated code contains no indirection - no runtime reflection (as is often the case for dependency injection frameworks in other languages), no dynamic dispatch, no type-maps.
It's just a bunch of functions that call each other, passing around the instances they need.
It's a zero-cost abstraction: you would get the same performance if you had written the code by hand.
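For contrast, this is roughly what a runtime type-map looks like, the kind of indirection the generated code does without. The `TypeMap` type below is a hypothetical sketch built on std::any; it is not taken from pavex or from any specific framework.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

/// A minimal runtime type-map: values are boxed and looked up by `TypeId`.
#[derive(Default)]
pub struct TypeMap {
    items: HashMap<TypeId, Box<dyn Any>>,
}

impl TypeMap {
    pub fn insert<T: Any>(&mut self, value: T) {
        self.items.insert(TypeId::of::<T>(), Box::new(value));
    }

    /// Every lookup pays for a hash, a pointer hop and a downcast,
    /// and a missing entry is only discovered at runtime.
    pub fn get<T: Any>(&self) -> Option<&T> {
        self.items
            .get(&TypeId::of::<T>())
            .and_then(|boxed| boxed.downcast_ref::<T>())
    }
}

// Hypothetical stand-ins for the types used in the post.
pub struct HttpClient;
pub struct Logger;

/// Register only `HttpClient`, then look up both types.
pub fn lookup_demo() -> (bool, bool) {
    let mut map = TypeMap::default();
    map.insert(HttpClient);
    (map.get::<HttpClient>().is_some(), map.get::<Logger>().is_some())
}
```

`lookup_demo()` returns `(true, false)`: the forgotten `Logger` shows up as a `None` while the program is running, whereas the approach described in this post aims to surface the same mistake before the server is even built.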

That's the whole value proposition of pavex: ergonomics and performance.

The unhappy path

I've mostly shown you the happy path - you throw a valid AppBlueprint at pavex and you get back a runnable web server.
But what happens if you make a mistake?

This is, perhaps surprisingly, the aspect of pavex that I'm most excited about: it's a compiler operating at a higher level of abstraction than rustc, therefore we can provide better error messages for our target use case.

Let's see what happens if we try to register a singleton that is not Send:

use std::rc::Rc;

// Definitely not `Send`, given that it contains an `Rc`.
pub struct NonSendSingleton(Rc<()>);

pub fn blueprint() -> AppBlueprint {
    let mut bp = AppBlueprint::new();
    bp.constructor(f!(crate::NonSendSingleton::new), Lifecycle::Singleton);
    // [...]
}

pavex_cli returns an error when trying to generate the runtime code:

Error: 
  × `app::NonSendSingleton` does not implement the `core::marker::Send` trait.
    ╭─[src/lib.rs:24:1]
 24 │     let mut bp = AppBlueprint::new();
 25 │     bp.constructor(f!(crate::NonSendSingleton::new), Lifecycle::Singleton);
    ·                    ────────────────┬───────────────
    ·                                    ╰── The constructor was registered here
 26 │     
    ╰────
  help: All singletons must implement the `Send` trait.
        `pavex` runs on a multi-threaded HTTP server and singletons must be
        shared across all worker threads.

The error message, going back to the beginning of the post, is at the right level of abstraction. It speaks of singletons and HTTP servers, the concepts that we are working with.
It not only says that NonSendSingleton does not implement Send, but also explains why this is a problem.

At the same time, there is more work to be done: the error message does not explain why NonSendSingleton does not implement Send. I'm currently exploring how to leverage rustc to combine both sources of information for an optimal experience.
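The underlying Rust rule can be sketched independently of pavex. In the example below, `SendSingleton`, `assert_send` and `share_across_workers` are hypothetical names chosen for illustration; the point is only that swapping `Rc` for `Arc` makes a type shareable across worker threads.

```rust
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

/// `Rc` uses a non-atomic reference count, so this type is not `Send`.
pub struct NonSendSingleton(pub Rc<String>);

/// `Arc` uses an atomic reference count, so this type is `Send`.
#[derive(Clone)]
pub struct SendSingleton(pub Arc<String>);

/// Compiles only if `T` is `Send`: a compile-time check with no runtime cost.
pub fn assert_send<T: Send>() {}

/// Share one singleton across `workers` threads and sum the lengths they observe.
pub fn share_across_workers(workers: usize) -> usize {
    assert_send::<SendSingleton>();
    // assert_send::<NonSendSingleton>(); // would fail to compile: `Rc<String>` is not `Send`

    let singleton = SendSingleton(Arc::new(String::from("state")));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Each worker gets its own handle to the same underlying value.
            let local = singleton.clone();
            thread::spawn(move || local.0.len())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

`share_across_workers(4)` returns 20, since each of the four threads reads the same five-byte string.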

A peek under the hood

You might be wondering, at this point: how does pavex actually work?

Dependency-injection frameworks are usually implemented via runtime reflection: you can go to the language runtime with a function pointer and introspect it - input types, output type, asyncness, etc.
Rust has no reflection API.

Well, no runtime reflection API.
We live in exciting times. Rust has started to grow a compile-time reflection API and pavex is built on top of it: rustdoc's JSON output.
It's exactly what it sounds like: a JSON version of the crate documentation that you can find on docs.rs, currently only available on nightly. You can generate it for a crate by running:

cargo +nightly rustdoc -p {crate_name} --lib -- -Zunstable-options -wjson

You get as output a JSON file with a structured representation of all the types in your crate.
This is exactly what pavex does: for each registered route handler and constructor, it builds the documentation for the crate it belongs to and extracts the relevant bits of information from rustdoc's output.

You might have noticed the f! macro in the examples above:

bp.constructor(f!(crate::http_client), Lifecycle::Singleton);

f!(crate::http_client) desugars to

pavex_builder::RawCallable {
    import_path: "crate::http_client",
    callable: crate::http_client,
}

which is further transformed by the AppBlueprint into a pavex_builder::RawCallableIdentifiers:

pavex_builder::RawCallableIdentifiers {
    // Name of the crate where `blueprint.constructor(..)` was called
    registered_at: "app",
    // Stringified fully-qualified path to the function
    import_path: "crate::http_client",
}

This is the information fed to pavex as input, via AppBlueprint.
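A macro with this shape can be sketched in a few lines. This is not the real pavex_builder macro: the `RawCallable` struct below is a hypothetical stand-in, the `app` module is a stub, and the example uses a relative path rather than `crate::` to stay self-contained.

```rust
/// Hypothetical stand-in for `pavex_builder::RawCallable`.
pub struct RawCallable<F> {
    pub import_path: &'static str,
    pub callable: F,
}

/// Illustrative macro: capture the path as text and as a value at the same time.
macro_rules! f {
    ($p:path) => {
        RawCallable {
            import_path: stringify!($p),
            callable: $p,
        }
    };
}

/// Stub module standing in for the application crate.
pub mod app {
    pub fn http_client() -> &'static str {
        "client"
    }
}

/// `stringify!` does not guarantee exact spacing, so strip any whitespace.
pub fn normalized_path<F>(raw: &RawCallable<F>) -> String {
    raw.import_path.split_whitespace().collect()
}

/// Register `app::http_client` and return its path as text plus the result of calling it.
pub fn demo() -> (String, &'static str) {
    let raw = f!(app::http_client);
    (normalized_path(&raw), (raw.callable)())
}
```

In this sketch, keeping the function value next to the string means that rustc rejects a path that does not resolve, while the string is what a separate tool can later use to look the function up.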

For each identifier, pavex looks at the Cargo.lock for the current workspace, finds the registered_at crate, determines its dependencies and generates the documentation for the crate where the callable is defined (in the example above, app itself, since import_path begins with crate).
It then walks the JSON documentation to find the signature of http_client and repeats the process for all its inputs and outputs.

All this information is assembled in a series of call graphs, one for each registered route handler and one for the application state.

*The call graph for `GET /vault`, the route in our example.*

The call graphs are then used to drive the code generation.
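As a rough idea of what "driving code generation" can mean, here is a toy generator that renders a nested call expression from a registry of constructors. It is a deliberately naive sketch, not pavex's implementation: the `Callable` and `Registry` types are hypothetical, and lifecycles, borrowing, async and cyclic dependencies are all ignored.

```rust
use std::collections::BTreeMap;

/// A callable as a toy generator might record it (names are hypothetical).
pub struct Callable {
    pub path: &'static str,
    pub inputs: Vec<&'static str>,
}

/// Output type name -> the constructor registered for it.
pub type Registry = BTreeMap<&'static str, Callable>;

/// Render the expression that produces a value of type `ty`:
/// a constructor call if one is registered, otherwise a variable
/// (the lowercased type name) assumed to be in scope.
pub fn build_expr(registry: &Registry, ty: &str) -> String {
    match registry.get(ty) {
        Some(constructor) => call_expr(registry, constructor),
        None => ty.to_lowercase(),
    }
}

/// Render a call to `callable`, recursively building each argument.
/// Note: a cyclic registry would recurse forever in this sketch.
pub fn call_expr(registry: &Registry, callable: &Callable) -> String {
    let args: Vec<String> = callable
        .inputs
        .iter()
        .map(|ty| build_expr(registry, ty))
        .collect();
    format!("{}({})", callable.path, args.join(", "))
}

/// The `GET /vault` example, reduced to type names.
pub fn example() -> String {
    let mut registry = Registry::new();
    registry.insert(
        "PathBuf",
        Callable { path: "app::extract_path", inputs: vec!["Request", "Logger"] },
    );
    registry.insert("Logger", Callable { path: "app::logger", inputs: vec![] });
    let handler = Callable {
        path: "app::stream_file",
        inputs: vec!["Request", "PathBuf", "Logger", "HttpClient"],
    };
    call_expr(&registry, &handler)
}
```

`example()` produces `app::stream_file(request, app::extract_path(request, app::logger()), app::logger(), httpclient)`, which has the same overall shape as the generated `get_vault` function shown earlier.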

The future

I hope I showed you enough to be intrigued.

Back to reality though: pavex is still in its early days, not yet ready for user testing. I have intentionally avoided publishing the crates on crates.io: the API is too experimental and it would be detrimental to encourage its usage at this stage.

There is a lot to work on! Error handling, middleware(s), robust diagnostics, examples, project funding, etc.
If all goes according to plan, I expect to have a first alpha version ready by July 2023.

In the meantime, you can follow the development on the GitHub repository.
It's going to be a fun ride!