Pavex DevLog #5: redesigning our runtime types

June 29, 2023

👋 Hi!
It's Luca here, the author of "Zero to production in Rust".
This is a progress report about Pavex, a new Rust web framework that I have been working on. It is currently in the early stages of development, working towards its first alpha release.

Check out the announcement post to learn more about the vision!

Overview

It's time for another progress report on Pavex, covering the work done in June!

At a glance:

Pavex's runtime abstractions are taking shape. We have our own RequestHead and Response types, more extractors and better machinery for working with Response bodies (e.g. the TypedBody trait).
I've consolidated the user-facing crates. You no longer have pavex_runtime and pavex_builder as separate dependencies: you just depend on pavex to get everything you need.
Dogfooding the framework (i.e. trying to build something with it) is paying dividends. I've identified a variety of bugs and shortcomings; some have been fixed already, others are planned for the future.

Let's dive in!

You can discuss this update on r/rust.

What's new

Runtime evolution

The runtime side of Pavex was barely stubbed last month: we were directly re-exporting the entirety of http and hyper, with little to no ceremony.
As I started to implement the Realworld specification using Pavex (more on that later), I refined the runtime abstractions in lockstep.

I'll give you a quick overview of the changes, but keep in mind that things are very fluid at the moment!

No `Request` for Pavex

The first change is that Pavex no longer exposes the Request type from http. Even more radical: Pavex doesn't have a Request type at all!

You can think of an HTTP request as a combination of two parts:

The request head, which contains the HTTP version (e.g. 1.1), the path (e.g. /users), the method (e.g. POST or GET) and the headers.
The request body, which contains the payload of the request (e.g. a JSON document or a protobuf-encoded one).

Many extractors (e.g. RouteParams, QueryParams) only need to look at the request head to do their job.
But convenience is king: most folks (based on my personal experience) default to using the Request type as input to their request handlers and extractors—even if they have no use for the body.

This becomes a problem in the context of Rust's borrow checker. Bundling together data with different usage patterns is a recipe for pain and frustration when working with Rust.

The request body is streamed from the client. In order to process that stream (e.g. buffer it in memory), you need exclusive access to it: either a mutable reference (&mut Body) or ownership (Body).
Most operations involving the request head, instead, work with a shared reference (i.e. &RequestHead). There is no mutation involved.

If you have a "bundled" Request type, these access patterns clash. You can't have an exclusive reference to the request body while you're holding a shared reference to the entire request in another extractor.
This is not a major problem for other frameworks (e.g. actix-web)—they often force you to clone data in your extractors, therefore the request is only borrowed for the duration of the extraction operation.
That's not the case for Pavex—extractors (and constructors in general) can borrow data from the incoming request which can then be used in your request handler.

As an example, consider the following usage of the QueryParams extractor:

#[derive(Debug, Deserialize)]
struct QueryParams<'a> {
    name: Cow<'a, str>,
}

fn handler(params: QueryParams<'_>, body: BufferedBody) -> Response {
    /* */
}

QueryParams lets you avoid allocations when extracting the name query parameter¹, borrowing data directly from the request.
BufferedBody, instead, takes ownership of the streaming request body and buffers it in memory.

If QueryParams were to take a &Request as input, this wouldn't compile: you can't consume the request body in BufferedBody while you're holding a shared reference to the entire request in QueryParams.

I could "solve" this problem by writing extensive documentation and explaining to people that they're holding it wrong. Or I could just remove the problem altogether: no Request, no problem! If you're writing an extractor that needs access to both the head and the body, you can just ask Pavex to give you both as inputs (as we just did in the handler function above).
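To make the clash concrete, here is a minimal sketch of the "split" approach. RequestHead, Body, extract_name, buffer and dispatch are simplified stand-ins invented for illustration; they are not Pavex's real definitions.

```rust
use std::borrow::Cow;

// Hypothetical stand-ins; these are not Pavex's real definitions.
pub struct RequestHead {
    pub query: String,
}

pub struct Body(pub Vec<u8>);

/// Borrows the value of the `name` query parameter from the head.
/// Percent-decoding is skipped to keep the sketch short.
pub fn extract_name(head: &RequestHead) -> Option<Cow<'_, str>> {
    head.query
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        .find(|(key, _)| *key == "name")
        .map(|(_, value)| Cow::Borrowed(value))
}

/// Buffering consumes the body, so it needs ownership.
pub fn buffer(body: Body) -> Vec<u8> {
    body.0
}

/// A handler that uses both the borrowed parameter and the buffered body.
pub fn handler(name: Option<Cow<'_, str>>, body: Vec<u8>) -> String {
    let name = name.as_deref().unwrap_or("anonymous");
    format!("{} sent {} bytes", name, body.len())
}

/// What the glue code can look like when head and body are separate values.
pub fn dispatch(head: &RequestHead, body: Body) -> String {
    let name = extract_name(head); // shared borrow of the head
    let bytes = buffer(body); // the body is moved: no conflict
    handler(name, bytes) // `name` is still valid here
}

// With a bundled `struct Request { head: RequestHead, body: Body }` and an
// extractor that takes `&Request`, the same sequence is rejected:
// `request.body` cannot be moved out while `name` still borrows `request`.
```

Because the head and the body are distinct values, the shared borrow and the move never touch the same place, so the borrow checker has nothing to complain about.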

Our own `Response` type

The second change is that Pavex no longer exposes the Response type from the http crate.
The choice was driven by two factors:

Extensions
API control

Extensions considered harmful

The Response type from the http crate includes an Extensions field.
Most Rust web frameworks follow the same design, even if they don't necessarily rely on the http crate: actix-web has HttpResponse::extensions, tide has Response::insert_ext, etc.

You can think of extensions as a side-channel: they let you pass arbitrary data from your request handler to your middlewares (or between middlewares). They are backed by a typemap, i.e. a HashMap where the key is a type id and the value is a Box<dyn Any>.

Extensions are powerful, but they are also a major source of pain and bugs.
When you try to retrieve data, you need to downcast it to the right type. The compiler can't help you if you try to retrieve a value with the wrong type or that doesn't exist at all: you'll find out at runtime.

It's spooky action at a distance: different parts of your codebase (and its dependencies!) must be kept in sync to ensure everything works smoothly, but the relationship is entirely implicit.
You can't tell from the signature of a function whether (or what!) it will read or write to the extensions. You can't know if a certain middleware requires another middleware to be present in the chain in order to insert a certain value in the response extensions. You also can't know if a middleware will end up overwriting an extension value that you're using or inserting in your request handler.
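The failure mode is easier to see with a toy typemap. The sketch below uses only the standard library; Extensions here is a simplified stand-in rather than the type from any specific crate, and RequestId and UserId are hypothetical types.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

/// A minimal typemap, similar in spirit to response extensions.
#[derive(Default)]
pub struct Extensions {
    map: HashMap<TypeId, Box<dyn Any>>,
}

impl Extensions {
    /// Inserts a value, returning the previous value of the same type, if any.
    /// A silent overwrite is exactly the kind of surprise described above.
    pub fn insert<T: Any>(&mut self, value: T) -> Option<T> {
        self.map
            .insert(TypeId::of::<T>(), Box::new(value))
            .and_then(|old| old.downcast::<T>().ok())
            .map(|boxed| *boxed)
    }

    /// Looks a value up by type. A missing entry is only discovered at
    /// runtime: the compiler cannot tell whether anyone inserted it.
    pub fn get<T: Any>(&self) -> Option<&T> {
        self.map
            .get(&TypeId::of::<T>())
            .and_then(|value| (**value).downcast_ref::<T>())
    }
}

// Hypothetical types that a handler and a middleware might agree on.
pub struct RequestId(pub u64);
pub struct UserId(pub u64);
```

Nothing in the signatures of insert and get ties a producer to a consumer: asking for a UserId that nobody inserted compiles fine and simply returns None.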

Pavex works differently: no first-class side-channels.
All the data that you want to use or return in a constructor or in a middleware must be explicitly declared in their signatures. This allows Pavex to reason about the data flow in your application at compile-time, flagging issues early and providing helpful errors.
We have no use for extensions on our Response type.

API control

Relying entirely on http::Response also limits our control over the API.
We can't add new methods to it, we can't change the signature of existing methods, etc.
For a framework, this is a major limitation.

By introducing our own Response type, we can control the API surface and evolve it over time.

Preserving interoperability

http is a foundational crate. Its types are used by many other crates in the ecosystem as the "default" HTTP representations.
Moving away from it introduces an interoperability problem.

Luckily enough, I agree with most of the design choices made by http's authors when it comes to HTTP responses. Pavex's Response is just a newtype wrapper around http::Response, to enforce what we discussed above.

pub struct Response<Body = BoxBody> {
    inner: http::Response<Body>,
}

You can convert (for free) a pavex::Response to an http::Response and vice-versa. Interoperability is preserved.
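A minimal sketch of what "for free" means here. To keep it dependency-free, HttpResponse is a stand-in for http::Response, and the From implementations and the status method are assumptions for illustration rather than Pavex's actual API.

```rust
// Stand-in for `http::Response`, so that the sketch has no dependencies.
pub struct HttpResponse<B> {
    pub status: u16,
    pub body: B,
}

/// Newtype wrapper: same data, but the API surface is under our control.
pub struct Response<B> {
    inner: HttpResponse<B>,
}

impl<B> Response<B> {
    /// We decide which methods exist and what their signatures look like.
    pub fn status(&self) -> u16 {
        self.inner.status
    }
}

// Both conversions only move the wrapped value: nothing is copied or
// re-encoded along the way.
impl<B> From<HttpResponse<B>> for Response<B> {
    fn from(inner: HttpResponse<B>) -> Self {
        Self { inner }
    }
}

impl<B> From<Response<B>> for HttpResponse<B> {
    fn from(response: Response<B>) -> Self {
        response.inner
    }
}
```

A crate that expects the underlying type can be fed with a plain .into() call, and the wrapper can still refuse to expose anything it does not want to support.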

`TypedBody` and `IntoResponse`

Drunk with the newly found power of having our own Response type, I started to experiment with the machinery to build responses.

Request handlers in Pavex must return a type that implements the IntoResponse trait². This design is very similar to (and inspired by) actix-web's Responder trait and axum's IntoResponse trait.
Both actix-web and axum let you return a response body (e.g. a Json type) and automatically convert it into a Response for you, using the correct Content-Type header.
They go one step further though: they make an assumption about the status code of the response. A 200 OK is a reasonable default, but it is not always the right choice. What about a newly created resource, with its 201 Created? Or a 204 No Content?

The more I looked at it, the more I felt that we were conflating two different concerns:

the handling of typed response bodies (and their Content-Type header)
the value of the corresponding response head (i.e. status code and headers).

I decided to split them apart.

Pavex will be very conservative with IntoResponse. It will only be implemented for a few types, the ones that don't require us to make assumptions about the status code—e.g. ResponseHead, StatusCode itself.
A new trait, TypedBody, will instead encapsulate the machinery for converting a (typed) response body into its representation on the wire:

pub trait TypedBody {
    type Body: RawBody<Data = Bytes> + Send + Sync + 'static;

    /// The header value that should be used as `Content-Type` when
    /// returning this `Response`
    fn content_type(&self) -> HeaderValue;

    /// The actual body type travelling on the wire.
    ///
    /// It must implement the `RawBody` trait.
    fn body(self) -> Self::Body;
}

You can implement TypedBody for your own types, or use the provided implementations for String, Vec<u8>, Bytes, &'static str, Json, Html, etc.

The implementation is quite straightforward—let's look at the one for Html as an example:

use pavex::http::HeaderValue;
use pavex::response::body::raw::{Full, Bytes};
use mime::TEXT_HTML_UTF_8;

/// A `Response` body with `Content-Type` set to
/// `text/html; charset=utf-8`.
pub struct Html(Bytes);

impl TypedBody for Html {
    type Body = Full<Bytes>;

    fn content_type(&self) -> HeaderValue {
        HeaderValue::from_static(TEXT_HTML_UTF_8.as_ref())
    }

    fn body(self) -> Self::Body {
        // The response is fully buffered in memory, therefore
        // we wrap the corresponding bytes in a `Full` body.
        Full::new(self.0)
    }
}

Typed bodies can then be passed to the corresponding method on the Response type:

let html: Html = "<h1>Hello world!</h1>".into();
Response::ok().typed_body(html)

Since Response provides a shorthand constructor for all status codes, the resulting code remains quite compact.
As an added bonus, this drives further standardisation in the signatures of request handlers: they will almost always be returning a Response. One less thing to worry about when getting started.
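To show how the two concerns stay separate, here is a simplified, dependency-free sketch. It is not Pavex's implementation: plain strings and byte vectors replace HeaderValue and RawBody, and the fields and the created constructor are assumptions made for illustration.

```rust
/// Simplified stand-in for the `TypedBody` trait: plain strings and byte
/// vectors replace `HeaderValue` and `RawBody`.
pub trait TypedBody {
    fn content_type(&self) -> &'static str;
    fn body(self) -> Vec<u8>;
}

pub struct Html(pub String);

impl TypedBody for Html {
    fn content_type(&self) -> &'static str {
        "text/html; charset=utf-8"
    }

    fn body(self) -> Vec<u8> {
        self.0.into_bytes()
    }
}

/// Hypothetical response type: the status code comes from the constructor,
/// the `Content-Type` header comes from the typed body.
pub struct Response {
    pub status: u16,
    pub headers: Vec<(&'static str, &'static str)>,
    pub body: Vec<u8>,
}

impl Response {
    fn new(status: u16) -> Self {
        Self { status, headers: Vec::new(), body: Vec::new() }
    }

    pub fn ok() -> Self {
        Self::new(200)
    }

    pub fn created() -> Self {
        Self::new(201)
    }

    pub fn typed_body<B: TypedBody>(mut self, body: B) -> Self {
        // Read the content type first: `body()` consumes the value.
        self.headers.push(("content-type", body.content_type()));
        self.body = body.body();
        self
    }
}
```

With this split, Response::created().typed_body(html) and Response::ok().typed_body(html) differ only in the constructor: the body type never has to guess a status code.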

One `pavex` crate to rule them all

Up until now, a user of Pavex had to interact with two different crates:

pavex_builder, which contained the Blueprint type and all the machinery required to define the specification (routes, request handlers, constructors) for your API.
pavex_runtime, which contained the types and traits needed to write request handlers and middlewares (request, responses, headers, extractors, etc).

Having two separate crates brings some benefits, mostly around compile times:

Pay for what you use. If you only need one of them, you don't need to compile the other one.
Parallel builds. When you need them both, cargo can still compile them in parallel since they don't depend on each other (check out cargo build --timings for your own projects, if you haven't done it yet!)

It is a trade-off though.
The surface-level complexity exposed to the user is higher—they need to understand how the two crates interact with each other. If the crates are versioned independently, users also need to figure out which versions are compatible.

When looking at Pavex, the benefits didn't actually materialise in practice.
An application built with Pavex has to define its request handlers (therefore depend on pavex_runtime, either directly or transitively) and then register them against a Blueprint (therefore depend on pavex_builder). You always pay for everything.

The dependency graph is about to change as well: pavex_runtime will soon take a dependency on pavex_builder. I plan to introduce presets, along the same lines as the api and browser middleware groups in Laravel; a set of "core" constructors and middlewares that you often want to use in your application to boost your productivity.
When presets ship, we can say goodbye to build parallelism as well.

Given the above, I bit the bullet and merged the two crates into a single pavex crate.

I haven't given up on compile-time optimisation though!
I plan to recover the "pay for what you use" aspect by using feature flags to control which parts of the crate are compiled. It won't be useful to applications, but it might make a difference to custom tooling built on top of Pavex (e.g. a GUI to inspect your Blueprint doesn't need the runtime types).
I am also exploring the viability of recovering (some) parallel compilation by having a "facade" crate (pavex) which re-exports from multiple (largely independent) sub-crates. It is somewhat involved, especially when it comes to documentation, but I want to explore it further in the future.

The Realworld specification

Nothing gives you a better feeling for a framework than trying to build something with it.

As anticipated in the previous report, I started to implement the Realworld specification using Pavex.
The Realworld specification is a set of requirements for a blogging platform (named Conduit), with a reference implementation in many different languages and frameworks. It's a great way to get a feeling for a framework and its ergonomics.

The Pavex implementation is not complete yet, but you can already check out the source code if you are curious.

Most of the runtime decisions that I discussed above were driven by my own observations while trying to implement the spec.
As it happens, I also identified a variety of bugs and missing features as I was going along. I won't go into details (this update is long enough already!) but you can have a look at the PRs if you are curious:

🎉 QueryParams extractor (#65)
🎉 Json and BufferedBody extractors, with default body size limits (#66, #67)
🎉 Get syntax highlighting and go-to-definition working inside the f! macro (commit)
🛠️ Fix validation for route paths to ensure expressivity while enforcing a canonical representation (#70)
🛠️ Handle blueprints with multiple levels of nesting (#68)
🛠️ [Reflection] Handle type aliases when working with methods (#79)
🛠️ [Reflection] Handle standalone re-exported items (#79)
🛠️ [Reflection] Tweak deserialization limits for the Rustdoc JSON docs of pathological crates like typenum (#79)
🛠️ [Reflection] Handle glob re-exports from local modules (#73)
🛠️ [Reflection] Handle generic arguments defined in a dependency (commit)

Yes, building a compile-time reflection engine on top of Rustdoc JSON docs is a nasty business.

What's next

I plan to continue (and complete?) the Realworld implementation in July.
There's work to be done with respect to our error handling story and I really need to add support for middlewares—debugging without a logging middleware is a pain!

Once the above are in place, I can start working on docs: refining the API reference, writing tutorials and conceptual guides. It's going to be a lot of work, but it's absolutely vital.

July is also going to be a busy month for my personal life.
I have to work my way through a ton of paperwork to finalise my relocation to Italy. Not fun at all, believe me.
I'll also be starting a new job as a Principal Engineering Consultant at Mainmatter. I'll be partnering with teams across the world who are looking to adopt Rust or scale its usage (check out this post if you want to know more about it).

See you next month!

Subscribe to the newsletter if you don't want to miss the next update!
You can also follow the development of Pavex on GitHub.

¹ You must use Cow<'a, str> rather than &str because allocations sometimes cannot be avoided. In the case of query parameters, you are forced to allocate a fresh String if the raw query parameter contains any URL-encoded symbols.
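A rough sketch of that trade-off, using only the standard library. The decode function is hypothetical and deliberately simplified: it treats + as a space (the form-encoding convention), leaves malformed escapes untouched and falls back to lossy UTF-8 conversion.

```rust
use std::borrow::Cow;

fn hex_value(byte: u8) -> Option<u8> {
    (byte as char).to_digit(16).map(|digit| digit as u8)
}

/// Decodes a raw query value, allocating only when it contains escapes.
pub fn decode(raw: &str) -> Cow<'_, str> {
    if !raw.contains(|c: char| c == '%' || c == '+') {
        // Nothing to decode: borrow straight from the input.
        return Cow::Borrowed(raw);
    }

    let bytes = raw.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        // Each arm yields the decoded byte and how many input bytes it used.
        let decoded = match bytes[i] {
            b'+' => Some((b' ', 1)),
            b'%' if i + 2 < bytes.len() => hex_value(bytes[i + 1])
                .zip(hex_value(bytes[i + 2]))
                .map(|(hi, lo)| ((hi << 4) | lo, 3)),
            _ => None,
        };
        // Anything that is not a valid escape is copied through as-is.
        let (byte, consumed) = decoded.unwrap_or((bytes[i], 1));
        out.push(byte);
        i += consumed;
    }
    // The decoded text no longer matches the input, so it must be owned.
    Cow::Owned(String::from_utf8_lossy(&out).into_owned())
}
```

A value such as ferris comes back as Cow::Borrowed without touching the allocator, while hello%20world has to be rebuilt as an owned String.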

² Or a Result whose Ok variant implements the IntoResponse trait. We don't expect errors to implement IntoResponse: you need to register a dedicated error handler to convert them to a response, with the advantage of being able to customise the rendered response for errors defined in other crates (e.g. Pavex's extractor errors) and take advantage of dependency injection when building the response.