Rust web frameworks have subpar error reporting

None of the major Rust web frameworks have a great error reporting story, according to my personal definition of great.
I've been building production APIs in Rust for almost 6 years now, and I've been teaching people about backend development in Rust for almost as long. In all that time, I've had to tweak, work around, or actively fight the framework to get reliable and exhaustive error reporting.

Last year I bit the bullet and started building my own web framework, Pavex. I channeled my frustration into a different error reporting design. This post sums up the journey and the rationale behind it.

You can discuss this post on r/rust.

What are errors for?

So many different things can go wrong in a networked application: the database is down (or slow), the caller sent invalid data, you ran out of file descriptors, etc.
Every time something goes wrong, two different concerns must be addressed: reacting and reporting.

Reacting

Whoever called your API is waiting for a response!
Your application needs to convert the error into a response, using a representation that the caller can understand.

For an HTTP API, this involves selecting the most appropriate status code (e.g. 500 Internal Server Error or 400 Bad Request) and, if required, a more detailed error message in the body (e.g. an explanation of which field was invalid and why).
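To make "reacting" concrete, here is a minimal, framework-agnostic sketch. `CreateUserError`, `Response` and `to_response` are hypothetical names I made up for illustration; they don't belong to any of the frameworks discussed later.

```rust
/// Hypothetical failure modes for a "create user" endpoint.
#[derive(Debug)]
enum CreateUserError {
    /// The caller sent a payload we can't accept.
    InvalidField { field: &'static str, reason: &'static str },
    /// Something broke on our side.
    DatabaseUnavailable,
}

/// A bare-bones stand-in for an HTTP response: status code plus body.
#[derive(Debug, PartialEq)]
struct Response {
    status: u16,
    body: String,
}

/// "Reacting": pick a status code and a caller-facing message.
fn to_response(e: &CreateUserError) -> Response {
    match e {
        // The caller can fix this one, so we explain what was wrong.
        CreateUserError::InvalidField { field, reason } => Response {
            status: 400,
            body: format!("Invalid `{field}`: {reason}"),
        },
        // The caller can't fix this one, so we keep the details to ourselves.
        CreateUserError::DatabaseUnavailable => Response {
            status: 500,
            body: "Something went wrong".to_string(),
        },
    }
}

fn main() {
    let invalid = CreateUserError::InvalidField {
        field: "email",
        reason: "missing `@`",
    };
    println!("{:?}", to_response(&invalid));
    println!("{:?}", to_response(&CreateUserError::DatabaseUnavailable));
}
```

Notice that the 500 response tells the caller very little. The details have to travel through a different channel, which is what reporting is about.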

Reporting

At the same time, as an operator (i.e. the person responsible for keeping the application up and running), you need to have a mechanism to know that an error occurred. For example, you might track the percentage of 5xx errors to page an on-call engineer if it goes above a pre-defined threshold.

Knowing that an error occurred is not enough though: you need to know what went wrong.
When that engineer gets paged, or when you get to work in the morning, there has to be enough information to troubleshoot the issue.

Modelling errors in Rust

Rust has two ways to model failures: panics and Result.
Panics are primarily used for unrecoverable errors, so I won't discuss them much here—you need to recover and send a response! Let's focus on Result instead.

Result is a type, an enum. It has two variants: success (Ok) or failure (Err). When a function can fail, it shows in its signature: it uses a Result as its return type.
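A small example of a fallible function, using only the standard library (`parse_port` is a made-up name):

```rust
use std::num::ParseIntError;

/// Parse a TCP port from its textual form.
/// The possibility of failure is visible in the signature.
fn parse_port(raw: &str) -> Result<u16, ParseIntError> {
    raw.trim().parse::<u16>()
}

fn main() {
    for raw in ["8080", "not-a-port"] {
        // The caller has to deal with both variants.
        match parse_port(raw) {
            Ok(port) => println!("listening on port {port}"),
            Err(e) => println!("invalid port `{raw}`: {e}"),
        }
    }
}
```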

There's a lot to be said about good error design as a prerequisite to good error reporting, but that'd be too much of a detour. If you want to learn more about error design, check out this previous post of mine—it builds on the same principles.

The Error trait

There are no constraints on the type of the Err variant, but it's a good practice to use a type that implements the std::error::Error trait.
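std::error::Error is the cornerstone of Rust's error handling story. It requires error types to:

  - Provide a programmer-facing representation, via the Debug trait
  - Provide a user-facing representation, via the Display trait
  - Expose the underlying cause of the error, if any, via the source method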

The last point is particularly important: error types are often wrappers around lower-level errors.
For example, a database connection error might be caused by a network error, which is in turn caused by a DNS resolution issue. When troubleshooting, you want to be able to drill down into the chain of causes. You can't fix that database connection error if your logs don't show that it was caused by a DNS resolution issue in the first place!
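Here is a sketch of that exact scenario, using only the standard library. The three error types and the `causes` helper are hypothetical, written for this example; the only real API involved is `std::error::Error::source`.

```rust
use std::error::Error;
use std::fmt;

/// Lowest-level failure: DNS resolution.
#[derive(Debug)]
struct DnsError {
    host: String,
}

impl fmt::Display for DnsError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to resolve `{}`", self.host)
    }
}

// No `source` override: this is the root cause.
impl Error for DnsError {}

/// A network error, caused by the DNS failure.
#[derive(Debug)]
struct NetworkError {
    source: DnsError,
}

impl fmt::Display for NetworkError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to open a TCP connection")
    }
}

impl Error for NetworkError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

/// The error the application sees, caused by the network error.
#[derive(Debug)]
struct DbConnectionError {
    source: NetworkError,
}

impl fmt::Display for DbConnectionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to connect to the database")
    }
}

impl Error for DbConnectionError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

/// Walk the chain of causes, from the outermost to the root one.
fn causes(e: &dyn Error) -> Vec<String> {
    let mut out = Vec::new();
    let mut current = e.source();
    while let Some(cause) = current {
        out.push(cause.to_string());
        current = cause.source();
    }
    out
}

fn main() {
    let e = DbConnectionError {
        source: NetworkError {
            source: DnsError { host: "db.internal".to_string() },
        },
    };
    println!("{e}");
    for cause in causes(&e) {
        println!("  caused by: {cause}");
    }
}
```

If you only log the `Display` representation of the top-level error, all you get is "failed to connect to the database". The DNS issue only shows up if you walk the chain.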

Our benchmark

High-level requirements

Let's set some expectations to properly "benchmark" the error reporting story of different web frameworks.
At a high level, we want the following:

It should be possible to ensure that these requirements are met with minimum room for error—it shouldn't be possible to forget to log an error, or to log it in a way that's inconsistent with the rest of the application.

I consider this the bare minimum telemetry setup for a production-grade application. I don't expect a web framework to provide this experience out of the box (although it'd be nice!), but I do expect it to provide the necessary hooks to build it myself.

Low-level requirements

We can convert this high-level specification into a set of concrete requirements:

  1. For every incoming request, there is an over-arching tracing::Span that captures the entire request lifecycle. I'll refer to this as the root span.
  2. Every time an error occurs, the application emits a tracing event:
    1. Its level is set to ERROR
    2. The Display representation of the error is recorded in the event's error.msg field
    3. The Debug representation of the error is recorded in the event's error.details field
    4. The chain of sources (if any) is recorded in the event's error.source_chain field
  3. For the error that was converted into the HTTP response returned to the caller, we capture:
    1. The Display representation in the root span's error.msg field
    2. The Debug representation in the root span's error.details field
    3. The chain of sources in the root span's error.source_chain field

I've been using tracing as the structured logging library of choice here, but the same requirements can be expressed in terms of other logging libraries (and the framework should be able to integrate with them!).
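To show what ends up in those three fields, here is a dependency-free sketch. `ErrorReport`, `report` and `ConfigError` are hypothetical names: in a real application the strings would be recorded on a tracing event and on the root span rather than stored in a struct, and the format of the source chain (one cause per line) is my own choice.

```rust
use std::error::Error;
use std::fmt::{self, Write};

/// The three pieces of information we want for every error.
struct ErrorReport {
    /// What goes into `error.msg`.
    msg: String,
    /// What goes into `error.details`.
    details: String,
    /// What goes into `error.source_chain`.
    source_chain: String,
}

fn report(e: &dyn Error) -> ErrorReport {
    let mut source_chain = String::new();
    let mut current = e.source();
    while let Some(cause) = current {
        // Writing into a `String` can't fail, so the result is ignored.
        let _ = writeln!(source_chain, "- {cause}");
        current = cause.source();
    }
    ErrorReport {
        msg: e.to_string(),
        details: format!("{e:?}"),
        source_chain,
    }
}

/// A small error with a cause, to have something to report.
#[derive(Debug)]
struct ConfigError {
    source: std::num::ParseIntError,
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid `port` in configuration")
    }
}

impl Error for ConfigError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

fn main() {
    let e = ConfigError {
        source: "abc".parse::<u16>().unwrap_err(),
    };
    let r = report(&e);
    println!("error.msg = {}", r.msg);
    println!("error.details = {}", r.details);
    println!("error.source_chain =\n{}", r.source_chain);
}
```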

Frameworks

I'll start by reviewing how Actix Web and axum, the two most popular web frameworks in the Rust ecosystem, fare against these requirements[1]. I'll then discuss Pavex's approach.

If you don't care about the details, you can skip to the conclusion to see how the frameworks compare.

axum

In axum, the following components can fail:

axum's overall error handling approach is detailed in their documentation.
I'll focus on request handlers and extractors, as they're the most common error sources in applications.

Request handlers

In axum, request handlers are asynchronous functions that return a type that implements the IntoResponse trait.

IntoResponse

IntoResponse is a conversion trait: it specifies how to convert a type into an HTTP response.

pub trait IntoResponse {
    fn into_response(self) -> Response<Body>;
}

Result implements IntoResponse, as long as both the Ok and Err variants do.

Once IntoResponse::into_response has been called (by the framework), the type is gone—self is consumed. From an error reporting perspective, this means that you can't manipulate the error anymore.
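The consequence is easier to see with a toy model. The code below is not axum: `Response`, `IntoResponse`, `LoginError` and `dispatch` are simplified stand-ins I wrote to mimic the shape of the real thing, under the assumption that the framework calls `into_response` right after the handler returns.

```rust
/// Toy response: just a status code.
#[derive(Debug)]
struct Response {
    status: u16,
}

/// Toy version of the conversion trait. Note the `self` receiver.
trait IntoResponse {
    fn into_response(self) -> Response;
}

#[derive(Debug)]
struct LoginError {
    /// Valuable troubleshooting information.
    reason: String,
}

impl IntoResponse for LoginError {
    fn into_response(self) -> Response {
        // This is the last place where `self.reason` is available.
        Response { status: 401 }
    }
}

impl IntoResponse for &'static str {
    fn into_response(self) -> Response {
        Response { status: 200 }
    }
}

/// `Result` is convertible if both variants are.
impl<T: IntoResponse, E: IntoResponse> IntoResponse for Result<T, E> {
    fn into_response(self) -> Response {
        match self {
            Ok(t) => t.into_response(),
            Err(e) => e.into_response(),
        }
    }
}

fn handler(password: &str) -> Result<&'static str, LoginError> {
    if password == "hunter2" {
        Ok("welcome")
    } else {
        Err(LoginError {
            reason: "password mismatch for user `alice`".to_string(),
        })
    }
}

/// What the "framework" does: call the handler, convert right away.
fn dispatch(password: &str) -> Response {
    handler(password).into_response()
}

fn main() {
    // The handler knows why it failed...
    let e = handler("wrong").unwrap_err();
    println!("the handler knew: {}", e.reason);
    // ...but whoever sits after `dispatch` only gets a status code.
    println!("after conversion: {:?}", dispatch("wrong"));
}
```

Anything that runs after `dispatch` (a logging middleware, for example) has nothing but the response to look at.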

Extractors

Extractors are axum's dependency injection mechanism.
They're used to extract data from the request (e.g. the query string, the request body, etc.) or to reject the request if it doesn't meet certain criteria (e.g. it's missing an Authorization header).

You define an extractor by implementing either the FromRequest or the FromRequestParts traits.

// Slightly simplified for exposition purposes
pub trait FromRequest<S>: Sized {
    /// If the extractor fails it'll use this "rejection" type. A rejection is
    /// a kind of error that can be converted into a response.
    type Rejection: IntoResponse;

    /// Perform the extraction.
    async fn from_request(req: Request, state: &S) -> Result<Self, Self::Rejection>;
}

Error handling works similarly to request handlers: if the extractor fails, it must return an error type that implements IntoResponse. Therefore, it suffers from the same limitations.

Can axum meet our requirements?

axum provides no mechanism to execute logic between the request handler returning an error, and that very same error being converted into an HTTP response via IntoResponse::into_response. The same is true for extractors.
If you want to log errors, you must do it:

Neither is ideal.

You don't have a single place where the logging logic lives[2]. You end up with log statements spread out across the entire codebase. It's easy for an error to slip through the cracks, unlogged, or for logging logic to evolve inconsistently over time.
Things get worse if you use error types defined in other crates—you can't add logging to their IntoResponse implementation, nor customize it if it's there. Perhaps they are emitting a tracing error event, but they aren't using the same field names or they aren't recording the source chain.

Out of the box, axum falls well short of meeting the telemetry requirements I laid down. You can try to implement some mitigation strategies, described below, but neither is bulletproof.

Workaround #1

You can try to wrap[3] all your errors with a single custom error type (e.g. ErrorLogger<E>). You then implement IntoResponse for the wrapper and add the logging logic there.
This still isn't a bulletproof solution:

This workaround, even if applied correctly, would still fail to meet all our requirements: from inside IntoResponse you can't access extractors, therefore you have no way to reliably access the root span for the current request and attach error details to it.
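A rough sketch of the wrapper, again on top of a toy `IntoResponse` trait rather than axum's. To keep it dependency-free, "logging" means pushing a line into an in-memory `LOG`; that sink, `PaymentError` and the two handlers are all hypothetical.

```rust
use std::fmt::{self, Debug, Display};
use std::sync::Mutex;

/// Toy response and conversion trait (stand-ins, not axum's).
struct Response {
    status: u16,
}

trait IntoResponse {
    fn into_response(self) -> Response;
}

/// In-memory "log sink", standing in for a real logging library.
static LOG: Mutex<Vec<String>> = Mutex::new(Vec::new());

/// The wrapper: logs the inner error, then delegates the conversion.
struct ErrorLogger<E>(E);

impl<E> IntoResponse for ErrorLogger<E>
where
    E: IntoResponse + Display + Debug,
{
    fn into_response(self) -> Response {
        LOG.lock()
            .unwrap()
            .push(format!("error.msg={} error.details={:?}", self.0, self.0));
        self.0.into_response()
    }
}

#[derive(Debug)]
struct PaymentError;

impl Display for PaymentError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "payment provider timed out")
    }
}

impl IntoResponse for PaymentError {
    fn into_response(self) -> Response {
        Response { status: 502 }
    }
}

/// A handler that remembers to wrap its error.
fn wrapped_handler() -> ErrorLogger<PaymentError> {
    ErrorLogger(PaymentError)
}

/// A handler that forgets. It compiles just fine.
fn forgetful_handler() -> PaymentError {
    PaymentError
}

fn main() {
    let a = wrapped_handler().into_response();
    let b = forgetful_handler().into_response();
    println!("statuses: {} and {}", a.status, b.status);
    // Two errors occurred, but only the wrapped one was logged.
    println!("log lines: {:?}", LOG.lock().unwrap());
}
```

The compiler is equally happy with both handlers, so the only thing standing between you and an unlogged error is discipline.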

Workaround #2

Later edit: this approach was suggested in the r/rust comment section.

The approach above can be refined using Response's extensions.
You still need to wrap all errors with a custom wrapper, but you don't eagerly log the error inside IntoResponse. You instead store[4] the error in the extensions attached to the Response. A logging middleware then tries to extract the error type from the extensions to log it.

The middleware can access the root span, coming closer to meeting our requirements.
The underlying challenges remain unresolved: there is no reliable way to ensure you wrapped all errors and you need to wrap all third-party extractors, including those defined in axum itself.
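Here is what the refined approach could look like, as a toy model built on the standard library only. `Extensions` is a simplified type map that I wrote to imitate the idea of response extensions; `Reportable`, `StoredError` and `logging_middleware` are hypothetical names, and the root span is modelled as a plain list of key/value pairs.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;
use std::error::Error;
use std::fmt;
use std::sync::Arc;

/// Simplified type map: at most one value per type.
#[derive(Default)]
struct Extensions(HashMap<TypeId, Box<dyn Any>>);

impl Extensions {
    fn insert<T: Any>(&mut self, value: T) {
        self.0.insert(TypeId::of::<T>(), Box::new(value));
    }

    fn get<T: Any>(&self) -> Option<&T> {
        self.0
            .get(&TypeId::of::<T>())
            .and_then(|boxed| boxed.downcast_ref::<T>())
    }
}

struct Response {
    status: u16,
    extensions: Extensions,
}

trait IntoResponse {
    fn into_response(self) -> Response;
}

/// What gets stashed in the extensions: a clonable handle to the error.
#[derive(Clone)]
struct StoredError(Arc<dyn Error + Send + Sync>);

/// The wrapper: it doesn't log, it just stores the error for later.
struct Reportable<E> {
    status: u16,
    error: E,
}

impl<E: Error + Send + Sync + 'static> IntoResponse for Reportable<E> {
    fn into_response(self) -> Response {
        let mut extensions = Extensions::default();
        extensions.insert(StoredError(Arc::new(self.error)));
        Response { status: self.status, extensions }
    }
}

/// The middleware has access to both the response and the "root span".
fn logging_middleware(response: Response, root_span: &mut Vec<(String, String)>) -> Response {
    if let Some(StoredError(e)) = response.extensions.get::<StoredError>() {
        root_span.push(("error.msg".into(), e.to_string()));
        root_span.push(("error.details".into(), format!("{e:?}")));
    }
    response
}

#[derive(Debug)]
struct QuotaExceeded;

impl fmt::Display for QuotaExceeded {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "monthly quota exceeded")
    }
}

impl Error for QuotaExceeded {}

fn main() {
    let mut root_span = Vec::new();
    let response = Reportable { status: 429, error: QuotaExceeded }.into_response();
    let response = logging_middleware(response, &mut root_span);
    println!("status: {}, root span: {:?}", response.status, root_span);
}
```

A response that wasn't built through `Reportable` carries no `StoredError`, and the middleware silently skips it.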

Actix Web

In Actix Web, the following components can fail:

Actix Web's overall error handling approach is detailed on their website.
Just like with axum, I'll focus on request handlers and extractors, as they're the most common error sources in applications.

Request handlers

In Actix Web, request handlers are asynchronous functions that return a type that implements the Responder trait.

Responder

pub trait Responder {
    type Body: MessageBody + 'static;

    // Required method
    fn respond_to(self, req: &HttpRequest) -> HttpResponse<Self::Body>;
}

Responder is a conversion trait: it specifies how to convert a type into an HTTP response. Just like axum's IntoResponse, once Responder::respond_to has been called (by the framework), the type is gone—self is consumed.

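Result implements Responder, as long as:

  - The Ok variant implements Responder
  - The Err variant can be converted into actix_web::Error (which is the case for any type that implements ResponseError)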

ResponseError

ResponseError is another conversion trait, specialised for errors—it provides a cheap way to check the status code of the resulting response without having to build it wholesale.

pub trait ResponseError: Debug + Display {
    fn status_code(&self) -> StatusCode;
    fn error_response(&self) -> HttpResponse<BoxBody>;
}

Notice one key detail: neither status_code nor error_response consume self. They both take a reference to the error type as input. You might be thinking: "It doesn't matter, Responder::respond_to consumes self anyway, so we can't log the error anymore!"
But here comes the twist: HttpResponse::error.

HttpResponse::error

In Actix Web, when an HttpResponse is built from an error (via HttpResponse::from_error), the error is stored as part of the response. You can still access the error after the response has been built!

Extractors

In Actix Web, extractors are types that implement the FromRequest trait. In terms of error handling, they work similarly to request handlers: if the extractor fails, it must return an error type that can be converted into actix_web::Error, which is in turn converted into HttpResponse via its ResponseError implementation.

Can Actix Web meet our requirements?

Almost.
You can write an Actix Web middleware that checks if the current response bundles an error and, if so, logs it.
That's exactly what I did in tracing-actix-web.

tracing-actix-web was indeed built to meet the requirements I set at the beginning of this post, but it falls short: only the last error is going to be logged.

You can see why that's the case by following this scenario:

The logging middleware never gets a chance to see the first error since the corresponding response has been thrown away. This is unfortunately a fundamental limitation of Actix Web's current error handling design.
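The limitation can be illustrated with a toy model. This is not Actix Web code: `Response` is a stand-in that keeps the error it was built from, and the sequence of events (a handler fails, then a hypothetical `session_middleware` fails too and builds a fresh response) is just one way I can imagine ending up with two errors for a single request.

```rust
use std::error::Error;
use std::fmt;

/// Toy response that keeps the error it was built from.
struct Response {
    status: u16,
    error: Option<Box<dyn Error>>,
}

impl Response {
    fn from_error(status: u16, e: impl Error + 'static) -> Self {
        Response { status, error: Some(Box::new(e)) }
    }
}

#[derive(Debug)]
struct HandlerError;

impl fmt::Display for HandlerError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "handler failed: database timeout")
    }
}

impl Error for HandlerError {}

#[derive(Debug)]
struct MiddlewareError;

impl fmt::Display for MiddlewareError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "middleware failed: could not persist the session")
    }
}

impl Error for MiddlewareError {}

/// First failure: the handler. Its error travels inside the response.
fn handler() -> Response {
    Response::from_error(500, HandlerError)
}

/// Second failure: a middleware that fails while post-processing.
/// It builds a new response, discarding the one it received.
fn session_middleware(downstream: Response) -> Response {
    drop(downstream); // the handler's error is gone from here on
    Response::from_error(500, MiddlewareError)
}

/// The outermost middleware can only log what's attached to the
/// response it receives.
fn logging_middleware(response: &Response) -> Vec<String> {
    response.error.iter().map(|e| e.to_string()).collect()
}

fn main() {
    let response = session_middleware(handler());
    println!("status: {}", response.status);
    println!("logged: {:?}", logging_middleware(&response));
}
```

Two errors occurred, one log line was produced, and it's not the one pointing at the database.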

Pavex

Pavex is a new web framework I'm building. It's currently going through a private beta, but you can find the documentation here.

In Pavex, the following components can fail:

You can find a detailed overview of the error handling story in the documentation.

Error requirements

There is only one requirement for errors in Pavex: it must be possible to convert them into a pavex::Error via pavex::Error::new.
All errors that implement the std::error::Error trait can be converted into a pavex::Error, as well as some other types that can't implement it directly—e.g. anyhow::Error or eyre::Report.

IntoResponse

Pavex, just like Actix Web and axum, has a conversion trait that specifies how to convert a type into an HTTP response: IntoResponse.

pub trait IntoResponse {
    // Required method
    fn into_response(self) -> Response;
}

There's a key difference though: IntoResponse is not implemented for Result.

Error handlers

To convert an error into an HTTP response, you must register an error handler.

use pavex::blueprint::router::POST;
use pavex::blueprint::Blueprint;
use pavex::f;

pub fn blueprint() -> Blueprint {
    let mut bp = Blueprint::new();
    // The `handler` for the `/login` route returns a `Result`
    bp.route(POST, "/login", f!(crate::core::handler))
        // We specify which function should be called to
        // convert the error into an HTTP response
        .error_handler(f!(crate::core::login_error2response));
    // [...]
}

An error handler is a function or method that takes a reference to the error type and returns a type that implements IntoResponse.

use pavex::http::StatusCode;

pub async fn login_error2response(e: &LoginError) -> StatusCode {
    match e {
        LoginError::InvalidCredentials => StatusCode::UNAUTHORIZED,
        LoginError::DatabaseError => StatusCode::INTERNAL_SERVER_ERROR,
    }
}

Error observers

After Pavex has generated an HTTP response from the error, using the error handler you registered, it converts your concrete error type into a pavex::Error and invokes your error observers.

pub async fn log_error(e: &pavex::Error) {
    tracing::error!("An error occurred: {}", e);
}

An error observer is a function or method that takes a reference to pavex::Error as input and returns nothing.
They are designed for error reporting—e.g. you can use them to log errors, increment a metric counter, etc.

You can register as many error observers as you want, and they will all be invoked in the order they were registered:

use pavex::blueprint::router::POST;
use pavex::blueprint::Blueprint;
use pavex::f;

pub fn blueprint() -> Blueprint {
    let mut bp = Blueprint::new();
    bp.error_observer(f!(crate::core::log_error));
    // [...]
}
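If it helps to build a mental model, you can think of error observers as an ordered list of callbacks. The sketch below is a toy, not how Pavex is implemented: `Observers`, `register` and `notify` are hypothetical names, and the `sink` argument stands in for whatever the observers report to.

```rust
use std::error::Error;
use std::fmt;

/// A callback that looks at an error and reports somewhere.
type Observer = Box<dyn Fn(&dyn Error, &mut Vec<String>)>;

/// Ordered collection of observers.
#[derive(Default)]
struct Observers(Vec<Observer>);

impl Observers {
    fn register(&mut self, o: impl Fn(&dyn Error, &mut Vec<String>) + 'static) {
        self.0.push(Box::new(o));
    }

    /// Invoke every observer, in registration order.
    /// Observers only get a reference: they can't consume the error.
    fn notify(&self, e: &dyn Error, sink: &mut Vec<String>) {
        for o in &self.0 {
            o(e, &mut *sink);
        }
    }
}

#[derive(Debug)]
struct LoginError;

impl fmt::Display for LoginError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid credentials")
    }
}

impl Error for LoginError {}

fn main() {
    let mut observers = Observers::default();
    observers.register(|e, sink| sink.push(format!("log: {e}")));
    observers.register(|_e, sink| sink.push("metric: errors_total += 1".to_string()));

    let mut sink = Vec::new();
    observers.notify(&LoginError, &mut sink);
    println!("{sink:?}");
}
```

The important property is that the loop lives in one place, outside of your handlers: nobody has to remember to call it.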

Can Pavex meet our requirements?

Yes!
Pavex invokes error observers for every error that occurs—by construction, you simply can't forget an error along the way.
Error observers can take advantage of dependency injection, therefore they can access the root span for the current request and attach error details to it. That's exactly what happens in the starter project generated by pavex new, using the following error observer:

pub async fn log_error(e: &pavex::Error, root_span: &RootSpan) {
    let source_chain = error_source_chain(e);
    // Emit an error event
    tracing::error!(
        error.msg = %e,
        error.details = ?e,
        error.source_chain = %source_chain,
        "An error occurred during request handling",
    );
    // Attach the error details to the root span
    // If multiple errors occur, the details of the last one will "prevail"
    root_span.record("error.msg", tracing::field::display(e));
    root_span.record("error.details", tracing::field::debug(e));
    root_span.record("error.source_chain", error_source_chain(e));
}

That's all you need to meet the requirements I set at the beginning of this post.
No workarounds, no sharp edges, no corner cases.

Conclusion

It is not possible to fully and reliably satisfy our telemetry requirements with either axum or Actix Web.
Actix Web comes much closer though: that's why I still recommend Actix Web over axum when people ask me for advice on which Rust web framework to use for their next project. Solid error reporting is that important to me.

Pavex, on the other hand, easily meets all the requirements.
It's not a coincidence: I've been building it with these requirements in mind from day one, making error reporting a first-class concern. I'm confident in saying that, right now, Pavex has the best error reporting story in the Rust web ecosystem.

Nonetheless, there is no intrinsic limitation preventing Actix Web or axum from converging to a similar design (or perhaps a new one!) to resolve the issues I've highlighted in this post.
I sincerely hope that happens—the main advantage of having different frameworks is the constant cross-pollination of ideas and the pressure to improve.


You can discuss this post on r/rust.

You can also follow the development of Pavex on GitHub.


Footnotes

1

I originally wanted to include Rocket in this comparison, but I quickly realised that it doesn't provide enough hooks to even wrap a tracing::Span around the request-handling Future. That's a prerequisite to a correct implementation of structured logging: there's no point in going further without it.

2

If you're using TraceLayer, from tower_http, you might be wondering: isn't that enough? Isn't that the single place? Unfortunately, TraceLayer::on_failure doesn't get to see the error, it only looks at the response generated by the error!

3

There's another variation of this approach: you return the same error type (e.g. ApiError) from all your extractors, request handlers and middlewares. The two approaches are fundamentally equivalent.

4

What's stored inside Extensions has to be clonable. This can be solved by wrapping the original error inside an Arc.

5

There's another issue with failures in Actix Web middlewares, but it'd take forever to get into the details and explain it. The TL;DR is that the invocation of the downstream portion of the middleware stack should return a Response but it now returns a Result, creating a weird separate track for errors that's hard to integrate with the overall error handling story.