Rust web frameworks have subpar error reporting

February 05, 2024

None of the major Rust web frameworks have a great error reporting story, according to my personal definition of great.
I've been building production APIs in Rust for almost 6 years now, and I've been teaching people about backend development in Rust for almost as long: I've always had to tweak, work around or actively fight the framework to get reliable and exhaustive error reporting.

Last year I bit the bullet and started building my own web framework, Pavex. I channeled my frustration into a different error reporting design. This post sums up the journey and the rationale behind it.

You can discuss this post on r/rust.

What are errors for?

So many different things can go wrong in a networked application: the database is down (or slow), the caller sent invalid data, you ran out of file descriptors, etc.
Every time something goes wrong, two different concerns must be addressed: reacting and reporting.

Reacting

Whoever called your API is waiting for a response!
Your application needs to convert the error into a response, using a representation that the caller can understand.

For an HTTP API, this involves selecting the most appropriate status code (e.g. 500 Internal Server Error or 400 Bad Request) and, if required, a more detailed error message in the body (e.g. an explanation of which field was invalid and why).

Reporting

At the same time, as an operator (i.e. the person responsible for keeping the application up and running), you need to have a mechanism to know that an error occurred. For example, you might track the percentage of 5xx errors to page an on-call engineer if it goes above a pre-defined threshold.

Knowing that an error occurred is not enough though: you need to know what went wrong.
When that engineer gets paged, or when you get to work in the morning, there has to be enough information to troubleshoot the issue.

Modelling errors in Rust

Rust has two ways to model failures: panics and Result.
Panics are primarily used for unrecoverable errors, so I won't discuss them much here—you need to recover and send a response! Let's focus on Result instead.

Result is a type, an enum. It has two variants: success (Ok) or failure (Err). When a function can fail, it shows in its signature: it uses a Result as its return type.
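As a minimal illustration of a failure showing up in a signature, here is a sketch built only on the standard library. The `parse_port` and `describe` functions are hypothetical names made up for this example.

```rust
use std::num::ParseIntError;

/// The signature advertises that parsing can fail, and with which error type.
fn parse_port(raw: &str) -> Result<u16, ParseIntError> {
    raw.trim().parse::<u16>()
}

/// Callers have to deal with both variants before they can use the value.
fn describe(raw: &str) -> String {
    match parse_port(raw) {
        Ok(port) => format!("listening on port {port}"),
        Err(e) => format!("invalid port `{raw}`: {e}"),
    }
}
```

Calling `describe("8080")` goes through the `Ok` arm, while `describe("http")` goes through the `Err` arm and embeds the parsing error in the message.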

There's a lot to be said about good error design as a prerequisite to good error reporting, but that'd be too much of a detour. If you want to learn more about error design, check out this previous post of mine—it builds on the same principles.

The `Error` trait

There are no constraints on the type of the Err variant, but it's a good practice to use a type that implements the std::error::Error trait.
std::error::Error is the cornerstone of Rust's error handling story. It requires error types to:

Implement the Display trait, as its user-facing representation
Implement the Debug trait, as its operator-facing representation
Provide a way to access the source of the error, if any

The last point is particularly important: error types are often wrappers around lower-level errors.
For example, a database connection error might be caused by a network error, which is in turn caused by a DNS resolution issue. When troubleshooting, you want to be able to drill down into the chain of causes. You can't fix that database connection error if your logs don't show that it was caused by a DNS resolution issue in the first place!
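To make the three requirements concrete, here is a small sketch that uses only the standard library. The `DnsError` and `DbConnectionError` types and the `source_chain` helper are hypothetical, and the chain is trimmed to two levels to keep the example short.

```rust
use std::error::Error;
use std::fmt;

/// Lowest-level failure: a (hypothetical) DNS resolution issue.
#[derive(Debug)]
struct DnsError {
    host: String,
}

impl fmt::Display for DnsError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to resolve `{}`", self.host)
    }
}

// No underlying cause: the default `source` (which returns `None`) is enough.
impl Error for DnsError {}

/// Higher-level failure that wraps the DNS error.
#[derive(Debug)]
struct DbConnectionError {
    source: DnsError,
}

impl fmt::Display for DbConnectionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to connect to the database")
    }
}

impl Error for DbConnectionError {
    // Expose the wrapped error so that callers can drill down.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

/// Collect the `Display` representation of every cause, outermost first.
fn source_chain(e: &(dyn Error + 'static)) -> Vec<String> {
    let mut chain = Vec::new();
    let mut current = e.source();
    while let Some(cause) = current {
        chain.push(cause.to_string());
        current = cause.source();
    }
    chain
}
```

Logging only the `Display` output of `DbConnectionError` would hide the DNS failure; walking `source()` is what surfaces it.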

Our benchmark

High-level requirements

Let's set some expectations to properly "benchmark" the error reporting story of different web frameworks.
At a high level, we want the following:

All errors are logged, exactly once, with enough information to troubleshoot
With a single log line, we can tell:
- If the request failed
- What error occurred
- What caused the error

It should be possible to ensure that these requirements are met with minimum room for error—it shouldn't be possible to forget to log an error, or to log it in a way that's inconsistent with the rest of the application.

I consider this the bare minimum telemetry setup for a production-grade application. I don't expect a web framework to provide this experience out of the box (although it'd be nice!), but I do expect it to provide the necessary hooks to build it myself.

Low-level requirements

We can convert this high-level specification into a set of concrete requirements:

For every incoming request, there is an over-arching tracing::Span that captures the entire request lifecycle. I'll refer to this as the root span.
Every time an error occurs, the application emits a tracing event:
1. Its level is set to ERROR
2. The Display representation of the error is recorded in the event's error.msg field
3. The Debug representation of the error is recorded in the event's error.details field
4. The chain of sources (if any) is recorded in the event's error.source_chain field
For the error that was converted into the HTTP response returned to the caller, we capture:
1. The Display representation in the root span's error.msg field
2. The Debug representation in the root span's error.details field
3. The chain of sources in the root span's error.source_chain field

I've been using tracing as the structured logging library of choice here, but the same requirements can be expressed in terms of other logging libraries (and the framework should be able to integrate with them!).

Frameworks

I'll start by reviewing how Actix Web and axum, the two most popular web frameworks in the Rust ecosystem, fare against these requirements¹. I'll then discuss Pavex's approach.

If you don't care about the details, you can skip to the conclusion to see how the frameworks compare.

`axum`

In axum, the following components can fail:

Request handlers
Extractors
Middlewares/arbitrary tower services

axum's overall error handling approach is detailed in their documentation.
I'll focus on request handlers and extractors, as they're the most common error sources in applications.

Request handlers

In axum, request handlers are asynchronous functions that return a type that implements the IntoResponse trait.

`IntoResponse`

IntoResponse is a conversion trait: it specifies how to convert a type into an HTTP response.

pub trait IntoResponse {
    fn into_response(self) -> Response<Body>;
}

Result implements IntoResponse, as long as both the Ok and Err variants do.

Once IntoResponse::into_response has been called (by the framework), the type is gone—self is consumed. From an error reporting perspective, this means that you can't manipulate the error anymore.
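The consequence of taking `self` by value can be reproduced without axum. The sketch below uses a toy `Response` type, a toy `IntoResponse` trait with the same shape, and a hypothetical `dispatch` function standing in for the framework; none of it is axum's actual code.

```rust
/// Stand-in for an HTTP response (hypothetical, not axum's type).
#[derive(Debug, PartialEq)]
struct Response {
    status: u16,
}

/// Same shape as the conversion trait above: `self` is taken by value.
trait IntoResponse {
    fn into_response(self) -> Response;
}

#[derive(Debug)]
struct LoginError {
    reason: String,
}

impl IntoResponse for LoginError {
    fn into_response(self) -> Response {
        // This is the last place where `self.reason` is reachable.
        Response { status: 401 }
    }
}

/// What a framework does, conceptually, with a handler's output.
fn dispatch(outcome: Result<Response, LoginError>) -> Response {
    match outcome {
        Ok(response) => response,
        Err(e) => {
            let response = e.into_response();
            // `e` has been moved: uncommenting the next line fails to compile
            // with "borrow of moved value: `e`".
            // eprintln!("{e:?}");
            response
        }
    }
}
```

By the time `dispatch` holds a `Response`, the `reason` carried by the error is gone unless `into_response` itself did something with it.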

Extractors

Extractors are axum's dependency injection mechanism.
They're used to extract data from the request (e.g. the query string, the request body, etc.) or to reject the request if it doesn't meet certain criteria (e.g. it's missing an Authorization header).

You define an extractor by implementing either the FromRequest or the FromRequestParts traits.

// Slightly simplified for exposition purposes
pub trait FromRequest<S>: Sized {
    /// If the extractor fails it'll use this "rejection" type. A rejection is
    /// a kind of error that can be converted into a response.
    type Rejection: IntoResponse;

    /// Perform the extraction.
    async fn from_request(req: Request, state: &S) -> Result<Self, Self::Rejection>;
}

Error handling works similarly to request handlers: if the extractor fails, it must return an error type that implements IntoResponse. Therefore, it suffers from the same limitations.

Can `axum` meet our requirements?

axum provides no mechanism to execute logic between the request handler returning an error, and that very same error being converted into an HTTP response via IntoResponse::into_response. The same is true for extractors.
If you want to log errors, you must do it:

In your request handler/extractor
Inside the IntoResponse implementation

Neither is ideal.

You don't have a single place where the logging logic lives². You end up with log statements spread out across the entire codebase. It's easy for an error to slip through the cracks, unlogged, or for logging logic to evolve inconsistently over time.
Things get worse if you use error types defined in other crates—you can't add logging to their IntoResponse implementation, nor customize it if it's there. Perhaps they are emitting a tracing error event, but they aren't using the same field names or they aren't recording the source chain.

Out of the box, axum falls well short of meeting the telemetry requirements I laid out. You can try the mitigation strategies described below, but neither is bulletproof.

Workaround #1

You can try to wrap³ all your errors with a single custom error type (e.g. ErrorLogger<E>). You then implement IntoResponse for the wrapper and add the logging logic there.
This still isn't a bulletproof solution:

You may forget to wrap one of your errors with the custom error wrapper.
You can no longer use extractors defined in other crates (including axum itself!). You need to wrap all third-party extractors to ensure they return a wrapped error.

This workaround, even if applied correctly, would still fail to meet all our requirements: from inside IntoResponse you can't access extractors, therefore you have no way to reliably access the root span for the current request and attach error details to it.
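Here is a rough sketch of the wrapper idea, again with toy `Response` and `IntoResponse` definitions rather than axum's. The `LOG` static is a stand-in for a real log sink (an assumption made so the example has no dependencies), and `InvalidPayload` is a hypothetical error type.

```rust
use std::fmt::{self, Debug, Display};
use std::sync::Mutex;

/// Stand-in for a log sink so the sketch has no dependencies.
/// A real application would emit a `tracing` event instead.
static LOG: Mutex<Vec<String>> = Mutex::new(Vec::new());

#[derive(Debug, PartialEq)]
struct Response {
    status: u16,
}

trait IntoResponse {
    fn into_response(self) -> Response;
}

/// The wrapper: the only error type handlers are supposed to return.
struct ErrorLogger<E>(E);

impl<E> IntoResponse for ErrorLogger<E>
where
    E: IntoResponse + Display + Debug,
{
    fn into_response(self) -> Response {
        // The single place where logging happens...
        LOG.lock()
            .unwrap()
            .push(format!("error.msg={} error.details={:?}", self.0, self.0));
        // ...before delegating to the wrapped error.
        self.0.into_response()
    }
}

#[derive(Debug)]
struct InvalidPayload;

impl Display for InvalidPayload {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid payload")
    }
}

impl IntoResponse for InvalidPayload {
    fn into_response(self) -> Response {
        Response { status: 400 }
    }
}
```

Both `ErrorLogger(InvalidPayload).into_response()` and `InvalidPayload.into_response()` compile and produce the same response; only the first one leaves a log entry. Nothing in the type system points out the missing wrapper, which is the first weakness listed above.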

Workaround #2

Later edit: this approach was suggested in the r/rust comment section.

The approach above can be refined using Response's extensions.
You still need to wrap all errors with a custom wrapper, but you don't eagerly log the error inside IntoResponse. You instead store⁴ the error in the extensions attached to the Response. A logging middleware then tries to extract the error type from the extensions to log it.

The middleware can access the root span, coming closer to meeting our requirements.
The underlying challenges remain unresolved: there is no reliable way to ensure you wrapped all errors and you need to wrap all third-party extractors, including those defined in axum itself.
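The mechanics can be modelled with a toy type map. Everything below is a simplified stand-in: `Extensions`, `Response`, `StoredError` and both functions are hypothetical and only loosely follow the shape of the real APIs.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;
use std::error::Error;
use std::fmt;
use std::sync::Arc;

/// Toy type map, loosely modelled on the idea of response extensions.
#[derive(Default)]
struct Extensions(HashMap<TypeId, Box<dyn Any>>);

impl Extensions {
    /// The `Clone` bound mirrors the constraint on stored values.
    fn insert<T: Clone + 'static>(&mut self, value: T) {
        self.0.insert(TypeId::of::<T>(), Box::new(value));
    }

    fn get<T: 'static>(&self) -> Option<&T> {
        self.0.get(&TypeId::of::<T>())?.downcast_ref::<T>()
    }
}

#[derive(Default)]
struct Response {
    status: u16,
    extensions: Extensions,
}

/// Errors are rarely `Clone`, so the stored value is an `Arc` around the error.
#[derive(Clone)]
struct StoredError(Arc<dyn Error + Send + Sync>);

#[derive(Debug)]
struct DatabaseDown;

impl fmt::Display for DatabaseDown {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "database is down")
    }
}

impl Error for DatabaseDown {}

/// The wrapper's conversion: build the response, stash the error, log nothing.
fn into_response<E: Error + Send + Sync + 'static>(e: E, status: u16) -> Response {
    let mut response = Response { status, ..Default::default() };
    response.extensions.insert(StoredError(Arc::new(e)));
    response
}

/// Logging middleware: runs after the handler and looks for a stored error.
fn logging_middleware(response: &Response) -> Option<String> {
    let stored = response.extensions.get::<StoredError>()?;
    Some(format!("error.msg={} error.details={:?}", stored.0, stored.0))
}
```

The middleware only finds something if the failing component went through the wrapper's conversion. A response built by an unwrapped third-party extractor carries no `StoredError`, so it passes through silently.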

Actix Web

In Actix Web, the following components can fail:

Request handlers
Extractors
Middlewares/arbitrary Service implementations

Actix Web's overall error handling approach is detailed on their website.
Just like with axum, I'll focus on request handlers and extractors, as they're the most common error sources in applications.

Request handlers

In Actix Web, request handlers are asynchronous functions that return a type that implements the Responder trait.

`Responder`

pub trait Responder {
    type Body: MessageBody + 'static;

    // Required method
    fn respond_to(self, req: &HttpRequest) -> HttpResponse<Self::Body>;
}

Responder is a conversion trait: it specifies how to convert a type into an HTTP response. Just like axum's IntoResponse, once Responder::respond_to has been called (by the framework), the type is gone—self is consumed.

Result implements Responder, as long as:

the Ok variant implements Responder
the Err variant implements the ResponseError trait

`ResponseError`

ResponseError is another conversion trait, specialised for errors—it provides a cheap way to check the status code of the resulting response without having to build it wholesale.

pub trait ResponseError: Debug + Display {
    fn status_code(&self) -> StatusCode;
    fn error_response(&self) -> HttpResponse<BoxBody>;
}

Notice one key detail: neither status_code nor error_response consume self. They both take a reference to the error type as input. You might be thinking: "It doesn't matter, Responder::respond_to consumes self anyway, so we can't log the error anymore!"
But here comes the twist: HttpResponse::error.

`HttpResponse::error`

In Actix Web, when an HttpResponse is built from an error (via HttpResponse::from_error), the error is stored as part of the response. You can still access the error after the response has been built!

Extractors

In Actix Web, extractors are types that implement the FromRequest trait. In terms of error handling, they work similarly to request handlers: if the extractor fails, it must return an error type that can be converted into actix_web::Error, which is in turn converted into HttpResponse via its ResponseError implementation.

Can Actix Web meet our requirements?

Almost.
You can write an Actix Web middleware that checks if the current response bundles an error and, if so, logs it.
That's exactly what I did in tracing-actix-web.

tracing-actix-web was indeed built to meet the requirements I set at the beginning of this post, but it falls short: only the last error is going to be logged.

You can see why that's the case by following this scenario:

A request handler returns an error
The error is converted into an HTTP response and stored in the response
The response passes through an unrelated middleware, which fails⁵ and builds a new response from the new error
The logging middleware sees the final response and logs the last error

The logging middleware never gets a chance to see the first error since the corresponding response has been thrown away. This is unfortunately a fundamental limitation of Actix Web's current error handling design.
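The scenario can be replayed with a toy model. The `Response` type below only imitates the "response remembers its error" behaviour described earlier; it is not Actix Web's type, and the handler and middleware functions are hypothetical.

```rust
use std::error::Error;
use std::fmt;

/// Minimal error type for the sketch.
#[derive(Debug)]
struct Failure(&'static str);

impl fmt::Display for Failure {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl Error for Failure {}

/// Toy response that remembers the error it was built from.
struct Response {
    status: u16,
    error: Option<Box<dyn Error>>,
}

impl Response {
    fn from_error(status: u16, e: impl Error + 'static) -> Self {
        Response { status, error: Some(Box::new(e)) }
    }
}

/// Step 1 and 2: the handler fails and its error is stored in the response.
fn handler() -> Response {
    Response::from_error(400, Failure("invalid payload"))
}

/// Step 3: an unrelated middleware fails after the handler has run.
/// It drops the handler's response and builds a new one.
fn failing_middleware(_downstream: Response) -> Response {
    Response::from_error(500, Failure("session store unavailable"))
}

/// Step 4: the outermost logging middleware can only report
/// what the final response carries.
fn logging_middleware(response: &Response) -> Vec<String> {
    response.error.iter().map(|e| e.to_string()).collect()
}
```

Running `logging_middleware(&failing_middleware(handler()))` yields a single entry for the session store failure. The "invalid payload" error was dropped together with the response that carried it.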

Pavex

Pavex is a new web framework I'm building. It's currently going through a private beta, but you can find the documentation here.

In Pavex, the following components can fail:

Request handlers
Constructors (i.e. our equivalent of extractors)
Middlewares

You can find a detailed overview of the error handling story in the documentation.

Error requirements

There is only one requirement for errors in Pavex: it must be possible to convert them into a pavex::Error via pavex::Error::new.
All errors that implement the std::error::Error trait can be converted into a pavex::Error, as well as some other types that can't implement it directly—e.g. anyhow::Error or eyre::Report.

`IntoResponse`

Pavex, just like Actix Web and axum, has a conversion trait that specifies how to convert a type into an HTTP response: IntoResponse.

pub trait IntoResponse {
    // Required method
    fn into_response(self) -> Response;
}

There's a key difference though: IntoResponse is not implemented for Result.

Error handlers

To convert an error into an HTTP response, you must register an error handler.

use pavex::blueprint::router::POST;
use pavex::blueprint::Blueprint;
use pavex::f;

pub fn blueprint() -> Blueprint {
    let mut bp = Blueprint::new();
    // The `handler` for the `/login` route returns a `Result`
    bp.route(POST, "/login", f!(crate::core::handler))
        // We specify which function should be called to
        // convert the error into an HTTP response
        .error_handler(f!(crate::core::login_error2response));
    // [...]
}

An error handler is a function or method that takes a reference to the error type and returns a type that implements IntoResponse.

use pavex::http::StatusCode;

pub async fn login_error2response(e: &LoginError) -> StatusCode {
    match e {
        LoginError::InvalidCredentials => StatusCode::UNAUTHORIZED,
        LoginError::DatabaseError => StatusCode::INTERNAL_SERVER_ERROR,
    }
}

Error observers

After Pavex has generated an HTTP response from the error, using the error handler you registered, it converts your concrete error type into a pavex::Error and invokes your error observers.

pub async fn log_error(e: &pavex::Error) {
    tracing::error!("An error occurred: {}", e);
}

An error observer is a function or method that takes a reference to pavex::Error as input and returns nothing.
They are designed for error reporting—e.g. you can use them to log errors, increment a metric counter, etc.

You can register as many error observers as you want, and they will all be invoked in the order they were registered:

use pavex::blueprint::router::POST;
use pavex::blueprint::Blueprint;
use pavex::f;

pub fn blueprint() -> Blueprint {
    let mut bp = Blueprint::new();
    bp.error_observer(f!(crate::core::log_error));
    // [...]
}
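The order of operations described in this section (error handler first, then every observer) can be modelled in a few lines. This is a toy dispatcher written for illustration, not the code Pavex generates: `handle_failure`, the `Observer` alias, the `REPORTS` static and the use of a plain `u16` and `dyn Error` instead of Pavex's own types are all assumptions.

```rust
use std::error::Error;
use std::fmt;
use std::sync::Mutex;

/// Stand-in for the log and metrics sinks.
static REPORTS: Mutex<Vec<String>> = Mutex::new(Vec::new());

#[derive(Debug)]
enum LoginError {
    InvalidCredentials,
    DatabaseError,
}

impl fmt::Display for LoginError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LoginError::InvalidCredentials => write!(f, "invalid credentials"),
            LoginError::DatabaseError => write!(f, "database error"),
        }
    }
}

impl Error for LoginError {}

/// Error handler: borrows the concrete error and picks a status code.
fn login_error2response(e: &LoginError) -> u16 {
    match e {
        LoginError::InvalidCredentials => 401,
        LoginError::DatabaseError => 500,
    }
}

/// An observer borrows the type-erased error and returns nothing.
type Observer = fn(&(dyn Error + 'static));

fn log_error(e: &(dyn Error + 'static)) {
    REPORTS.lock().unwrap().push(format!("log: {e}"));
}

fn count_error(_e: &(dyn Error + 'static)) {
    REPORTS.lock().unwrap().push("metric: errors_total += 1".to_string());
}

/// Toy dispatcher: only the order of operations matters here.
fn handle_failure(e: LoginError, observers: &[Observer]) -> u16 {
    // 1. The error handler sees the concrete type, by reference.
    let status = login_error2response(&e);
    // 2. The error is still alive: erase its type and notify every
    //    observer, in registration order.
    let erased: Box<dyn Error> = Box::new(e);
    for observer in observers {
        observer(&*erased);
    }
    status
}
```

Because the handler only borrows the error, it is still available for reporting once the response has been built. Compare this with a conversion trait that consumes `self`.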

Can Pavex meet our requirements?

Yes!
Pavex invokes error observers for every error that occurs—by construction, you simply can't forget an error along the way.
Error observers can take advantage of dependency injection, so they can access the root span for the current request and attach error details to it. That's exactly what happens in the starter project generated by pavex new, using the following error observer:

pub async fn log_error(e: &pavex::Error, root_span: &RootSpan) {
    let source_chain = error_source_chain(e);
    // Emit an error event
    tracing::error!(
        error.msg = %e,
        error.details = ?e,
        error.source_chain = %source_chain,
        "An error occurred during request handling",
    );
    // Attach the error details to the root span
    // If multiple errors occur, the details of the last one will "prevail"
    root_span.record("error.msg", tracing::field::display(e));
    root_span.record("error.details", tracing::field::debug(e));
    root_span.record("error.source_chain", error_source_chain(e));
}

That's all you need to meet the requirements I set at the beginning of this post.
No workarounds, no sharp edges, no corner cases.

Conclusion

It is not possible to fully and reliably satisfy our telemetry requirements with either axum or Actix Web.
Actix Web comes much closer though: that's why I still recommend Actix Web over axum when people ask me for advice on which Rust web framework to use for their next project. Solid error reporting is that important to me.

Pavex, on the other hand, easily meets all the requirements.
It's not a coincidence: I've been building it with these requirements in mind from day one, making error reporting a first-class concern. I'm confident in saying that, right now, Pavex has the best error reporting story in the Rust web ecosystem.

Nonetheless, there is no intrinsic limitation preventing Actix Web or axum from converging to a similar design (or perhaps a new one!) to resolve the issues I've highlighted in this post.
I sincerely hope that happens—the main advantage of having different frameworks is the constant cross-pollination of ideas and the pressure to improve.

You can discuss this post on r/rust.

Subscribe to the newsletter if you want to be notified when a post is published!
You can also follow the development of Pavex on GitHub.

Footnotes

¹ I originally wanted to include Rocket in this comparison, but I quickly realised that it doesn't provide enough hooks to even wrap a tracing::Span around the request-handling Future. That's a prerequisite to a correct implementation of structured logging; there's no point in going further without it.

² If you're using TraceLayer, from tower_http, you might be wondering: isn't that enough? Isn't that the single place? Unfortunately, TraceLayer::on_failure doesn't get to see the error, it only looks at the response generated by the error!

³ There's another variation of this approach: you return the same error type (e.g. ApiError) from all your extractors, request handlers and middlewares. The two approaches are fundamentally equivalent.

⁴ What's stored inside Extensions has to be clonable. This can be solved by wrapping the original error inside an Arc.

⁵ There's another issue with failures in Actix Web middlewares, but it'd take forever to get into the details and explain it. The TL;DR is that the invocation of the downstream portion of the middleware stack should return a Response but it now returns a Result, creating a weird separate track for errors that's hard to integrate with the overall error handling story.
