Generics And Zero-cost Abstractions


C++ implementations obey the zero-overhead principle: What you don't use, you don't pay for. And further: What you do use, you couldn't hand code any better. [Foundations of C++ - Bjarne Stroustrup]

Checkpoint

We left the previous episode with an intuitive understanding of Rust's ownership system: we worked with vectors of integers, Vec<i32>, and came up with a naive - but surprisingly fast! - scalar product implementation, followed by a very simple sort function using the bubble sort algorithm.

In this episode we will implement a generic version of the same scalar product routine. This will require the introduction of several key concepts concerning Rust's type system: generics, traits, operators, associated types, Copy. We will cover a lot of ground and this might feel a little bit overwhelming - remember that we are laying down solid foundations. Concepts will become familiar with use and will allow us to navigate Rust's numerical ecosystem with confidence, focusing on what we are trying to accomplish (scientific computing) instead of gazing at compiler errors in utter confusion.

Generics

Integers are cute: we should all devote some time, every morning, to pay our respects to integer arithmetic.

But we can't solve everything using 32-bit integers. We might need a different number of bits (i8, i16, i64, i128). Sometimes sign doesn't matter (u8, u16, u32, u64, u128). Sometimes the world is harsh and we need to brace ourselves for floating point arithmetic (f32, f64).

But there is one thing that we definitely do not want to do: write a scalar product implementation for every single one of these types. And Rust won't force us to do it: one more thing to be grateful for every morning.

Generics are Rust's abstraction over data types: functions can specify type parameters in their signature and use them to express the relationships between the data types of their arguments. Let's see what it looks like to use generics to generalise our scalar_product function to a wider family of reasonable numerical types.

The final signature of scalar_product in the previous episode was:

fn scalar_product(v: &[i32], w: &[i32]) -> i32


We take in two slices from vectors with i32 elements and we return an i32 value as output. We'd like to keep the same structure for our generic version: we will take two slices from vectors containing elements of the same type, let's call it T, and we will return a T value as output. The generic function signature looks like this:

fn generic_scalar_product<T>(v: &[T], w: &[T]) -> T


generic_scalar_product takes one type parameter, which we called T. We rephrased the type annotations for v, w and the output using T: this means that we expect the same type for the elements of both input slices and for our output value, just as we wanted. The whole function now looks like this:

fn generic_scalar_product<T>(v: &[T], w: &[T]) -> T
{
    // The number of elements we are dealing with
    let length = v.len();
    // We expect both vectors to have the same number of elements
    assert_eq!(length, w.len());

    // We initialise a new variable to 0
    // We will use it to accumulate the scalar product
    let mut sum = 0;
    // We iterate over the range of indexes
    for index in 0..length {
        // We add the product of elements corresponding to the same index
        sum = sum + v[index] * w[index];
    }
    // We return the scalar product
    // There is no return keyword in Rust!
    sum
}


If we try to compile it, we get a bunch of errors - let's look at the first one:

   Compiling playground v0.0.1 (/playground)
error[E0369]: binary operation `*` cannot be applied to type `T`
  --> src/lib.rs:16:21
   |
16 |         sum = sum + v[index] * w[index];
   |                     ^^^^^^^^^^^^^^^^^^^
   |
   = note: `T` might need a bound for `std::ops::Mul`


T is a type parameter, nothing more than a placeholder for a concrete type, like i32 or f64. No behaviour is associated, by default, with values of type T - the compiler doesn't make any assumption concerning what can or cannot be done with this type: can we multiply T values? Can we sum them together? Can we compare them? The compiler doesn't know! And if the compiler cannot make sure at compile time that all the operations we want to perform on T values are legitimate, it just won't compile our code. No runtime failures here, because you won't get far enough to actually run it!

This is great, and it will surely improve the quality of your sleep if you have code running in a production environment... but how do we actually give the compiler enough information on T to write the generic version of scalar_product that we so desperately need?

We could make a list of types satisfying these conditions: i32, i64, usize and a bunch of others. A whitelist, if you want. But what happens when someone else comes up with a new numerical type with a legitimate concept of addition and multiplication (e.g. complex numbers)? We would have to modify our code to include it in our whitelist - this would greatly hinder the reusability of what we are writing. That's why we want to encode behaviour in an abstract way: we don't want to prescribe the exact types our function can use. We only want to prescribe constraints - leaving our code open to be used in contexts that we didn't foresee without having to modify it, as long as the required conditions for its execution are satisfied.

We need a way to define shared behaviour for the type parameters of our functions - this is where traits come into the fray.

Traits

A trait is a concept quite similar to what other languages call interfaces: it's Rust's way to encode behaviour - to tell the compiler about what functionality is associated with a particular type.

Do you want to print your variable with println? Its type must implement the Debug trait. Do you want to clone it with clone? Its type must implement the Clone trait. Do you want to implement a new way to read files from disk? If your type implements the Read trait, it can be used as a drop-in replacement for std::fs::File, from the standard library.

Traits are everywhere - but how do they actually work? How does a trait encode behaviour? How does a type implement it?

A trait definition includes a list of method signatures. Let's see it in action using a simple example - addition. We want to define a trait to distinguish those types for which it makes sense to sum two values together. A first draft looks like this:

// trait <name of the trait> { ... }
trait Addition {
    // List of methods required on a data type
    // to implement this trait
    fn sum(self, x: Self) -> Self;
}


Self is a special type: it's used to indicate the data type that is implementing this trait (i.e. the type of self). sum's signature tells the compiler that we expect self, x and the output to be values of the same type. Let's implement Addition for i32:

// impl <name of the trait> for <name of the type> { ... }
impl Addition for i32 {
    // Self=i32 in this case, given that we are
    // implementing Addition for i32
    fn sum(self, x: Self) -> Self {
        // The actual implementation of sum for i32
        // (I am cheating, I know)
        self + x
    }
}


If Addition had two methods instead of one in its definition, both of them would have to be implemented in the impl Addition for i32 block - if we forget to implement some of the methods in a trait definition, the compiler screams, loudly.

Let's define a (generic) function that takes in three values of the same type and returns their sum:

fn sum_three<T>(a: T, b: T, c: T) -> T


As we discussed before for generic_scalar_product, T has no behaviour associated with it. We can't call methods on values of type T. However, we know that we could successfully implement sum_three if we knew how to sum together two values of type T. In other words, if T implemented the Addition trait!

We can express this constraint using a trait bound:

fn sum_three<T>(a: T, b: T, c: T) -> T
// where indicates the beginning of the trait bound section
where
    // Syntax:
    // <Name of the type>: <name of the trait we want this type to implement>
    T: Addition


T: Addition means that we can only call sum_three on values of a type that implements the Addition trait. Given this additional piece of behaviour, we can flesh out the function body:

fn sum_three<T>(a: T, b: T, c: T) -> T
where
    T: Addition
{
    let partial_sum = a.sum(b);
    let sum = c.sum(partial_sum);
    sum
}


We are using the only piece of information we have on T: it implements Addition, hence we can call the sum method on its values. The compiler can verify this by looking at our function body and at the definition of Addition, hence it does not complain. If we try it out using

assert_eq!(sum_three(1, -2, 3), 2); // All good!


everything works like a charm. Of course, the only type in the world implementing Addition right now is i32 - we can't really call sum_three on anything else. Nonetheless, nothing prevents a fellow Rust programmer, in another corner of the world, from implementing the Addition trait that we have exposed in our project for one of their own types - thus unlocking all the functionality we have already built on top of it.

That's how addition actually works in Rust!

Addition in Rust: Add

We listed some common Rust traits before, but we failed to mention that a lot of operators are actually traits in disguise!

The standard library exposes a trait for addition, called Add. It turns out that when you write

let a = 1 + 2;


what is actually happening is

let a = 1.add(2);


where add is a method defined in the Add trait - + is just syntactic sugar. The compiler infers that 1 and 2 are integers (type i32, the default integer type): i32 implements the Add trait, hence we can use the sum operator + on its values.

Let's have a look at the definition of the Add trait:

// RHS = Right Hand Side
trait Add<RHS> {
    /// The resulting type after applying the + operator.
    type Output;

    /// Performs the + operation.
    fn add(self, rhs: RHS) -> Self::Output;
}


It's actually quite similar to the Addition trait we defined on our own, but it provides a little bit more flexibility. Our Addition trait, in fact, is quite limited: it works just fine if we want to sum together two i32 values, but can we sum an i32 value and a reference to an i32 value? If we try to run

let a = -1;
let b = -3;
let c = &b;
let sum = a.sum(c);


the compiler will complain:

error[E0308]: mismatched types
  --> src/main.rs:27:21
   |
27 |     let sum = a.sum(c);
   |                     ^ expected i32, found &{integer}
   |
   = note: expected type `i32`
             found type `&{integer}`


It's because of the signature we have chosen for the sum method:

// Input values and output have to be of the same type!
fn sum(self, x: Self) -> Self


The left and right operands have to be of the same type, but we have an i32 and a &i32 - it doesn't work. Would it work using the Add trait?

Let's test whether we can sum an i32 and a &i32 using Add:

let a = -1;
let b = -3;
let c = &b;
let sum = a + c;


It runs without any problem. How so? The devil is in the details - Add is a generic trait (yes, traits can be generic as well!):

// RHS = Right Hand Side
// RHS is a type parameter!
trait Add<RHS> { ... }

This allows the standard library to provide two implementations of the Add trait for i32:

// Equivalent to our Addition trait
impl Add<i32> for i32 { ... }
// With this we can add a reference to a value!
impl<'a> Add<&'a i32> for i32 { ... }


At compile-time, depending on the type combination of the operands, Rust will produce machine instructions corresponding to the proper implementation of add for each expression containing + in our code.

Associated Types

There is one more oddity in the definition of Add which I have not commented on:

trait Add<RHS> {
    /// The resulting type after applying the + operator.
    type Output;

    /// Performs the + operation.
    fn add(self, rhs: RHS) -> Self::Output;
}


What is type Output exactly? It's an associated type - it's a type placeholder that can be used in the signature of the trait methods (as output type for add, in our case). When the Add trait is implemented for a type, the concrete Output type for that implementation has to be specified. For example:

// Same behaviour of our Addition trait
impl Add<i32> for i32 {
    type Output = i32;
    [...]
}
// With this we can add a reference to a value!
impl<'a> Add<&'a i32> for i32 {
    // But the output is going to be a value, not a reference
    type Output = i32;
    [...]
}


If associated types feel quite similar to type parameters it's because they actually are! But it's not time yet to delve deep into the details - it's enough for now to be able to read associated types in a trait definition and have an understanding of what they are used for.

Armed with our new knowledge, we can go back to generic_scalar_product and fix it. Let's do it.

Generic scalar product

Let's resume our quest for a generic scalar product from where we left it - an error message from our favourite compiler:

error[E0369]: binary operation `*` cannot be applied to type `T`
  --> src/lib.rs:16:21
   |
16 |         sum = sum + v[index] * w[index];
   |                     ^^^^^^^^^^^^^^^^^^^
   |
   = note: `T` might need a bound for `std::ops::Mul`


We can make sense of it now: T is a type placeholder with no behaviour associated with it, because in the function signature

fn generic_scalar_product<T>(v: &[T], w: &[T]) -> T


we haven't provided any constraint on T. The compiler looked at our code and noticed that we are trying to multiply two values together using the multiplication operator, *. It thus kindly suggested that we add a bound - a trait bound - on our type T: we should require that T implements the Mul trait.

Mul is the equivalent of Add for multiplication: if a type implements Mul, we can use the * operator on its values. Let's follow the compiler's suggestion - the new signature looks like this:

fn generic_scalar_product<T>(v: &[T], w: &[T]) -> T
where
    T: std::ops::Mul


If we compile it, the previous error goes away... but it's not finished yet:

   Compiling playground v0.0.1 (/playground)
error[E0277]: cannot add `<T as std::ops::Mul>::Output` to `{integer}`
  --> src/lib.rs:16:19
   |
16 |         sum = sum + v[index] * w[index];
   |                   ^ no implementation for `{integer} + <T as std::ops::Mul>::Output`
   |
   = help: the trait `std::ops::Add<<T as std::ops::Mul>::Output>`
           is not implemented for `{integer}`


It's complaining that we cannot add the output of v[index] * w[index] to sum. sum, in fact, is an integer - we initialised it like this

    let mut sum = 0;


just a couple of lines above. Rust does not perform implicit conversions: two numerical values have to be of the same type to be summed or multiplied together. Conversions, if required, have to be explicit. This is why the compiler is complaining: we can't assume, given what we know about T, that it will be possible to add a 0 to a T value.

This can be easily seen with a quick experiment. Let's use floats, replacing T with f32 in the function signature (without touching the body):

fn float_scalar_product(v: &[f32], w: &[f32]) -> f32


If we compile it, it fails again

error[E0277]: cannot add f32 to {integer}
--> src/lib.rs:14:19
|
14 |         sum = sum + v[index] * w[index];
|                   ^ no implementation for {integer} + f32
|
= help: the trait std::ops::Add<f32> is not implemented for {integer}


and it complains exactly about the same line, for the same reason: integers do not implement the trait Add<f32>, hence we can't use + to sum floats and integers together!

To get it right we need a way to initialise sum with a value of type T - let's call this value z: we want a + z = a = z + a for every value a of type T. T's additive identity, if we want to use mathematical terminology.

It's time to introduce num-traits - a key crate[1] in Rust's scientific ecosystem. It provides several traits for working with numerical types, which are used by the vast majority of Rust numerical projects.

If we skim its documentation[2], we'll find what we are looking for: the Zero trait. A type that implements the Zero trait exposes two methods:

• zero, a function that returns a neutral element with respect to the sum operation;
• is_zero, to check if a value of type T is equal to the output of zero.

zero is what we need. Let's replace

let mut sum = 0;


with

let mut sum = T::zero();


Our function now looks like this:

extern crate num_traits;

fn generic_scalar_product<T>(v: &[T], w: &[T]) -> T
where
    // A type can implement multiple traits!
    // The + sign here means that we expect T to implement
    // both num_traits::Zero and std::ops::Mul
    T: num_traits::Zero + std::ops::Mul
{
    let length = v.len();
    assert_eq!(length, w.len());

    // We initialise sum to a "zero" of type T
    // using the zero method provided by the Zero trait
    let mut sum = T::zero();
    for index in 0..length {
        sum = sum + v[index] * w[index];
    }
    sum
}


It still doesn't compile:

error[E0308]: mismatched types
  --> src/lib.rs:14:21
   |
14 |         sum = sum + v[index] * w[index];
   |                     ^^^^^^^^^^^^^^^^^^^ expected type parameter, found associated type
   |
   = note: expected type `T`
             found type `<T as std::ops::Mul>::Output`


Still not good enough and it's complaining about the same line we thought we just fixed! What is going on?

The error is slightly different this time: using zero, sum is correctly inferred to be a value of type T, as we desired. What the compiler is still confused about is the type we get as output of the multiplication, <T as std::ops::Mul>::Output. Let's look at the definition of the multiplication trait, std::ops::Mul:

pub trait Mul<RHS> {
    /// The resulting type after applying the * operator.
    type Output;

    /// Performs the * operation.
    fn mul(self, rhs: RHS) -> Self::Output;
}


Structurally identical to the Add trait definition. As we can see, the output type of mul is Output, not Self: this is what the compiler is complaining about. It's technically possible, given our signature, to have an Output type that does not coincide with Self. This is not a remote corner case - it actually happens for several implementations in the standard library: when you multiply a reference by a value, the output is a value. For example:

// With this we can multiply a value (right operand) with a reference (left operand)
impl<'a> Mul<i32> for &'a i32 {
    // But the output is going to be a value, not a reference
    // Self=&i32 != Output=i32
    type Output = i32;
    [...]
}


To get the behaviour we imagined (multiplying values of type T to get another value of type T) we need to add another constraint: we need to tell the compiler that we expect Mul::Output to be T, nothing more, nothing less. The signature becomes:

fn generic_scalar_product<T>(v: &[T], w: &[T]) -> T
where
    T: num_traits::Zero + std::ops::Mul<Output = T>


Time to compile, once again. The type error is gone (finally!) but there are other problems we need to take care of:

error[E0508]: cannot move out of type `[T]`, a non-copy slice
  --> src/lib.rs:14:21
   |
14 |         sum = sum + v[index] * w[index];
   |                     ^^^^^^^^ cannot move out of here



It's the borrow checker, knocking on our door - we are violating the rules of ownership.

The Copy trait

mul takes its left operand, self, by value:

fn mul(self, rhs: RHS) -> Self::Output;


This, if you remember what we discussed in the previous episode, triggers what we called move semantics: the ownership of self, in this case, is moved to the mul function - the value is consumed. But this cannot be allowed: &[T] is an immutable slice - a read-only view over the values of a vector. A vector owns all its elements - the ownership of the whole vector can change, but we cannot move one or more of its values out of it.

This is not one of the lines we changed in our pursuit of a generic scalar product: why is the compiler complaining now? Everything looked ok when we were dealing with &[i32]! The reason, once again, is a trait: Copy.

Types that implement the Copy trait are an exception to the rules of move semantics: when their values should be moved, they are instead copied over and the old variable is still usable. This can be shown with a very simple example:

// i32 implements the Copy trait
let a: i32 = -1;
// Now, according to move semantics, b should own -1
// while a should have been consumed
let b = a;
// Using a after it has been consumed should trigger a compiler error...
// But it doesn't: let b = a has not consumed a,
// it has just copied a's value and stored the
// copied value in b.
// a is still a valid variable and we can use it freely
let c = a + 2;


Why would anyone desire this kind of behaviour? I stressed several times that we are in control of resources when using Rust and then it turns out it was copying values all over the place behind our backs... That looks evil, to say the least.

To understand why it's ok to copy over values of certain types, we need to get a slightly deeper understanding of how data is laid out in memory by Rust, introducing the concepts of stack and heap. While the discussion is interesting (and I plan to take it on soon enough in this series) elaborating on the details requires some space and it would lead us astray from our current focus (a generic scalar product, at last). If you are curious and you can't sleep unless your curiosity has been satisfied, you can have a peek at the dedicated section in the book.

It suffices to say, for now, that moving ownership of a value of a type that implements Copy has exactly the same computational cost as copying its value using clone - hence Rust saves us the trouble of sprinkling clone calls around and copies the value implicitly when required.

All primitive numerical types implement the Copy trait. This explains why our code was working when we were dealing with a vector slice of i32 elements - this line

sum = sum + v[index] * w[index];


was using Copy semantics instead of move semantics: the values of v[index] and w[index] were being copied to be passed as arguments to mul, instead of being moved out of the vector slice. The compiler had nothing to complain about.

This does not work, though, when we deal with a generic type parameter: we haven't constrained T to implement the Copy trait, hence the compiler falls back to the default, move semantics, which does not play well with the ownership of vector elements.

Let's add Copy as a trait bound on T - all primitive numerical types satisfy the condition and that's more than good enough for now. The function signature becomes:

fn generic_scalar_product<T>(v: &[T], w: &[T]) -> T
where
    T: num_traits::Zero + std::ops::Mul<Output = T> + Copy


and... it compiles 🎉🎉🎉

I claimed this version was generic enough to be used with all numerical types - don't trust my words, try it out:

// Unsigned integers
let x: Vec<u32> = vec![1, 1, 1];
let y: Vec<u32> = vec![1, 0, 1];
assert_eq!(generic_scalar_product(&x, &y), 2);
// Signed integers
let f: Vec<i32> = vec![1, 1, -1];
let g: Vec<i32> = vec![1, 0, 1];
assert_eq!(generic_scalar_product(&f, &g), 0);
// Floats
let v: Vec<f32> = vec![1., 1., 1.];
let w: Vec<f32> = vec![1., 0., 1.];
assert_eq!(generic_scalar_product(&v, &w), 2.);


Zero-cost abstractions

We managed to get what we wanted: a generic scalar product. Let's pause for a second now. Let's compare what we achieved (generic version) to what we started with (i32 version):

extern crate num_traits;

// What we started with...
fn scalar_product(v: &[i32], w: &[i32]) -> i32
{
    let length = v.len();
    assert_eq!(length, w.len());

    let mut sum = 0;
    for index in 0..length {
        sum = sum + v[index] * w[index];
    }
    sum
}

// ...and what we ended up with
fn generic_scalar_product<T>(v: &[T], w: &[T]) -> T
where
    T: num_traits::Zero + std::ops::Mul<Output = T> + Copy
{
    let length = v.len();
    assert_eq!(length, w.len());

    let mut sum = T::zero();
    for index in 0..length {
        sum = sum + v[index] * w[index];
    }
    sum
}


Understanding what to add/modify/tweak was not trivial - we had to introduce several new Rust concepts, with a fair bit of trial and error. But the final output does not look too different from the original scalar product: the code is just as clear as the original version and those trait bounds will look familiar soon enough as we write more and more Rust code (because we will, don't doubt it!).

There is another question to answer though: how does it perform? We have a more powerful function, thanks to the abstractions we have introduced (generics and traits), but are those abstractions going to affect the performance of our code? Will generic_scalar_product, when called on a vector slice with i32 elements, be slower than the original scalar_product?

Nothing better than a quick benchmark to clear up our doubts: no significant difference in running time.

This is due to a process called monomorphization. Every time a generic function is called with a concrete set of types, the compiler creates a copy of the function body, replacing the generic type parameters with the concrete types involved. This allows the compiler to optimize each instance of the function body with respect to the concrete types: the result is no different from what we would have achieved writing separate functions for each type, without using generics or traits. In other words, we do not pay any runtime cost for using generics.

This concept is extremely powerful and it's often referred to as zero-cost abstraction: using higher-level language constructs results in the same machine code you would have obtained with uglier/more "basic" implementations. We can thus write code that is easier for a human to read (as it's supposed to be!) without having to compromise on the quality of the final artifact.

Wrapping up

This episode has seen a lot of action: we introduced several new concepts[3] (generics, traits, associated types, Copy) which, at first glance, might look complicated or obscure.

Nonetheless we managed to nail our objective: we wrote a generic function to compute the scalar product of two vector slices whose elements satisfy a bunch of basic requirements (we can multiply them, we can sum them and there is a zero we can use). We achieved it by reverse-engineering[4] the trait bounds we actually needed: we started with a concrete implementation of our function and, using every compiler error, we understood what was missing to make it work (given the required background knowledge, of course). This is a general approach that can be reused in other circumstances: it might take you a while to be able to spot what trait bounds are required on a function just by looking at its code, but the compiler will generally point you in the right direction - a couple of feedback cycles with cargo build should give you the trait bounds you need. To look at it from a different perspective, you are de facto doing a pair programming session with the Rust compiler - and it definitely is one of the nicest buddies you could get for yourself ;)

In the next episode we will go concrete once again: we will write a routine to calculate the integral of a real function over a closed interval. Thanks to all the ground we covered in this episode we will be able to introduce Array1, ndarray's one dimensional array. Stay tuned!

Notes, suggestions and feedback to improve this introduction are more than welcome, either here as Disqus comments or on Twitter @algo_luca (we want more beautiful people to try Rust out for scientific computing, don't we?).

Thanks to @fluxdot, xd009642 and stelong for providing feedback and comments on the early draft of this article.

Scientific Computing: A Rust adventure (TOC)

[1]

Libraries (or packages) in Rust are called crates. They can be imported in your project using cargo, Rust's package manager. A quick and clear introduction to cargo can be found in the book.

[2]

Once a new version of a crate is released, its documentation is automatically generated and made available on docs.rs. Just type the name of the crate you are looking for and its docs will come up!

[3]

Sometimes it helps, when you are approaching new concepts for the first time, to look at them from multiple perspectives. Having a look at the relevant chapter in the Rust book could be useful!

[4]

I borrowed this expression from the last section of Chapter 11 in Programming Rust: fast, safe systems development. It's a slightly more advanced resource compared to the book, but they complement each other quite nicely and I highly recommend it as a learning resource when you feel like digging deeper in how Rust works.