Rust Trait Objects Demystified

Updated: 11 April 2022
Originally posted: 14 August 2021

I recently picked up Programming Rust: Fast, Safe Systems Development (2nd Ed) by O'Reilly, and one section I particularly enjoyed covers the trade-off between generics (static dispatch via monomorphization) and trait objects (dynamic dispatch via type erasure).

When I first picked up Rust, I tried to approach the "any-type" problem with the generics hammer, which leads to a rather severe design limitation: the struct below can only ever hold a single concrete type of Vegetable.

struct Salad<V: Vegetable> {
  veggies: Vec<V>
}

We can, however, take advantage of dyn Trait. The dyn keyword highlights that calls to methods on the associated trait are dynamically dispatched. To use a trait this way, it must be 'object safe'. Dynamic dispatch means a dyn Trait reference contains two pointers: one to the data (i.e., an instance of a struct), and the other to the vtable (virtual method table), which holds the function pointers for that type's implementation of the trait's methods.

The concrete type behind a trait object is only known at run-time; that's the whole point of dynamic dispatch.
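To make the "two pointers" idea concrete, here is a minimal sketch that compares the size of an ordinary reference with the size of a trait-object reference. The Vegetable trait and the Carrot type are stand-ins I made up for illustration; they are not taken from the example project.

```rust
use std::mem::size_of;

// Hypothetical trait and type, used only for illustration.
trait Vegetable {
    fn name(&self) -> String;
}

struct Carrot;

impl Vegetable for Carrot {
    fn name(&self) -> String {
        String::from("carrot")
    }
}

/// Returns (size of a thin reference, size of a trait-object reference) in bytes.
fn pointer_sizes() -> (usize, usize) {
    (size_of::<&Carrot>(), size_of::<&dyn Vegetable>())
}

/// Calls `name` through the vtable; the concrete type is not known in here.
fn describe(v: &dyn Vegetable) -> String {
    v.name()
}
```

The first value is one machine word and the second is two, because &dyn Vegetable carries both the data pointer and the vtable pointer.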

Trait objects are unsized and therefore need to be "hidden" behind a reference or a smart pointer, such as a Box.

A Box allows you to store data on the heap rather than the stack. What remains on the stack is the pointer to the heap data; Box is one example of what is known as a smart pointer.

Because a reference has a statically known size, we can also hand out a trait object behind a shared reference (for example &dyn Vegetable), as long as the value it points to lives at least as long as the reference. The value behind a reference does not have to live on the heap.

Rust tries to be as explicit as possible whenever it allocates memory on the heap. So if your function returns a pointer-to-trait-on-heap in this way, you need to write the return type with the dyn keyword, e.g. Box<dyn ExampleTrait>.
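As a rough sketch of what such a return type looks like in practice, the function below decides at run-time which concrete type to allocate and hands it back as a Box<dyn ExampleTrait>. The Small and Large types and the pick function are hypothetical names chosen for this illustration.

```rust
// Hypothetical trait and implementors, used only for illustration.
trait ExampleTrait {
    fn label(&self) -> &'static str;
}

struct Small;
struct Large;

impl ExampleTrait for Small {
    fn label(&self) -> &'static str {
        "small"
    }
}

impl ExampleTrait for Large {
    fn label(&self) -> &'static str {
        "large"
    }
}

/// The caller only learns that it gets "something implementing ExampleTrait";
/// the concrete type is chosen at run-time and stored on the heap.
fn pick(large: bool) -> Box<dyn ExampleTrait> {
    if large {
        Box::new(Large)
    } else {
        Box::new(Small)
    }
}
```

Both branches return a different concrete type, which would not compile with a plain generic return type; the Box<dyn ExampleTrait> return type is what makes it possible.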

In the example below, each element of the vector is of type Box<dyn Vegetable>, which is a trait object; it is a stand-in for any type inside a Box that implements the Vegetable trait.

struct Salad {
  veggies: Vec<Box<dyn Vegetable>>
}

Keep in mind that Vec<V> and Vec<Box<dyn Vegetable>> are very different in practice. The former is a homogeneous collection that benefits from static dispatch and monomorphization, whereas the latter is a heterogeneous collection that pays for its flexibility with a heap allocation per element and a vtable lookup per method call.

You will find the source for the example project discussed in the rest of this post here.
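Here is a small sketch that puts the two designs side by side. The Lettuce and Tomato types, the GenericSalad name and the helper functions are my own placeholders for illustration.

```rust
// Hypothetical trait and implementors, used only for illustration.
trait Vegetable {
    fn name(&self) -> &'static str;
}

struct Lettuce;
struct Tomato;

impl Vegetable for Lettuce {
    fn name(&self) -> &'static str {
        "lettuce"
    }
}

impl Vegetable for Tomato {
    fn name(&self) -> &'static str {
        "tomato"
    }
}

/// Generic version: every element must be the same concrete type V.
struct GenericSalad<V: Vegetable> {
    veggies: Vec<V>,
}

/// Trait-object version: elements may be different concrete types.
struct Salad {
    veggies: Vec<Box<dyn Vegetable>>,
}

fn lettuce_only() -> GenericSalad<Lettuce> {
    GenericSalad { veggies: vec![Lettuce, Lettuce] }
}

fn mixed_salad() -> Salad {
    Salad { veggies: vec![Box::new(Lettuce), Box::new(Tomato)] }
}

/// Statically dispatched: monomorphized once per concrete V.
fn generic_names<V: Vegetable>(salad: &GenericSalad<V>) -> Vec<&'static str> {
    salad.veggies.iter().map(|v| v.name()).collect()
}

/// Dynamically dispatched: each call to `name` goes through the vtable.
fn names(salad: &Salad) -> Vec<&'static str> {
    salad.veggies.iter().map(|v| v.name()).collect()
}
```

Trying to put a Tomato into the vector returned by lettuce_only() is a compile error, which is exactly the limitation of the generic design.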

In this project the domain model I chose was that we are setting up a systems monitoring tool and we need to be able to monitor different systems.

Similar to our Salad example, we are now setting up Monitorable, which has a context field that contains a Trait object.

#[derive(Clone)]
pub struct Monitorable {
    context: Box<dyn MonitorableContext>,
    note: String,
}

Previously, I touched on Trait objects being unsized, but what does this really mean?

Let's consider this example - I have included the error when attempting to compile this to show how our weird function is rather odd.

/// Say hello in Norwegian
pub trait Hei {
    fn hei(&self);

    fn weird() {};
// error[E0038]: the trait `dispatch::Hei` cannot be made into an object
//   --> src/lib/dispatch.rs:19:20
//    |
// 19 | pub fn say_hei(s: &dyn Hei) {
//    |                    ^^^^^^^ `dispatch::Hei` cannot be made into an object
//    |
// note: for a trait to be "object safe" it needs to allow building a vtable to allow the call to be resolvable dynamically; for more information visit <https://doc.rust-lang.org/reference/items/traits.html#object-safety>
//   --> src/lib/dispatch.rs:4:8
//    |
// 1  | pub trait Hei {
//    |           --- this trait cannot be made into an object...
// ...
// 4  |     fn weird() {};
//    |        ^^^^^ ...because associated function `weird` has no `self` parameter
// help: consider turning `weird` into a method by giving it a `&self` argument
//    |
// 4  |     fn weird(&self) {};
//    |              +++++
// help: alternatively, consider constraining `weird` so it does not apply to trait objects
//    |
// 4  |     fn weird() where Self: Sized {};
//    |                +++++++++++++++++
}

The requirements for object safety are well defined. The first one is that these functions must have a receiver of type Self, or one that dereferences to Self (such as &self, &mut self or self: Box<Self>). In other words, methods on a trait object need to be callable via a reference to its instance, which makes sense, as it is ultimately an object.

Secondly, functions called on a trait object cannot use Self as a parameter or return type outside of the receiver. If one were to write fn weird(&self) -> Self; the trait would not be object safe.

Finally, we cannot use type parameters (generics) in functions called on a Trait-object.
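As a sketch of how the generics rule plays out, a trait can still offer a generic method if that method is constrained with where Self: Sized, which keeps it out of the vtable. The Greeter trait, the Norwegian type and greet_dyn are hypothetical names used only for this illustration.

```rust
use std::fmt::Display;

// Hypothetical trait, used only for illustration.
trait Greeter {
    fn greet(&self) -> String;

    // A generic method cannot go into a vtable, so it is restricted to
    // sized (concrete) implementors. The trait stays object safe.
    fn greet_with<T: Display>(&self, extra: T) -> String
    where
        Self: Sized,
    {
        format!("{} {}", self.greet(), extra)
    }
}

struct Norwegian;

impl Greeter for Norwegian {
    fn greet(&self) -> String {
        String::from("hei")
    }
}

/// Compiles because `Greeter` is object safe; only `greet` is callable here.
fn greet_dyn(g: &dyn Greeter) -> String {
    g.greet()
}
```

Calling greet_with on a &dyn Greeter is rejected by the compiler, while calling it on a Norwegian value works as usual.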

Let's consider this setup

pub trait Hei {
    fn hei(&self);

    fn weird(&self);

    fn need_sized(self) -> Self
    where
        Self: Sized;
}

impl Hei for String {
    fn hei(&self) {
        println!("hei {}", self);
    }

    fn weird(&self) {
        println!("you called weird {}", self);
    }

    fn need_sized(self) -> Self {
        self
    }
}

We can now try to call it

    // Use our Hei trait
    let x: &dyn Hei = &"hei".to_string();
    x.need_sized();

error: the `need_sized` method cannot be invoked on a trait object
  --> src/main.rs:31:7
   |
31 |     x.need_sized();
   |       ^^^^^^^^^^
   |
  ::: src/lib/dispatch.rs:9:15
   |
9  |         Self: Sized;
   |               ----- this has a `Sized` requirement

Clearly, need_sized is not available on a trait object: the where Self: Sized bound excludes it from the vtable, which is also what keeps the Hei trait itself object safe.

However, the method implemented on String works just fine, since String is sized:

    let message = String::from("hello!");
    message.need_sized().to_string();

If you've gotten this far, this next bit is a bit verbose, but bear with me; it will all start to make sense, as we also need to cover supertraits (also read this).

pub trait MonitorableContext: MonitorableContextClone {
    fn as_any(&self) -> &dyn Any;
    fn access_description(&self) -> Option<String>;
    fn access_service_tag(&self) -> Option<String>;
}

The MonitorableContext trait is declared with MonitorableContextClone as its supertrait. Implementing MonitorableContext requires us to also implement MonitorableContextClone. This makes MonitorableContext a subtrait of MonitorableContextClone.

But why do we need to bother with a supertrait at all? With the generic approach, one advantage is that we can add as many trait bounds as we want and combine them into a composition of traits. At first glance it looks like we cannot do that with trait objects, but with supertraits we can!

We want Clone to work on our trait object MonitorableContext, and this is why we have given it a supertrait, MonitorableContextClone. Clone itself cannot be the supertrait, because clone() returns Self, which would make the trait no longer object safe.

Notice below how the trait MonitorableContextClone is given a blanket implementation: it is implemented for any type that implements both MonitorableContext and Clone (and has a static lifetime).

All that's left is for us to implement the Clone trait for Box<dyn MonitorableContext>, and this is where we can call clone_box().

pub trait MonitorableContextClone {
    fn clone_box(&self) -> Box<dyn MonitorableContext>;
}

impl<T> MonitorableContextClone for T
where
    T: 'static + MonitorableContext + Clone,
{
    fn clone_box(&self) -> Box<dyn MonitorableContext> {
        Box::new(self.clone())
    }
}

impl Clone for Box<dyn MonitorableContext> {
    fn clone(&self) -> Self {
        self.clone_box()
    }
}
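To see the pattern working end to end, here is a cut-down sketch of the same idea with shorter names. Context, ContextClone, Disk and Holder are hypothetical stand-ins, not types from the example project.

```rust
// Hypothetical, simplified version of the clone_box pattern.
trait ContextClone {
    fn clone_box(&self) -> Box<dyn Context>;
}

trait Context: ContextClone {
    fn describe(&self) -> String;
}

// Blanket implementation: any 'static type that is Context + Clone
// gets clone_box for free.
impl<T> ContextClone for T
where
    T: 'static + Context + Clone,
{
    fn clone_box(&self) -> Box<dyn Context> {
        Box::new(self.clone())
    }
}

// Cloning the box delegates to the concrete type through the vtable.
impl Clone for Box<dyn Context> {
    fn clone(&self) -> Self {
        self.clone_box()
    }
}

#[derive(Clone)]
struct Disk {
    label: String,
}

impl Context for Disk {
    fn describe(&self) -> String {
        format!("disk {}", self.label)
    }
}

// derive(Clone) now works, because Box<dyn Context> implements Clone.
#[derive(Clone)]
struct Holder {
    context: Box<dyn Context>,
}

/// Clones the holder and describes the cloned context.
fn clone_description(holder: &Holder) -> String {
    holder.clone().context.describe()
}
```

The derive on Holder is the payoff: without the Clone implementation for the boxed trait object, a struct holding one could not derive Clone.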

Let's now take a look at actually implementing our shiny trait MonitorableContext. In the code below, we implement our trait on NetworkCard which we plan to pass at run-time as part of the Trait object.

#[derive(Debug, Clone, PartialEq)]
pub struct NetworkCard {
    pub description: String,
    pub service_tag: String,
    pub mac_address: String,
}

impl MonitorableContext for NetworkCard {
    fn as_any(&self) -> &dyn Any {
        self
    }

    fn access_description(&self) -> Option<String> {
        // Clone only the field we need rather than the whole struct.
        Some(self.description.clone())
    }

    fn access_service_tag(&self) -> Option<String> {
        Some(self.service_tag.clone())
    }
}

You might be wondering why access_description() returns an Option<String>. The reason is that I wanted to support enum variants as contexts too, and those have no description or service tag to return. The implementation for them looks like this:

#[derive(Debug, Copy, Clone, PartialEq)]
pub enum MonitorableComponent {
    DiskSpace,
    FreeMem,
}

impl MonitorableContext for MonitorableComponent {
    fn as_any(&self) -> &dyn Any {
        self
    }

    fn access_description(&self) -> Option<String> {
        return None;
    }

    fn access_service_tag(&self) -> Option<String> {
        return None;
    }
}

The next trait, CanMonitorShared, needs a better name, but the important part is that it is implemented on Monitorable.

pub trait CanMonitorShared {
    fn get_context(&self) -> &Box<dyn MonitorableContext>;
    fn get_note(&self) -> &String;
    fn get_server(&self) -> &Server;
    fn get_network_card(&self) -> &NetworkCard;
}

impl CanMonitorShared for Monitorable {
    ...

    fn get_network_card(&self) -> &NetworkCard {
        return self
            .context
            .as_any()
            .downcast_ref::<NetworkCard>()
            .expect("This should be a well behaved NIC");
    }
}

The important bit to notice above is that we access the context field of the Monitorable instance, which is the trait object. Remember, we don't know what type it is, and that's why we call as_any(), which is declared on the MonitorableContext trait and implemented by each concrete type. Notice our call to downcast_ref::<NetworkCard>(), which neatly uses the 'Turbofish' ::<> syntax (refer to the docs on downcast_ref). Since downcast_ref returns an Option, we could either match against it, call unwrap() on it or use expect() (which is the same as unwrapping, but with a descriptive panic message).

Now it's time to put all of this together, so let's run a test
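If panicking is not acceptable, the Option returned by downcast_ref can be passed on to the caller instead. The following is a minimal sketch under that assumption; the trimmed-down Context trait, the Disk type and the mac_address helper are hypothetical and not part of the example project.

```rust
use std::any::Any;

// Hypothetical, trimmed-down context trait for illustration.
trait Context {
    fn as_any(&self) -> &dyn Any;
}

struct NetworkCard {
    mac_address: String,
}

struct Disk;

impl Context for NetworkCard {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Context for Disk {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Returns the MAC address if the context is a NetworkCard, otherwise None.
fn mac_address(ctx: &dyn Context) -> Option<&str> {
    ctx.as_any()
        .downcast_ref::<NetworkCard>()
        .map(|nic| nic.mac_address.as_str())
}
```

With this shape the caller decides what a wrong type means, rather than the accessor panicking on its behalf.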

        let monitor_note = String::from("Monitoring StarTech NIC");
        let nic_service_tag = String::from("PEX20000SFPI");
        let nic_description =
            String::from("StarTech PCIe fiber network card - 2-port open SFP - 10G");
        let nic_mac_address = String::from("00:1B:44:11:3A:B7");
        let network_card = Monitorable::new(
            monitor_note,
            Box::new(NetworkCard {
                description: nic_description.clone(),
                service_tag: nic_service_tag.clone(),
                mac_address: nic_mac_address.clone(),
            }),
        );

        assert_eq!(
            *network_card.get_network_card().mac_address,
            nic_mac_address
        );

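Notice above that the leading * applies to the whole expression network_card.get_network_card().mac_address, because method calls and field access bind more tightly than the unary * operator. It dereferences the String to a str, which can then be compared with nic_mac_address. The call to get_network_card() itself only hands back a shared reference, since we used downcast_ref rather than downcast_mut.

You can run the example code on GitHub: check out the todo/trait_object_approach branch, then run cargo test. If you liked this post please give me a shout on Twitter and share your thoughts in the comments below. Thanks!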

Thanks to Jon Gjengset (@jonhoo) for his comments on this subject, which I've incorporated above.
