Benchmarking Rust for Serverless

Let’s start with two very important questions:

  1. Why should I care about benchmarking Rust since it’s already super fast?

That’s the question we’ll try to answer in this post!

  2. Will there be a 🍕 demo?

Of course, you know that I’m a true 🍕🍕🍕 lover!


What’s so special about Serverless?

Benchmarking is not specific to serverless. But in serverless components, such as AWS Lambda functions, performance really matters for two main reasons:

  • Cold start duration (good news, Rust is really performant as you can see in my daily updated benchmark)

  • Runtime duration, as AWS bills per millisecond.

Let’s see how we can measure and improve this runtime duration using benchmarks!

What are we going to benchmark?

Let’s take a very simple example:
a function which returns the pizza of the day. 🍕 (I told you)

We will compare two different implementations:

  • one with HashMap - std::collections::HashMap
  • one with Vec - std::vec::Vec

(if you don’t know much about Rust, you might want to take a look at some basic Rust material before we start, like the Rust book or my Rust YouTube channel 😇)

First, let’s define our PizzaStore trait:

pub trait PizzaStore {
    fn get_pizza_of_the_day(&self, day_index: i32) -> &str;
}

And our first implementation with HashMap:

use std::collections::HashMap;

// let's make sure we don't initialize a new HashMap each time
pub struct PizzaHashMap<'a> {
    cache: HashMap<i32, &'a str>,
}

impl<'a> PizzaHashMap<'a> {
    pub fn new() -> Self {
        PizzaHashMap {
            cache: HashMap::from([
                (0, "margherita"),
                (1, "deluxe"),
                (2, "veggie"),
                (3, "mushrooms"),
                (4, "bacon"),
                (5, "four cheese"),
                (6, "pepperoni"),
                // what's your favorite?
            ]),
        }
    }
}

impl<'a> PizzaStore for PizzaHashMap<'a> {
    // let's get the pizza of the day from the cache (HashMap)
    fn get_pizza_of_the_day(&self, day_index: i32) -> &str {
        match self.cache.get(&day_index) {
            Some(&pizza) => pizza,
            None => panic!("could not find the pizza"),
        }
    }
}

Writing our first benchmark

There are quite a few crates for writing benchmarks, but we’ll use criterion here. Let’s start by creating a benches folder containing a benchmark file, and add our first benchmark function:

use criterion::{criterion_group, criterion_main, Criterion};
use rand::Rng;

fn criterion_hashmap(c: &mut Criterion) {
    let mut rng = rand::thread_rng();
    // we create the cache outside of the bench function so only one HashMap will be created
    let pizza_store_hashmap = PizzaHashMap::new();
    // we call get_pizza_of_the_day with a random day index
    c.bench_function("with hashmap", |b| {
        b.iter(|| pizza_store_hashmap.get_pizza_of_the_day(rng.gen_range(0..7)))
    });
}

We also need a bench group (as we will add more benchmark functions later) and the criterion_main! entry point that runs it:

criterion_group!(benches, criterion_hashmap);
criterion_main!(benches);

That’s it! Let’s run it with cargo bench

By default, it runs our function for about 5 seconds (that’s 300M+ iterations)

     Running benches/ (target/release/deps/benchmark-2f28819806c8c7c9)
with hashmap            time:   [13.069 ns 13.090 ns 13.114 ns]

The left and right values are the lower and upper bounds of the confidence interval computed by criterion.

The number in the middle is the best estimate of how long each iteration is likely to take.

Note that these 3 numbers are very close to each other. This won’t be the case if you depend on networking, for instance.
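
To put a number on how close those bounds are, here is a minimal sketch that computes the width of the interval relative to the estimate. The relative_spread helper is a hypothetical name introduced for illustration, not part of criterion; the values are copied from the "with hashmap" line above.

```rust
/// Width of the interval (upper - lower) relative to the central estimate.
/// Hypothetical helper, not a criterion API.
fn relative_spread(lower_ns: f64, estimate_ns: f64, upper_ns: f64) -> f64 {
    (upper_ns - lower_ns) / estimate_ns
}

fn main() {
    // Values copied from the "with hashmap" output line.
    let spread = relative_spread(13.069, 13.090, 13.114);
    println!("interval width: {:.2}% of the estimate", spread * 100.0);
}
```

Here the interval is roughly 0.34% of the estimate, which is why the three numbers look almost identical. A benchmark that hits the network would likely show a much wider interval.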

Second implementation: with Vec

Let’s create a second implementation that uses a Vec instead of a HashMap:

pub struct PizzaVec<'a> {
    cache: Vec<&'a str>,
}

impl<'a> PizzaVec<'a> {
    pub fn new() -> Self {
        PizzaVec {
            cache: vec![
                "margherita", "deluxe", "veggie", "mushrooms", "bacon", "four cheese", "pepperoni",
            ],
        }
    }
}

impl<'a> PizzaStore for PizzaVec<'a> {
    // the day index is used directly as the position in the Vec
    fn get_pizza_of_the_day(&self, day_index: i32) -> &str {
        match self.cache.get(day_index as usize) {
            Some(&pizza) => pizza,
            None => panic!("could not find the pizza"),
        }
    }
}
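
Before comparing speed, it can be worth checking that both stores return the same pizzas. The following is a minimal, self-contained sketch of such a check. It uses compact stand-ins for the two stores ('static strings instead of the 'a lifetime, for brevity), and the stores_agree helper and PIZZAS constant are hypothetical names that do not appear in the original code.

```rust
use std::collections::HashMap;

pub trait PizzaStore {
    fn get_pizza_of_the_day(&self, day_index: i32) -> &str;
}

// Assumption: the same seven pizzas as in the article, shared by both stores.
const PIZZAS: [&str; 7] = [
    "margherita", "deluxe", "veggie", "mushrooms", "bacon", "four cheese", "pepperoni",
];

pub struct PizzaHashMap {
    cache: HashMap<i32, &'static str>,
}

pub struct PizzaVec {
    cache: Vec<&'static str>,
}

impl PizzaHashMap {
    pub fn new() -> Self {
        // Key each pizza by its position in PIZZAS.
        let cache: HashMap<i32, &'static str> =
            PIZZAS.iter().enumerate().map(|(i, &p)| (i as i32, p)).collect();
        PizzaHashMap { cache }
    }
}

impl PizzaVec {
    pub fn new() -> Self {
        PizzaVec { cache: PIZZAS.to_vec() }
    }
}

impl PizzaStore for PizzaHashMap {
    fn get_pizza_of_the_day(&self, day_index: i32) -> &str {
        self.cache.get(&day_index).copied().expect("could not find the pizza")
    }
}

impl PizzaStore for PizzaVec {
    fn get_pizza_of_the_day(&self, day_index: i32) -> &str {
        self.cache.get(day_index as usize).copied().expect("could not find the pizza")
    }
}

/// Hypothetical helper: true if both stores return the same pizza
/// for every day index in 0..days.
pub fn stores_agree(a: &dyn PizzaStore, b: &dyn PizzaStore, days: i32) -> bool {
    (0..days).all(|day| a.get_pizza_of_the_day(day) == b.get_pizza_of_the_day(day))
}

fn main() {
    let (map_store, vec_store) = (PizzaHashMap::new(), PizzaVec::new());
    assert!(stores_agree(&map_store, &vec_store, 7));
    println!("day 0: {}", vec_store.get_pizza_of_the_day(0));
}
```

Taking &dyn PizzaStore keeps the check independent of the concrete store, so a third implementation could be compared the same way.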

Let’s create a new benchmark function so we can compare (that’s the goal of benchmarks!)
Back in our benchmark file, we can add

fn criterion_vec(c: &mut Criterion) {
    let mut rng = rand::thread_rng();
    let pizza_store_vec = PizzaVec::new();
    c.bench_function("with vector", |b| {
        b.iter(|| pizza_store_vec.get_pizza_of_the_day(rng.gen_range(0..7)))
    });
}

and update our group to include this new benchmark function:

criterion_group!(benches, criterion_hashmap, criterion_vec);

Now we can re-run our benchmarks with cargo bench and check the results!

with hashmap            time:   [13.096 ns 13.117 ns 13.141 ns]
with vector             time:   [7.5832 ns 7.5958 ns 7.6097 ns]

By replacing our HashMap with a Vec, our function now runs about 1.7 times faster!

This is a great reminder of how important the choice of data structure is for a given use case 😇

Ok great, but how does it translate to Serverless?

Two AWS Lambda functions have been deployed, each embedding one of the implementations.

Each Lambda function calls get_pizza_of_the_day 10_000_000 times.

In us-east-1 with 128 MB of memory, here are the results:
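
The handler code is not shown here, so the following is only a sketch of what such a workload loop might look like, using nothing but the standard library. The run_workload function, the standalone get_pizza_of_the_day stand-in and the cycling day index (instead of a random one) are all assumptions made for illustration; a real function would wrap this in a Lambda handler.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

const PIZZAS: [&str; 7] = [
    "margherita", "deluxe", "veggie", "mushrooms", "bacon", "four cheese", "pepperoni",
];

// Stand-in for either store's get_pizza_of_the_day (assumption for this sketch).
fn get_pizza_of_the_day(day_index: i32) -> &'static str {
    PIZZAS[day_index as usize]
}

/// Hypothetical workload: performs `calls` lookups, cycling through the 7 days,
/// and returns the number of lookups made plus the elapsed wall-clock time.
fn run_workload(calls: u32) -> (u32, Duration) {
    let start = Instant::now();
    let mut done = 0;
    for i in 0..calls {
        // black_box discourages the optimizer from removing the lookup
        black_box(get_pizza_of_the_day(black_box((i % 7) as i32)));
        done += 1;
    }
    (done, start.elapsed())
}

fn main() {
    let (calls, elapsed) = run_workload(10_000_000);
    println!("{calls} calls took {elapsed:?}");
}
```

Timing a loop by hand like this gives a single measurement with no statistics, which is why criterion remains the better tool for comparing implementations locally.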

Implementation    Runtime duration
HashMap           267.92 ms
Vec               87.98 ms 🤯🤯
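
To relate these durations to the per-iteration numbers from cargo bench, here is a small worked calculation. It assumes the whole runtime duration is spent in the 10_000_000 calls, which ignores any per-invocation overhead, and ns_per_call is a hypothetical helper name.

```rust
/// Average time per call in nanoseconds, given a total duration in milliseconds.
/// Hypothetical helper for this worked example.
fn ns_per_call(total_ms: f64, calls: u64) -> f64 {
    total_ms * 1_000_000.0 / calls as f64
}

fn main() {
    let calls = 10_000_000;
    // Runtime durations taken from the results table.
    let (hashmap_ms, vec_ms) = (267.92, 87.98);
    println!("HashMap: {:.1} ns per call", ns_per_call(hashmap_ms, calls));
    println!("Vec:     {:.1} ns per call", ns_per_call(vec_ms, calls));
    println!("Vec is {:.1}x faster", hashmap_ms / vec_ms);
}
```

Under that assumption, this works out to roughly 26.8 ns per call for HashMap and 8.8 ns for Vec, a ratio of about 3.0.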

That’s it!

Of course this was a simple example but I hope I’ve convinced you to use benchmarks to optimize your code!

Bonus question: where does this overhead come from?

Stay tuned for the next blog post about Rust profiling!
