Class 5: Lifetimes and Aliasing

Lifetimes

The lifetime of something is the duration of a program's execution during which it exists. That is, the lifetime of a value begins when it is first defined, and ends when it goes out of scope and is inaccessible by any part of the program. Managing lifetimes is incredibly important, since accessing an object outside its lifetime means that the object is not in a valid state and you will get undefined behavior. One of the goals of Rust is to ensure that it is impossible to access an object outside of its lifetime.

C++

Mismanaging lifetimes is a serious issue in real code. Louis Brandy gave an excellent talk, "Curiously Recurring C++ Bugs at Facebook" which, from (12:48 - 17:06) discusses a specific bug that happens all the time at Facebook that has to do with lifetimes. He even comments at 16:24 that

This is a really hard problem to fix (...) But at some point the language has to fix it and I don't know if it can or will.

Louis Brandy's broken C++ code is the following.

const string &get_default(const map<string, string> &map,
                          const string &key,
                          const string &dflt)
{
    auto pos = map.find(key);
    // If value was found, return that, otherwise return `dflt` (default)
    return (pos != map.end() ? pos->second : dflt);
}

The problem with this code is that the return value is a reference to a value that will either live as long as the value associated with the key, or only as long as the default, and it is quite common for the default to not live very long. The lifetimes of temporaries in C++ are complicated and, I believe, have changed since this talk was given in 2017 making it a little harder to break this function, but it is still very easily broken.

Consider this program utilizing this broken get_default.

#include <iostream>
#include <map>
#include <string>

using namespace std;

// same `get_default` as before
const string &get_default(const map<string, string> &map,
                          const string &key,
                          const string &dflt)
{
    auto pos = map.find(key);
    return (pos != map.end() ? pos->second : dflt);
}

const string &ref_to_temp(const map<string, string> &map) {
    string dflt = "DEFAULT"; // Lifetime of `dflt` begins here

    // If "Hello" isn't a key in `map`, returns a reference to `dflt`
    // which lives in this stack frame.
    return get_default(map, "Hello", dflt);
} // Lifetime of `dflt` ends here since the variable was on the stack

int main() {
    map<string, string> map;
    map["hello"] = "world";
    auto &v = ref_to_temp(map); // `uh-oh`, "Hello" isn't in `map`
    cout << v << '\n';
}

Surprisingly, on my computer at least, it prints out "EFAULT" rather than "DEFAULT" like we would expect, which is definitely a sign something wrong is going on. However, if we add an extra print statement to main as shown below, it suddenly prints out a ton of garbage.

int main() {
    map<string, string> map;
    map["hello"] = "world";
    auto &v = ref_to_temp(map);
    cout << v << '\n';
    cout << max(1, 2) << '\n'; // messes up the previous stack frame
    cout << v << '\n';
}

The problem with this isn't so much that this code is broken (it is in Rust, too), but that the compiler doesn't tell us it is broken, even turning on -Wall and -Wextra. Though, weirdly, g++, but not clang++, is able to find the error when optimizations are turned on for some reason.

Rust

What happens then if we translate this code to Rust?

#![allow(unused)]
fn main() {
use std::collections::HashMap;
fn get_default(map: &HashMap<String, String>, key: &String, default: &String) -> &String {
    match map.get(key) {
        Some(val) => val,
        None => default,
    }
}
}

We get a compiler error!

   Compiling playground v0.0.1 (/playground)
error[E0106]: missing lifetime specifier
 --> src/main.rs:4:82
  |
4 | fn get_default(map: &HashMap<String, String>, key: &String, default: &String) -> &String {
  |                     ------------------------       -------           -------     ^ expected named lifetime parameter
  |
  = help: this function's return type contains a borrowed value, but the signature does not say whether it is borrowed from `map`, `key`, or `default`
help: consider introducing a named lifetime parameter
  |
4 | fn get_default<'a>(map: &'a HashMap<String, String>, key: &'a String, default: &'a String) -> &'a String {
  |               ++++       ++                                ++                   ++             ++

For more information about this error, try `rustc --explain E0106`.
error: could not compile `playground` due to previous error

Not only that, but it's a very helpful compiler error. The most important line of this error message is this.

this function's return type contains a borrowed value, but the signature does not say whether it is borrowed from map, key, or default

Rust can't possible know until runtime if the return value is borrowed from map, key, or default. And this is important because the reference will become garbage once whatever it is borrowing from gets dropped, which might happen at different times. Handily for us, rustc suggests a fix, but to understand it we first need to learn how Rust models lifetimes.

Lifetimes in Rust

This explanation of lifetimes is mostly just a summary of parts of Chapter 10.3 of the Rust Book. If you want more details, we encourage you to read this section.

Consider the following broken code.

#![allow(unused)]
fn main() {
{
    let r; // `r` allocated here (not allowed to use it until it is assigned)
    {
        let x = 5; // `x` allocated here
        r = &x;
    } // `x` dropped here
    println!("{}", r);
} // `r` dropped here
}

This doesn't compile, since r can't outlive what it's borrowing from. But how was Rust able to tell? Every reference in Rust is given a lifetime, which is an indication of how long the reference is valid. Rust assigns lifetimes as shown below.

#![allow(unused)]
fn main() {
{
    let r;                // ---------+-- 'a
                          //          |
    {                     //          |
        let x = 5;        // -+-- 'b  |
        r = &x;           //  |       |
    }                     // -+       |
                          //          |
    println!("r: {}", r); //          |
}                         // ---------+
}

For this particular example, we'll call these two scopes 'a and 'b (lifetimes in Rust are denoted by a single quote ' pronounced "tick" and a lowercase, usually single letter identifier). Here, the outer scope is given the lifetime 'a, while the inner scope is given the lifetime 'b. Then, when Rust tries to assign &x to r, it can't, because r has lifetime 'a and &x only has lifetime 'b, which is shorter than 'a. The reverse is totally fine, as shown below.

#![allow(unused)]
fn main() {
{
    let r = 5;                // ---------+-- 'a
                              //          |
    {                         //          |
        let x = &r;           // -+-- 'b  |
        println!("x: {}", x)  //  |       |
    }                         // -+       |
	                          //          |
}                             // ---------+
}

In short, it's ok for a reference to refer to something that lasts longer than it does, but it is not ok for a reference to last longer than the thing it refers to.

This gets into an interesting and quite niche topic called lifetime subtyping. Quinn has an excellent blog post here that heavily involves lifetime subtyping if you're curious.

get_default

So how can we use this information to fix get_default? Taking the compiler's suggestion, we can try the following, which indeed compiles.

#![allow(unused)]
fn main() {
use std::collections::HashMap;
fn get_default<'a>(map: &'a HashMap<String, String>, key: &'a String, default: &'a String) -> &'a String {
    match map.get(key) {
        Some(val) => val,
        None => default,
    }
}
}

The 'a in the angle brackets is a generic lifetime parameter. It is just some lifetime. The syntax &'a HashMap<String, String> means a reference of lifetime at least 'a to a HashMap<String, String>. Likewise, &'a String means a reference of lifetime at least 'a to a String. So we are saying that the return value lives at least as long as whichever lives the shortest between map, key, and default. This works, but isn't quite what we want. Consider the following code.

#![allow(unused)]
fn main() {
use std::collections::HashMap;
fn get_default<'a>(map: &'a HashMap<String, String>, key: &'a String, default: &'a String) -> &'a String {
   match map.get(key) {
       Some(val) => val,
       None => default,
   }
}
let default = "DEFAULT".to_string();
let mut map = HashMap::new();
map.insert("Hello".to_string(), "World".to_string());
let value;
{
    let key = "missing".to_string(); // `key` allocated here
    value = get_default(&map, &key, &default);
} // `key` dropped here
// `map`, `default`, and `value` still valid
println!("{}", value);
}

It doesn't compile, even though the return value is a reference to either something in map or default, both of which live long enough! However, we said that the return type was &'a String and key is also &'a String, meaning the return value has the same lifetime as key. However, we know that the return value will never be a reference to key, so we don't care about the lifetime of key. So let's remove the 'a from key's lifetime. This gives us the following, which compiles!

#![allow(unused)]
fn main() {
use std::collections::HashMap;
fn get_default<'a>(map: &'a HashMap<String, String>, key: &String, default: &'a String) -> &'a String {
    match map.get(key) {
        Some(val) => val,
        None => default,
    }
}
}

Indeed, the example that broke earlier now works too, since default and map have the same lifetime and the lifetime of key doesn't matter.

#![allow(unused)]
fn main() {
use std::collections::HashMap;
fn get_default<'a>(map: &'a HashMap<String, String>, key: &String, default: &'a String) -> &'a String {
    match map.get(key) {
        Some(val) => val,
        None => default,
    }
}

{
    let default = "DEFAULT".to_string();
    let mut map = HashMap::new();                         // ---------+-- 'a
    map.insert("Hello".to_string(), "World".to_string()); //          |
    let value;                                            //          |
    {                                                     //          |
        let key = "missing".to_string();                  // -+-- 'b  |
        value = get_default(&map, &key, &default);        //  |       |
    }                                                     // -+       |
    println!("{}", value);                                //          |
}                                                         // ---------+
}

Lifetimes in Structs

Lifetime annotations are also necessary when containing references in a struct rather than owning the type outright. Conceptually, the ideas are exactly the same as everything we've seen with lifetimes so far, just some new syntax. Consider the following struct.

#![allow(unused)]
fn main() {
struct StringHolder {
    string: &str
}
}

rustc immediately complains that we don't have a lifetime and also tells us how to fix it.

#![allow(unused)]
fn main() {
struct StringHolder<'a> {
    string: &'a str
}
}

When it comes to impl blocks or instantiating StringHolder, the syntax is just like with generics. While the ideas are the same as before, they are still tricky and require some thought. Consider the following code that fails to compile.

struct TwoStrings<'a> {
    str1_ref: &'a str,
    str2_ref: &'a str,
}

fn main() {
    let str1 = "Hello".to_string();
    let outer;                          // ----------+--'a
    {                                   //           |
        let str2 = "World".to_string(); // --+-- 'b  |
        let two_strings = TwoStrings {  //   |       |
            str1_ref: &str1,            //   |       |
            str2_ref: &str2,            //   |       |
        };                              //   |       |
        outer = two_strings.str1_ref;   //   |       |
    }                                   // --+       |
    println!("{}", outer);              //           |
}                                       // ----------+

Why does it fail to compile? Yes str2 doesn't live long enough, but str1 does which is all we are referencing. However, notice that the lifetimes of str1_ref and str2_ref are the same. In this example, since str2 only has lifetime 'b, both str1_ref and str2_ref have lifetime 'b since we said they had the same lifetime. This means that neither lives long enough for outer. In reality, we know that str1 lives longer than str2, so we can fix this by introducing another lifetime.

#![allow(unused)]
fn main() {
struct TwoStrings<'a, 'b> {
    str1_ref: &'a str,
    str2_ref: &'b str,
}
}

Now str1_ref and str2_ref don't need to have the same lifetime.

Conclusion

Lifetimes are tricky and a very deep subject. If you are doing any systems programming, you have to deal with lifetimes, whether you realize it or not. Like mutability, ownership, and typing, Rust just forces you to think about lifetimes and do the right thing and has your back. Most of the time you get in a fight with the borrow checker it is because you are trying to do something that you are used to from C or C++ that works, but is very dangerous and really shouldn't be done.

Aliasing

Imagine you are a C compiler and you see the following code.

#include <stdio.h>

int foo(int *a, int *b) {
  int x = *a;       // read *a from memory and assign it to `x`
  *b += 3;          // increment *b by 3
  return x + *a;    // read *a from memory again and add it to `x`
}

How might you optimize it? Well, reading from memory is slow, so for the final return statement, you can just double x rather than fetch what's at the address of a again. Putting this code in Godbolt with optimizations set to -O3, this is the assembly we get.

foo:
        mov     eax, DWORD PTR [rdi]    ; x = *a
        add     DWORD PTR [rsi], 3      ; *b += 3
        add     eax, DWORD PTR [rdi]    ; return x + *a
        ret

Recall that ; is used to start a comment in assembly.

Don't worry too much about the details of the assembly. The first line reads what's at the address of a into eax, i.e. x. The second line adds three to *b. Finally, the third line adds x to *a again, storing the result in eax. Note that it doesn't add eax to itself, or shift left or something. Rather it reads from memory again, despite optimizations being turned all the way up.

Doing the same thing with equivalent Rust code we find the following assembly.

#![allow(unused)]
fn main() {
pub fn foo(a: &i32, b: &mut i32) -> i32 {
    let x = *a;
    *b += 3;
    x + *a
}
}
example::foo:
        add     dword ptr [rsi], 3    // *b += 3
        mov     eax, dword ptr [rdi]  // x = *a
        add     eax, eax              // return x + x
        ret

This time, the first line adds 3 to *b, then the second line reads *a from memory into eax, and the final line adds eax to itself. Why was Rust able to perform this optimization but C couldn't?

What happens if we run the following code in C?

int x = 3;
foo(&x, &x);

Now a == b! When two pointers point to the same location in memory like this, the pointers are said to alias. When pointers alias, a lot of unexpected things can happen. In this case, the harmless *b += 3 will force us to read *a from memory again since it has changed! A lot of perfectly reasonable looking C functions break when pointers alias. Because of this edge case, compilers are prohibited from making further optimizations.

On the other hand, what happens in Rust when we try the same thing?

#![allow(unused)]
fn main() {
pub fn foo(a: &i32, b: &mut i32) -> i32 {
   let x = *a;
   *b += 3;
   x + *a
}
let mut x = 3;
foo(&x, &mut x);
}

error[E0502]: cannot borrow x as mutable because it is also borrowed as immutable

We get a compiler error, since we can't have a mutable reference and an immutable reference to the same object at the same time. Due to the aliasing rules in Rust, b can only ever be the only reference to b, since it's mutable. Thus if you have a mutable reference to something, it is impossible for you to break anything else. On the other hand, since a is an immutable reference, there cannot be any mutable references to a at the same time, meaning a can't possibly change. Thus it's safe to just grab the value once from memory.

Connection To Lifetimes

Let's look at a seemingly unrelated example of lifetimes gone wrong in C++.

#include <iostream>
#include <vector>

int main() {
  std::vector<int> x = {1, 2, 3};
  const auto &y = x[1]; // `y` is a *const* reference to `x[1]`

  std::cout << y << '\n'; // 2

  x.clear();

  x.push_back(5);
  x.push_back(5);

  std::cout << y << '\n'; // 5!?
}

Here we make a vector, then get a const reference to the first element and print it out, yielding 2 as expected. Then we clear the vector, leaving us with a dangling reference, and push 5 on to the vector twice and print out the const reference again, and this time get 5 (though this is undefined behavior, so don't be surprised if you get something different). This is a classic lifetime issue; the reference y outlived the object it refers to.

However, the error message we get in Rust for attempting this is not quite what we expect.

fn main() {
    let mut x = vec![1, 2, 3];
    let y = &x[1];

    println!("{}", y);

    x.clear();

    println!("{}", y);
}

error[E0502]: cannot borrow x as mutable because it is also borrowed as immutable

This was the exact error message we got when aliasing references. Recall that in x.clear(), clear is a function taking &mut self, meaning it takes a mutable reference to x. However y is also a reference to data owned by x, meaning these references alias and Rust's aliasing rules apply. Since y is an immutable reference to data owned by x, we can't borrow from it again mutably, which is why this fails to compile. So it turns out that the problem isn't actually a lifetimes issue, but an aliasing issue. Rust's reference aliasing constraints turn out to solve lifetime issues in addition to aliasing issues.

On the other hand, if, rather than mutate x to invalidate a reference, we move out of x, then we get a different error message.

fn main() {
    let x = vec![1, 2, 3];
    let v = &x[1];

    println!("{}", v);

    fn eat_vector(_x: Vec<i32>) {} // takes ownership of `_x` and immediately
	                               // drops its data
    eat_vector(x);

    println!("{}", v); // uh-oh, what is `v` referencing now?
}

error[E0505]: cannot move out of x because it is borrowed

When ownership is transferred, any references will be referring to the wrong location, so Rust prohibits moving out of anything with active references.

In particular, this means that removing the println!("{}", v) at the end of this example causes it to compile, as the reference v is never used after x is moved.

An interesting side effect of this is that when the new owner acquires ownership, there cannot be any references to it, so it's fine if the new owner is mutable even when the original wasn't. This means the following code works just fine, despite the fact that x is declared as immutable.

fn main() {
    let x = vec![1, 2, 3]; // `x` is immutable
    let v = &x[1];

    println!("{}", v);

    fn eat_vector(mut vec: Vec<i32>) { // takes ownership of `vec`,
	                                   // declaring it mutable
        vec.push(0);                   // mutate `vec`
        println!("ate {:?}", vec);
    }

    eat_vector(x); // "ate [1, 2, 3, 0]"
}

~

Summary

  • Every object in Rust has a lifetime which measures how long the object lasts.
  • References cannot have a longer lifetime than the object they refer to, which is enforced by the compiler.
  • Just like with types, you can annotate lifetimes to help out rustc when things get ambiguous.
  • Rust's aliasing rules, discussed in week 2, enable some powerful optimizations that C/C++ can't.
  • Additionally, Rust's aliasing rules actually solve some lifetime issues.