Class 2: Ownership and Borrowing

Here's some Rust code that looks perfectly safe. However, it doesn't compile.

fn print_string(s: String) {
    println!("{s}");
}

fn main() {
    let hello = "Hello, World".to_string();
    print_string(hello);
    print_string(hello);
}

The reason this code doesn't compile has to do with an important concept in Rust called ownership, which we will explore in this lesson. Even though this code looks perfectly safe, Rust disallows it. Rust has very strict rules about what kind of code is allowed, but these rules let it rule out whole classes of memory errors at compile time, like double frees and use after free, and make memory leaks much harder to write by accident.

Ownership and Dropping

A key part of how Rust accomplishes this is called its ownership model. In Rust, every value has a unique owner.

This isn't technically 100% true. Sometimes you really do need shared ownership, and Rust has mechanisms for that, which we will discuss later, but the vast majority of the time every value has a unique owner.

When that owner goes out of scope, the value is dropped, and code is run to clean up any allocated resources. This pattern, called Resource Acquisition Is Initialization (RAII), may sound familiar from C++, where destructors can automatically clean up resources.

#![allow(unused)]
fn main() {
{
    // allocates memory on the heap for `s`
    let s = "this is a string".to_string();
} // when `s` goes out of scope, that memory is automatically freed
}
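To make dropping observable, here is a minimal sketch. `Guard` and `count_drops` are hypothetical names invented for this illustration, and the `'a` lifetime annotation is only there so the guard can hold a reference to the counter; it can be ignored for now. The sketch implements the standard `Drop` trait so that a counter is bumped at the moment the owner goes out of scope.

```rust
use std::cell::Cell;

// `Guard` is a hypothetical type: it bumps a counter when it is dropped.
struct Guard<'a> {
    drops: &'a Cell<u32>,
}

impl Drop for Guard<'_> {
    // Rust calls `drop` automatically when the owner goes out of scope.
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

// Returns the drop count observed inside the inner scope and after it.
fn count_drops() -> (u32, u32) {
    let drops = Cell::new(0);
    let inside;
    {
        let _guard = Guard { drops: &drops }; // `_guard` owns the value
        inside = drops.get();                 // nothing has been dropped yet
    } // `_guard` goes out of scope here, so `drop` runs
    (inside, drops.get())
}

fn main() {
    let (inside, after) = count_drops();
    println!("drops inside the scope: {inside}, after it: {after}");
}
```

No call to `drop` appears anywhere in `count_drops`; the cleanup happens purely because the scope ended.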

Where Rust differs from C++ is that this model is enforced by the compiler on every value. So while C++ can work just like Rust in this regard,

{
    // allocates memory on the heap for `s`
    std::string s = "this is a string";
} // when `s` goes out of scope, the destructor is called, freeing
  // the memory

it is ultimately just a convention that you can ignore.

{
    // manually allocate memory on the heap
    char *s = new char[17];
    strcpy(s, "this is a string");
} // uh-oh, memory leak!

Even the CS35 lab 2 website calls "delet[ing] any dynamically allocated array" a style requirement.

Image of coding style requirements from CS35, including "Always delete any dynamically allocated array ... using the delete[] command"

Analogy

As an analogy, imagine a storage facility. Everything in the storage facility has a unique owner. This owner signed a contract with the storage facility, promising to clean up everything they own when the contract expires.

Moving

This model is all well and good if we only have one variable, but things get more complicated as we add multiple variables.

For example, the code below compiles and works just fine, even though, based on what we've learned so far, it shouldn't.

#![allow(unused)]
fn main() {
{
    // declare a `String`
    let s1;
    {

        // allocate memory on the heap for `s2`
        let s2 = "Why does this work?".to_string();

        s1 = s2;

    } // `s2` goes out of scope, so the string should
      // be dropped, right?

    println!("{}", s1); // but it's still good here
}
}

Rust allows ownership to transfer. The primary way it does this is by moving a value. To see how this works, let's look at what happens in the above code when we say s1 = s2. A String in Rust is basically a pointer to some characters, a length, and a capacity. When we say s1 = s2, these three things are copied over from s2 into s1. At this point, both s1 and s2 have pointers to the same data, muddying ownership. Importantly, Rust then invalidates s2 to avoid having two variables own the same data. Trying to use s2 after s1 = s2 is a compile error.

What is really clever about this is that, in addition to guaranteeing unique ownership, we never need to copy the actual string. Whether the string is 10 characters or 1,000,000, the move always takes the same amount of time.
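One way to check this claim is to compare the address of the heap buffer before and after a move. The sketch below uses the standard `as_ptr` and `clone` methods; the function names `move_keeps_buffer` and `clone_makes_new_buffer` are made up for this example. It also shows `clone`, which is the explicit way to ask for a second, independent copy when two owners really are needed.

```rust
// A move copies only the pointer, length, and capacity,
// so the heap buffer stays at the same address.
fn move_keeps_buffer() -> bool {
    let s2 = "Why does this work?".to_string();
    let before = s2.as_ptr(); // address of the characters on the heap
    let s1 = s2;              // move: `s2` is invalid from here on
    before == s1.as_ptr()
}

// `clone` allocates a new buffer and copies the characters into it,
// so both strings stay valid and each owns its own data.
fn clone_makes_new_buffer() -> bool {
    let s2 = "Why does this work?".to_string();
    let s1 = s2.clone();
    s1 == s2 && s1.as_ptr() != s2.as_ptr()
}

fn main() {
    println!("move keeps the buffer: {}", move_keeps_buffer());
    println!("clone makes a new buffer: {}", clone_makes_new_buffer());
}
```

Unlike a move, the cost of `clone` grows with the length of the string, which is why Rust makes you ask for it explicitly.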

Compare this to roughly equivalent C code, where ownership isn't tracked.

char *s1 = malloc(17); // allocate s1
strcpy(s1, "This is a string"); // give it the value "This is a string"

char *s2 = s1; // copy the pointer over to s2

puts(s1); // "This is a string"
puts(s2); // "This is a string"

free(s1); // Uh-oh, who owns the data?
free(s2); // double free: `s2` points to the same memory as `s1`

After copying the pointer, both pointers remain usable, so it is ambiguous which one is responsible for freeing the data when they are done. In Rust, by contrast, it is always clear who has ownership of the data.

Analogy

Continuing the storage facility analogy, suppose you have a ton of stuff in a storage unit that you want to give to a friend. Rather than empty the storage unit and move everything to where they want it, you can just give them the key to your storage unit and fill out some paperwork to transfer the ownership. This way your friend owns the stuff, you don't need to physically move anything, the storage facility is happy, and only one person has access to the stuff.

Copying

This is nice, but it can get really annoying for code like the example below.

#![allow(unused)]
fn main() {
let three = 3;
let four = three + 1;       // A

println!("3 = {}", three);  // B
println!("4 = {}", four);
}

Under the rules we've seen so far, ownership of the value 3 is passed from three to four on the line marked A, meaning we shouldn't be able to use three on the line marked B. This, however, is ridiculous! No one owns 3, it's just a number. To avoid this problem, primitive types like i32 or bool can be copied without invalidating the old value. Since no cleanup is required, we don't have the double free problem to worry about, like with String, and since these types are small enough, there isn't really anything more efficient you can do than just copying them.

Exactly when values are copied rather than moved is a bit more complicated than this in Rust and involves something called the Copy trait. We will learn more about how this works in week 4.
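As a small preview, here is a sketch of a user-defined type that opts in to copying. `Point` and `copy_then_use_both` are hypothetical names used only for illustration; `#[derive(Clone, Copy)]` is the standard way to mark a type whose fields are all themselves copyable.

```rust
// `Point` is a hypothetical type. Deriving `Copy` tells the compiler that
// assigning a `Point` duplicates it instead of moving it.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

fn copy_then_use_both() -> (Point, Point) {
    let a = Point { x: 1, y: 2 };
    let mut b = a; // copy: `a` is still valid after this line
    b.x = 10;      // changing the copy does not affect the original
    (a, b)
}

fn main() {
    let (a, b) = copy_then_use_both();
    println!("a = {:?}, b = {:?}", a, b);
}
```

A type like `String` cannot derive `Copy`, because it owns heap memory that needs cleanup.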

Analogy

To continue the storage facility analogy, if what you are planning on giving your friend is very small, light, and common, it might be easier to just buy them a copy rather than deal with the storage facility. Then both people own a copy of the thing, which isn't a problem since it's so commonly available and isn't kept in a storage facility.

Borrowing

This is all very good and solves a lot of problems, but we are still quite limited with just moving and copying. For instance, suppose we want to implement a capitalize function in Rust to capitalize a String.

#![allow(unused)]
fn main() {
fn capitalize(s: String) -> String {
    s.as_str().to_uppercase()
}
}

To see in a bit more detail how this function works, check out the Rust docs for as_str and to_uppercase.

Right now, the only way for us to see what's in a String (as we need to in order to capitalize it) is to take ownership of it. However, since capitalize takes ownership of s, we end up losing the original string. This means seemingly innocuous code like this doesn't compile.

#![allow(unused)]
fn main() {
fn capitalize(s: String) -> String {
    s.to_uppercase()
}
let boring = "Hello World".to_string();
print!("{} ", capitalize(boring));  // `boring` moved here!
println!("is louder than {}", boring); // `boring` no longer valid!
}

Fortunately, Rust lets us borrow data rather than take ownership of it. We can do so via references. References are like pointers in that they refer to a value. However, within the Rust ownership model, they do not own what they refer to. This has important implications. For one, since references don't own what they refer to, the value they refer to isn't dropped when they go out of scope. Additionally, multiple references referring to the same thing can exist, since they don't own it.

This is perfect for our capitalize function, since we don't need to own s.

#![allow(unused)]
fn main() {
fn capitalize(s: &String) -> String {     // `s` is a reference
    s.as_str().to_uppercase()             // when `s` goes out of scope,
                                          // the original string isn't dropped,
                                          // since `s` is just borrowing it
}
}

A reference to a String can be automatically coerced into &str, which is generally the better parameter type, since it also accepts string literals and slices of other strings. Rather than create a pointer to a String (which is basically just a pointer to some characters anyway), we end up with the pointer to the characters directly. This means we can write capitalize as shown below and call it in exactly the same way.

#![allow(unused)]
fn main() {
fn capitalize(s: &str) -> String {
    s.to_uppercase()
}
}
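To illustrate the coercion, the sketch below calls the same `&str` version of `capitalize` with a borrowed `String`, a string literal, and a sub-slice of a `String`. The variable names are made up for this example.

```rust
fn capitalize(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    let owned = "Hello World".to_string();

    let from_string = capitalize(&owned);          // `&String` is coerced to `&str`
    let from_literal = capitalize("Hello World");  // a literal is already a `&str`
    let from_part = capitalize(&owned[..5]);       // borrow only the first five bytes

    println!("{from_string} / {from_literal} / {from_part}");
    println!("{owned} is still usable"); // `owned` was only borrowed
}
```

Had the parameter been `&String`, the second call would be rejected, since a string literal is not a `String`.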

Our function capitalize now just borrows s rather than consuming it, so the following code compiles and does what we expect it to.

#![allow(unused)]
fn main() {
fn capitalize(s: &String) -> String {     // `s` is a reference
    s.to_uppercase()                      // when `s` goes out of scope,
                                          // the original string isn't dropped,
                                          // since `s` is just borrowing it
}
let boring = "Hello World".to_string();
print!("{} ", capitalize(&boring));  // now `capitalize` doesn't own `boring`
println!("is louder than {}", boring); // so we can use it here
}

What if, rather than return a new string, we want capitalize to modify the string passed into it? We can try the following code, but it fails to compile. Note as well that, as in C, we can use * to refer to the original data rather than the reference, which is necessary if we are trying to assign to it. However, unlike in C or C++, there is no difference between . and ->, since Rust automatically dereferences in method calls and field accesses.

#![allow(unused)]
fn main() {
fn capitalize(s: &String) {   // borrow `s` here
    *s = s.to_uppercase()     // try to change it
}
}

The reason this doesn't compile is that, like variables, references are immutable by default. We can make a reference mutable by explicitly writing &mut instead of &. Now the following code compiles.

#![allow(unused)]
fn main() {
fn capitalize(s: &mut String) { // borrow `s` MUTABLY here
    *s = s.to_uppercase()       // now we can change it
}
}

Here we do need String rather than str since we are dereferencing and assigning to it.

And to use it, we must create a mutable reference, rather than just a regular reference.

#![allow(unused)]
fn main() {
fn capitalize(s: &mut String) { // borrow `s` MUTABLY here
    *s = s.to_uppercase()       // now we can change it
}
let mut lowercase = "Hello World".to_string(); // create `lowercase`, which MUST BE MUTABLE
println!("{} is lowercase", lowercase); // use it
capitalize(&mut lowercase); // now we borrow `lowercase` MUTABLY
println!("and {} is uppercase", lowercase); // now `lowercase` is different
}
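Assigning through `*s` replaces the whole string. As an alternative sketch, a mutable reference can also be used to change the existing string in place through ordinary method calls, which is where the automatic dereferencing mentioned earlier comes in. `shout` is a hypothetical function invented for this example; `make_ascii_uppercase` and `push` are standard methods, and note that `make_ascii_uppercase` only affects ASCII letters, unlike `to_uppercase`.

```rust
// `shout` is a hypothetical helper that edits the borrowed string in place.
fn shout(s: &mut String) {
    s.make_ascii_uppercase(); // method calls dereference automatically, no `*` needed
    s.push('!');              // grows the original string, not a copy
}

fn main() {
    let mut greeting = "Hello World".to_string();
    shout(&mut greeting);
    println!("{greeting}");
}
```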

What's particularly useful about references is that they are guaranteed to be valid for as long as they exist. A reference must be created after the data it refers to, and the compiler guarantees that it goes out of scope before the owner of that data does. This fixes a problem in languages with manual memory management, where it's possible to have a pointer to data that has already been freed.
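Here is a small sketch of that guarantee. `first_word` is a hypothetical helper written for this example; it returns a reference into the string it was given. The commented-out lines show the kind of code the compiler rejects, since they would leave `word` pointing at freed memory.

```rust
// Hypothetical helper: returns a reference to part of the borrowed string.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

fn main() {
    let owner = "Hello World".to_string();
    let word = first_word(&owner); // `word` borrows from `owner`
    println!("{word}");            // fine: `owner` is still alive

    // The two lines below would not compile if uncommented, because
    // `owner` would be dropped while `word` still refers to its data:
    // drop(owner);
    // println!("{word}");
}
```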

References are extremely important, both in Rust and in programming in general.

Analogy

Continuing the storage facility analogy, if you want to lend a friend some stuff in your storage unit, you can give them a copy of your key. This way you still own the storage unit, but your friend can also access it.

References to Chunks of Memory

One very common use of pointers in C/C++ is to point to contiguous chunks of memory. However, references can only refer to a single value so we need something new to refer to contiguous chunks of memory. First, we can create contiguous chunks of memory on the stack with arrays. Arrays work very similarly to arrays in other languages, and have type [T; N], where T is the type contained in the array and N is the length.

#![allow(unused)]
fn main() {
let one_two_three: [i32; 3] = [1, 2, 3];
let five_threes: [i32; 5] = [3; 5];
}

We can write functions that take ownership of arrays, but this is generally a bad idea for a few reasons. First, doing so requires copying or moving the entire array, which is often expensive. Second, arrays of different lengths are different types, so each length needs its own function.

#![allow(unused)]
fn main() {
fn average_of_three(arr: [f32; 3]) -> f32 {
    arr.iter().sum::<f32>() / 3.0
}

fn average_of_four(arr: [f32; 4]) -> f32 {
    arr.iter().sum::<f32>() / 4.0
}
}

There is a way around having to create all these functions by hand with const generics, which were stabilized in Rust 1.51. This is still a really bad idea, though, since you really don't want to take ownership of the array.
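For the curious, a const-generic version might look like the sketch below. `average_of_n` is a hypothetical name chosen here to avoid clashing with the other functions in this lesson. It still takes the array by value, so it only solves the "one function per length" problem.

```rust
// `N` is a compile-time constant, so one definition covers every array length.
// The array is still passed by value, which copies all `N` elements.
fn average_of_n<const N: usize>(arr: [f32; N]) -> f32 {
    arr.iter().sum::<f32>() / N as f32
}

fn main() {
    println!("{}", average_of_n([1.0, 2.0, 3.0]));      // N is inferred as 3
    println!("{}", average_of_n([1.0, 2.0, 3.0, 4.0])); // N is inferred as 4
}
```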

Fortunately, Rust has a mechanism that fixes both problems, namely slices. A slice is a reference to a contiguous chunk of memory that boils down to a pointer and a length. This way we can borrow an entire chunk of memory without having to move or copy it.

#![allow(unused)]
fn main() {
fn average(slice: &[f32]) -> f32 {
    slice.iter().sum::<f32>() / slice.len() as f32
}
}

Note here that the type of slice is &[f32] with no mention of the length. Unlike arrays, length is not part of the type of a slice. After all, if a slice is just a pointer and a length, the size of a slice is always the same, regardless of how large the chunk of memory it points to is. This fixes the second problem with taking arrays we had.
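The size claim can be checked with the standard `std::mem::size_of` function. In the sketch below, `slice_ref_is_pointer_plus_length` is a made-up helper name; it compares the size of a slice reference against two machine words (one for the pointer, one for the length).

```rust
use std::mem::size_of;

// True when a slice reference is exactly two machine words wide:
// one word for the pointer and one for the length.
fn slice_ref_is_pointer_plus_length() -> bool {
    size_of::<&[f32]>() == 2 * size_of::<usize>()
}

fn main() {
    // An array's size grows with its length...
    println!("[f32; 3]: {} bytes", size_of::<[f32; 3]>());
    println!("[f32; 4]: {} bytes", size_of::<[f32; 4]>());
    // ...but a slice reference is the same size no matter what it points to.
    println!("&[f32]:   {} bytes", size_of::<&[f32]>());
    println!("pointer + length: {}", slice_ref_is_pointer_plus_length());
}
```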

If we want to use this function, we have to create a slice, which we can do in several ways. One way is to just borrow the whole array.

#![allow(unused)]
fn main() {
fn average(slice: &[f32]) -> f32 {
    slice.iter().sum::<f32>() / slice.len() as f32
}
let arr = [1.0, 2.0, 3.0, 4.0];
println!("{}", average(&arr));
}

The fact that they are called "slices" suggests that we can take smaller slices of an array than the entire thing, which is true.

#![allow(unused)]
fn main() {
fn average(slice: &[f32]) -> f32 {
    slice.iter().sum::<f32>() / slice.len() as f32
}
let arr = [1.0, 2.0, 3.0, 4.0];
println!("{}", average(&arr[1..3])); // takes indices 1 through 3, not including 3
println!("{}", average(&arr[1..]));  // takes everything from index 1 onward
println!("{}", average(&arr[..3]));  // takes everything up to but not including index 3
println!("{}", average(&arr[..]));   // takes everything, equivalent to just &arr
}
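Slices are not tied to arrays. As a sketch, the same `average` function also accepts data stored in a `Vec`, the standard growable, heap-allocated list type (not covered in this lesson, so treat its use here as an assumption about what you will meet later). A borrowed `Vec<f32>` is coerced to `&[f32]` in the same way a borrowed `String` is coerced to `&str`.

```rust
fn average(slice: &[f32]) -> f32 {
    slice.iter().sum::<f32>() / slice.len() as f32
}

fn main() {
    let arr = [1.0, 2.0, 3.0, 4.0];            // fixed-size array on the stack
    let vec = vec![1.0, 2.0, 3.0, 4.0];        // growable vector on the heap

    println!("{}", average(&arr));             // whole array as a slice
    println!("{}", average(&vec));             // `&Vec<f32>` is coerced to `&[f32]`
    println!("{}", average(&vec[1..3]));       // part of the vector: indices 1 and 2
}
```

Because the function only asks for a slice, it does not care where the memory lives or how long it is.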

As usual, there's a lot more detail about slices in the section on the slice type in the Rust book.

Conclusion

For more details on Rust's ownership model specifically, see the Understanding Ownership chapter of The Book.


Summary

  • Every piece of data has a unique owner.
  • When that owner goes out of scope, the data is dropped.
  • Ownership can be transferred by moving or copying the data.
  • Data can also be borrowed via references to avoid unnecessary copying/moving.
  • References are guaranteed at compile time to always be valid.
  • Slices are references to contiguous chunks of memory.
  • You can't borrow something while it is mutably borrowed, which guarantees that data behind an immutable reference never changes unexpectedly.
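The last rule can be sketched as follows. `borrow_in_turn` is a hypothetical function written for this illustration. Any number of immutable references may exist at once, but a mutable reference must be the only reference in use; the commented-out line shows what the compiler rejects.

```rust
// Hypothetical demo: immutable borrows first, then one mutable borrow.
fn borrow_in_turn() -> (String, usize) {
    let mut text = "Hello".to_string();

    let a = &text;
    let b = &text;                 // many immutable borrows at once are fine
    let total = a.len() + b.len(); // last use of `a` and `b`

    let m = &mut text;             // allowed: `a` and `b` are no longer used
    m.push_str(" World");
    // println!("{a}");            // would not compile: `a` would overlap with `m`

    (text, total)
}

fn main() {
    let (text, total) = borrow_in_turn();
    println!("{text} ({total})");
}
```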