Lab 5: Borrowing Frames

Due March 21, 11:59pm

Thus far, we've been been building up our higher level abstractions somewhat carelessly by copying a lot of data around. For example, decoding an array of two strings already requires three heap allocations for the Vec and two String values. But this is a systems programming language, and that means we get maximum control, and for the purpose of this lab, that is what we're after.

In this lab, we'll be using our knowledge of borrowing and lifetimes to make the Frame type borrow data instead of cloning bytes into its own heap allocations. This will involve several infrastructural changes to our existing code base.

Grading Rubric

  • Code is formatted properly using cargo fmt: 5%
  • Code passes cargo clippy without warnings: 10%
  • Code passes our tests: 60%
  • Responses in questionnaire.md: 25%

Table of Contents

Updating the definition of Frame

Next, we're going to update the Frame type to use borrowing types (&str, &[u8]) instead of owned allocations (String, Vec<u8>). As a reminder, &str and &[u8] are called slices: they are just pointers-length pairs that borrow from the owner of those bytes.

Rust's `&str`, `String`, and many other strings types versus C's `char *` type.

We'll start be revisiting the Frame type from week 3, which should look something like this:

Filename: src/frame.rs

pub enum Frame {
    Simple(String),
    Error(String),
    Integer(i64),
    Bulk(Vec<u8>),
    Array(Vec<Frame>),
}

In order to make it borrow its data instead of cloning bytes into its own owned heap allocations, we'll replace instances of String with &str and Vec<u8> with &[u8]. Make sure to introduce a lifetime to the Frame type so the compiler is happy. Also, consider why we can't make Array be a &'a [Self].

Updating the implementation of Frame

Running cargo check, we can see that we've now broken everything.

error[E0308]: mismatched types
  --> src/frame.rs:58:33
   |
   |                 Ok(Self::Simple(string))
   |                    ------------ ^^^^^^
   |                    |            |
   |                    |            expected `&str`, found struct `String`
   |                    |            help: consider borrowing here: `&string`
   |                    arguments to this enum variant are incorrect

Most of the time, the compiler is very helpful. That is not the case here, and demonstrates that although the compiler is very good at solving simple problems, it cannot read your mind.

What we actually want to do is remove the .to_string() and .to_vec() calls on the values we get from from_utf8 and slice respectively, since those were the functions that cloned the data from the cursor into our own owned heap allocations. Removing these calls from within the Simple, Error, and Bulk cases will make these "expected ... found ..." error messages go away.

... but we'll still have some errors that we need to take care of:

error[E0726]: implicit elided lifetime not allowed here
  --> src/frame.rs:15:6
   |
   | impl Frame {
   |      ^^^^^ expected lifetime parameter
   |
   = note: assuming a `'static` lifetime...
help: indicate the anonymous lifetime
   |
   | impl Frame<'_> {
   |           ++++

For this particular case, the compiler tells us to change it to Frame<'_> (where '_ is an anonymous lifetime). Let's see what happens if we do this:

error: lifetime may not live long enough
  --> src/frame.rs:78:20
   |
   |     pub fn decode(src: &mut Cursor<&[u8]>) -> Result<Self, ParseError> {
   |                                    -          ------------------------ return type is Result<Frame<'2>, error::ParseError>
   |                                    |
   |                                    let's call the lifetime of this reference `'1`
...
   |                 Ok(Self::Bulk(data))
   |                    ^^^^^^^^^^^^^^^^ argument requires that `'1` must outlive `'2`

If you have no idea what this means, then you're in good company. Although this error message looks extremely daunting, it is actually guiding us in the right direction this time. Let's briefly go over the problem here, but don't get too caught up on the details if you have trouble following.

The Self type is just an alias for the thing being impld, Frame<'_> in this case, where '_ could be any lifetime. When we accept a &mut Cursor<&[u8]>, omitting a lifetime on the &[u8] part also means that the inner &[u8] could have any lifetime. But this isn't okay, because if the lifetime in Self were to be longer than the lifetime of the &[u8], then it would be allowed to live for longer than the &[u8], which could lead to a dangling reference.

Therefore, we need to tell Rust "hey, these are going to have the same lifetime". To do this, we need to specify that Self is bound to the same lifetime that &[u8] is bound to, since they'll both be storing a pointer into the same owner wherever that is.

To do this, we'll need to introduce a named lifetime like 'a (as opposed to an anonymous lifetime '_). Since we're already using the Self type, and since it aliases the type that is being impld, let's change it from impl Frame<'_> at the top to impl<'a> Frame<'a> in order to introduce the 'a lifetime into scope.

If we had done this at the start, we would have gotten a much nicer compiler error for the definition of Frame::decode:

error[E0621]: explicit lifetime required in the type of `src`
  --> src/frame.rs:78:20
   |
   |     pub fn decode(src: &mut Cursor<&[u8]>) -> Result<Self, ParseError> {
   |                        ------------------ help: add explicit lifetime `'a` to the type of `src`: `&mut std::io::Cursor<&'a [u8]>`
...
   |                 Ok(Self::Bulk(data))
   |                    ^^^^^^^^^^^^^^^^ lifetime `'a` required

This compiler error is vastly more informative and tells us exactly how to fix this particular problem. Note that you don't need to specify the full path to Cursor since it's already imported at the top.

Once you incorporate the compilers feedback, your src/frame.rs file should be free of compiler errors. That wasn't so bad, was it?

Helper functions

In the last and final lab, we'll be building up the chat server logic, and doing so will require you to deconstruct and validate these frames. If we're expecting a frame that is an array of some other stuff, what might this look like?

fn parse<'a>(frame: Frame<'a>) -> Option<&'a str> {
    let array = if let Frame::Array(x) = frame {
        // evaluate to the inner array
        x
    } else {
        // short circuit the function
        return None;
    };

    // and continue doing this...
}

Read about if-let syntax here.

We don't want to manually write return None; on each failure case; instead, we want to make each failure case into something we can use ? on.

To do this, we'll define several methods on Frame: as_simple, as_error, and so on for each variant. Each of these will take in &self, and return Option<{variant data}>, where the data is the value stored in that variant. As an example, here's what as_array might look like:

impl<'a> Frame<'a> {
    // -- snip --

    pub fn as_array(&self) -> Option<&'a [Self]> {
        if let Frame::Array(array) = self {
            // array is `&Vec<Self>`, but can coerce into `&[Self]`.
            Some(array)
        } else {
            None
        }
    }
}

This will greatly simplify our code next week as we can use the ?-operator on Option values:

fn parse<'a>(frame: Frame<'a>) -> Option<&'a str> {
    let array = frame.as_array()?;

    // more logic here...
}

Problems with FrameReader::read_frame

Running cargo check again will reveal more compiler errors in src/rw.rs, and will helpfully tell us to run rustc --explain E0502:

A variable already borrowed as immutable was borrowed as mutable.

The error message seems simple, but is actually preventing us from much bigger issues, and explaining this is left as an exercise for the writeup.

The "quick fix" would be to remove the .consume(...) call on the ReadBuf, which makes the error go away, but not the logic is fundamentally flawed since we'll never be "throwing out" bytes that we're done with. Instead, we want to be able to get a Frame, and then somehow call .consume(...) when we're done with it.

The C solution: Manual consumption

One solution we might consider here is to make a public method on the FrameReader type that internally consumes bytes from the ReadBuf and trust that callers are responsible with it to use it correctly. This would be an excellect solution if we were writing C code, but we live in the 21st century and are trying to create an abstraction here.

Can we do better?

The JavaScript solution: Closures

Let's take a quick look at the broken code:

impl FrameReader {
    // constructor omitted

    pub fn read_frame<'a>(&'a mut self) -> Result<Frame<'a>, ReadError> {
        // frame validation omitted

        let frame = Frame::decode(&mut cursor)?; // 1

        // 2

        self.buf.consume(len); // 3
        Ok(frame) // 4
    }
}

At the point 1 we create a Frame that immutably borrows from self.buf. But then at point 3, we require mutably borrowing self.buf in order to mark len bytes as consumed. If the compiler were able to determine that frame isn't accessed beyond point 3, then it would compile this code because our immutable reference is never used after 3.

Here's a simplified example of the compiler being smart and doing this for us:

#![allow(unused)]
fn main() {
let mut x = 5;
let y = &x; // x borrowed immutably
println!("{y}");
// y isn't used past this point, so the compiler can be clever
x += 20;
}

This isn't the case though because frame is returned at point 4, meaning that point 3 might mutate what frame is referencing.

However, if we were able to do everything we needed with frame at point 2, then we could be finished with frame by 3 and the code would compile. One way to accomplish this is to have the caller pass in an arbitrary function that takes a Frame, and then call it at point 2, allowing us to then call .consume(...) and have the function compile.

If we wanted to take this more functional approach, we could make use of closures and the Fn, FnMut, and FnOnce family of traits, which allow us to place bounds on generic types that must be callable.

Using closures in this context might look like this:

impl FrameReader {
    // constructor omitted

    pub fn read_frame<'a, F>(&'a mut self, f: F) -> Result<(), ReadError>
    where
        F: FnOnce(&Frame<'a>),
    {
        // frame validation omitted

        let frame = Frame::decode(&mut cursor)?; // 1

        f(frame); // 2

        self.buf.consume(len); // 3
        Ok(()) // 4
    }
}

Notice how the function now returns (), since we've already done everything we need inside of f.

While passing in a function might seems like suitable behavior, it can lead to an incredibly unergonomic API for users. For example, any code inside a closure cannot ? to short circuit from the calling function:

fn do_stuff(reader: &mut FrameReader<TcpStream>) -> Result<(), ReadError> {
    reader.read_frame(|frame| { // <- argument is a closure
        // Can't do this
        let result = some_parse_function(frame)?;
        println!("{}", result);
    })
}

A more detailed example of me getting owned on this particular topic can be found in this GitHub thread.

The Rust solution: Guards

What we're really after is a way to pass back a Frame to the user so they can do whatever they want, and have .consume(...) be called on the buffer whenever the Frame isn't used anymore.

As it turns out, this is an extremely common pattern in Rust: borrow a resource, and have some cleanup code run when the borrow is no longer used. This is exactly how Rust's std::sync::Mutex<T> type works as well. A Mutex<T> wraps the data that it protects, and locking it returns a MutexGuard<'_, T>, which borrows from the Mutex and provides unique access to the inner T. When the MutexGuard is dropped, its destructor (specified by its std::ops::Drop implementation) unlocks the mutex, meaning that Rust programmers can never accidentally forget to unlock a mutex after they're done with it because it happens automatically.

Defining our own Guard type for FrameReader::read_frame

Let's use the guard pattern described above to update FrameReader::read_frame method. We'll start by stating the desired behavior:

  1. FrameReader::read_frame returns an error if Frame::check fails.
  2. If it succeeds, then the bytes in the buffer encode enough bytes for at least 1 valid frame, and it will return to us a guard that borrows the inner ReadBuf.
  3. The guard can give us a Frame whose data borrows from the ReadBuf that it's borrowing.
  4. When the guard goes out of scope, it will call .consume(...) on the ReadBuf with the correct number of bytes.

You should start by defining a Guard type along with a Drop implementation:

Filename: src/rw.rs

// other stuff omitted

pub struct Guard {
    // todo
}

impl Drop for Guard {
    fn drop(&mut self) {
        // todo
    }
}

The whole point of having a Guard type was so we could call .consume(...) on the buffer when the guard is dropped, meaning that Guard will need to carry a mutable reference to the ReadBuf and also the number of bytes to consume. This will require two fields, and you'll also need to introduce a lifetime to the Guard type as well.

Once you add these and implement the Drop implementation to consume the bytes, we'll need a way to actually get a Frame from a Guard, so let's add a method to do that as well:

impl Guard<'_> {
    pub fn frame(&self) -> Frame<'_> {
        // todo
    }
}

Implementing this is relatively straight-forward. For the &[u8], we'll access the data from the ReadBuf we're borrowing from with the .buf() method. We can use it to create a Cursor<&[u8]>, and then pass in a mutable reference of it to Frame::decode to get the Frame. Since we already checked that the full frame is there, we can call .expect("frame was checked") on the returned Result, which will panic if Err is returned.

Updating FrameReader::read_frame

Now that we have our Guard type, change the implementation of FrameReader::read_frame so that after it checks that there are enough bytes, it gets the number of bytes that the cursor has read so far with .position() as usize and returns a Guard that will .consume() that many bytes when it is dropped.

As a hint, the method type signature should look like this:

impl<R: Read> FrameReader<R> {
    // `new` omitted

    pub fn read_frame(&mut self) -> Result<Guard<'_>, ReadError> {
        // omitted
    }
}

One more Frame variant...

In the next lab, we'll be writing Frames back into Write types a lot, and our messages will be made up of arrays of other frames. In order to avoid having to allocate a Vec<Frame> to create an Array variant, we'll introduce the Frame::Slice variant, which is exactly like Frame::Array except it contains a &'a [Frame<'a>] instead of a Vec<Frame<'a>>. We'll also update WriteFrame::write_frame in src/rw.rs to add another branch in the match statement that functions identically to the Frame::Array branch. There are definitely better ways to encode this, but this solution is fine for now since it is very simple.

What this means is that whenever we receive a Frame that contains an array, we'll get the Array variant, but whenever we want to write back a frame, we can use the Slice variant instead since we won't have to perform a heap allocation.

Testing your code

For this lab, you should write more tests in src/frame.rs. You will also want to update your old tests to be compatible, but this shouldn't be much work.

Questionnaire

In questionnaires/, create a lab5.md file and answer the following questions:

# Questions

## Why borrowing?
In this lab, we added almost no new capabilities to our program, but instead just moved things around and changed some types.
What benefits does this lab bring?
Consider factors like heap allocations and copying data around.

### Response
TODO

## What is a guard?
In this lab, we introduced the concept of _guards_.
In your own words, what do guards allow us to do?
More importantly, what do they (helpfully) prevent us from doing?

### Response
TODO

Feedback

Please make sure that each member fills out this short feedback form individually.

Submitting

Once you're finished, you can push your changes to your git repository after ensuring that:

  • cargo fmt has been run
  • cargo clippy passes
  • cargo test passes

Congratulations on finishing!