Lab 3: Parsing Frames
Due February 28, 11:59pm
As stated on the course page, one of the goals of the course is to build a chat server, and that starts now. We will spend the remaining labs building up small components that we will piece together at the end.
Networking applications such as chat servers rely on being able to communicate between computers, and this happens by sending bytes between them, which are just streams of 1s and 0s. By themselves, bytes are meaningless, but if both computers agree on a protocol for how the bytes are structured, then they can derive meaning from them and communicate. An analogy to this is how making monkey noises isn't an effective mode of communication, but speaking a shared language is.
In this lab, we'll be (partially) implementing a protocol which will allow us to parse unstructured bytes into flexible higher-level structures that will be much easier to work with later on.
Since bytes are unstructured, however, there are many things that might go wrong during parsing. It might tell us to read in the next integer represented in ASCII digits, but then we find a non-digit character. Or it might tell us to read the next 100 bytes as an array of bytes, but there's only 50. As such, one of the focuses of this lab is to practice writing robust, fallible code with useful error types.
Grading Rubric
- Code is formatted properly using
cargo fmt
: 5% - Code passes
cargo clippy
without warnings: 10% - Code passes our tests: 60%
- Responses in
questionnaire.md
: 25%
Learning Objectives
- Practice defining and using enum types.
- Write fallible code with elegant error handling.
Table of Contents
- Defining the
Frame
type: How to defineFrame
. - The RESP protocol: Specification for how you should be parsing frames from bytes.
- Implementing methods for the
Frame
type: The "what, why, and how" for methods onFrame
. - The
ParseError
type: Error enum for this lab. - Cursor helper functions: How to get bytes from
&mut Cursor<&[u8]>
. - Get started with a
match
expression: Start here if you're stuck! - Helpful conversions: How to get
Vec<u8>
andString
from&[u8]
. - Testing your code: Reminder to write tests.
- Questionnaire: Answer some brief questions about Rust.
- Feedback: Give us your feedback!
- Submitting: How to submit your lab.
Defining the Frame
type
We want to communicate with structured messages, and will do this via the RESP protocol. In short, RESP is an easily parsable, human readable message format. RESP allows for sending various kinds of messages called frames, which can be one of the following variants:
- Simple UTF-8 strings (
String
) - Error messages as UTF-8 strings (
String
) - 64-bit signed integers (
i64
) - Bulk strings, which are arbitrary bytes (
Vec<u8>
) - Vectors of other frames (
Vec<Frame>
)
Since frames can take on any one of these variants, it's the perfect candidate for declaring as an enum, which is the first task.
Begin by opening your project and filling in the variants for the Frame
type, one for each bullet point above.
Make sure that each variant can carry the data associated with it, e.g. you should have a Simple(String)
variant.
Filename: src/frame.rs
pub enum Frame {
Simple(String),
// todo: fill in variants
}
The RESP protocol
Since we're building a basic networking application from the ground up, we're going to write the parsing code that will convert raw bytes into Frame
objects that we can use later on.
This section will tell you how each variant is encoded in bytes.
- RESP Simple Strings:
- Leading byte is "
+
", followed by a string that cannot contain a\r
or\n
character, and terminated by\r\n
. - Example: "I love Rust 🦀" becomes "
+I love Rust 🦀\r\n
".
- Leading byte is "
- RESP Errors:
- Same as simple strings, but the leading byte is "
-
". - Example: "NOT FOUND" becomes "
-NOT FOUND\r\n
".
- Same as simple strings, but the leading byte is "
- RESP Integers:
- Leading byte is "
:
", followed by a\r\n
-terminated string representing an integer. - Example: 123 becomes "
:123\r\n
".
- Leading byte is "
- RESP Bulk Strings:
- Leading byte is "
$
", followed by the number of bytes composing the string in ASCII digits (a prefixed length), followed by\r\n
, followed by the actual bytes, followed by a final\r\n
. - Example: "Æ’oo" becomes "
$4\r\nÆ’oo\r\n
", since "Æ’" is encoded in 2 bytes and "oo" is 2 bytes, one for each "o". Yes, characters can be variable length encoded. No, we won't be diving into this rabbit hole.
- Leading byte is "
- RESP Arrays:
- Leading byte is "
*
", followed by the number of frames in the array in ASCII digits, followed by\r\n
, followed by the encodings of the frames within the array. - Example: ["hello", 100] becomes "
*2\r\n+hello\r\n:100\r\n
".
- Leading byte is "
When you start writing your Frame::check
method (defined below), you should be consulting this section so you know how to properly parse data from the bytes.
Implementing methods for the Frame
type
The main functionality we'll be implementing in this lab is parsing Frame
values from a slice of bytes (&[u8]
), as described by the protocol.
Instead of reading straight from a &[u8]
, however, we'll read from a &mut Cursor<&[u8]>
instead.
Let's break down this type signature.
The &mut
part is self explanatory- we're getting a mutable reference to a Cursor<&[u8]>
.
The Cursor
part is a type provided by the standard library as std::io::Cursor
, and it looks like this:
struct Cursor<T> {
inner: T,
pos: u64,
}
So when we have Cursor<&[u8]>
, we're essentially just pairing a byte slice, &[u8]
, with a position, u64
.
This means that it can keep track of how much we've read so far without changing anything about the underlying buffer.
Don't worry too much about the syntax or the name of the type for now, we'll be exploring it more next week.
One of the benefits of Rust is that you can view source code from the documentation. If you're curious about how
Cursor
works, you can see its documentation here, and even see its source code by pressing the orangesource
button on the right side.
Next, we need a way to take a &mut Cursor<&[u8]>
, read from it, and return a Frame
to us (or some kind of error).
One thing to know about communicating over a network is that even if a chunk of bytes is sent all at once, the receiving computer might not get them all at the same time.
To combat this, we'll create two methods, Frame::check
and Frame::decode
, where the former checks that a Frame
can be parsed without actually parsing it, and the ladder actually parses the bytes and returns a Frame
.
To see why we might want this, consider the following sequence of bytes (with whitespace added for readability):
*3\r\n
+GET\r\n
+Potato\r\n
This is a frame that is an array of three inner frames, but only two are present so far.
If we eagerly create a Frame::Array
with an empty Vec
and push two Frame::Simple
s to it, we'll find out that we don't yet have data and return an error, meaning our String
and Vec
allocations will have been for nothing.
Thus, having distinct Frame::check
and Frame::decode
methods will allow us to cheaply check that there is enough data without having to perform potentially wasted heap allocations if there turns out not to be.
Let's briefly walk through how these methods should be implemented.
Frame::check
Frame::check
ensures that the bytes are encoded as a valid frame as described below.
First, it checks that the first byte matches one of the expected bytes (+
, -
, :
, $
, *
), and performs more checks depending on the kind:
-
Simple strings and errors have a terminating
\r\n
, and have been checked as valid UTF-8. To get you started, you could implement this branch as justlet slice: &[u8] = line(src)?; let _validated: &str = from_utf8(slice)?; // validated, don't need to do anything else now
This should work because you should already have
use std::str::from_utf8;
at the top. -
Integers can be parsed as an
i64
and have a terminating\r\n
. -
Bulk strings have a valid prefix length and that there are that many bytes remaining, followed by
\r\n
. -
Arrays have a valid prefix length,
n
, and thatn
frames afterwards can also be checked withFrame::check
. This can be accomplished by loopingn
times and recursively callingFrame::check
in each iteration. For example, if we have this sequence of bytes in our&mut Cursor<&[u8]>
(the^
is where the cursor position is)*2\r\n+Hello\r\n+world\r\n ^
Then calling
Frame::check
on it will first detect that it's an array of 2 elements, while advancing the cursor forward past the first line:*2\r\n+Hello\r\n+world\r\n ^
It'll then loop twice, where the first iteration will check and advance past the first inner frame (the "Hello" simple string):
*2\r\n+Hello\r\n+world\r\n ^
and the second iteration will check and advance past the second and last inner frame (the "world" simple string):
*2\r\n+Hello\r\n+world\r\n ^
If any of the checks on the inner
n
frames returns anErr
, propogate it up with the?
operator.Note that because we're using a
Cursor
, each call toFrame::check
will advance the position of the cursor, so we don't have to manually track where in the buffer we are.
On success, it returns Ok(())
, which just tells us "hey, this operation succeeded but I don't have anything to give you."
On failure, it returns Err(ParseError)
.
The ParseError
type, which will be defined by you, is explained more here.
Frame::decode
Frame::decode
is nearly identical to Frame::check
, except it creates the Frame
objects as it goes instead of just throwing out bytes after validation.
For example, if you branch for simple frames in encode
looked like this:
let slice: &[u8] = line(src)?;
let _validated: &str = from_utf8(slice)?;
Then you would want this instead in the decode
method:
let slice: &[u8] = line(src)?;
let validated: &str = from_utf8(slice)?;
Ok(Frame::Simple(validated.to_string()))
You could also be clever and write it in one big expression if you feel comfortable with that too:
Ok(Frame::Simple(from_utf8(line(src)?)?.to_string()))
Another change would be that instead of just checking frames in a loop in the array variant, you'd actually want to be pushing the successfully parsed frames to a Vec
as you go.
Starter code
The outline for these functions is provided in the same file:
Filename: src/frame.rs
impl Frame {
pub fn check(src: &mut Cursor<&[u8]>) -> Result<(), ParseError> {
todo!("implement Frame::check")
}
pub fn decode(src: &mut Cursor<&[u8]>) -> Result<Self, ParseError> {
todo!("implement Frame::decode")
}
}
It is your task to implement these functions according to the RESP protocol and with proper error handling.
Speaking of error handling...
The ParseError
type
When writing up the Frame::check
and Frame::decode
methods, there are several things that can go awry.
For example, one of the cursor
helper functions might return a CursorError
, or the first byte of a frame might not be one of the expected bytes.
In these cases, you'll want to add a variant to the ParseError
type defined at the bottom of src/error.rs
.
To get you started, two possible things that might go wrong are one of the cursor
functions might return a CursorError
, or the first byte of a frame might not be one of the expected bytes as shown in the last branch above.
pub enum ParseError {
Cursor(CursorError),
FrameType(u8),
// add more
}
For a refresher on errors as enums and how to make your errors work with the ?
operator, read back on our lecture notes in 3.1. Enums.
Cursor helper functions
For help working with the Cursor
type, we've created a small library with utility functions which you can see in Cargo.toml
if you're interested.
Near the top of src/frame.rs
, you should see the following:
use cursor::{byte, integer, line, size, slice};
To learn how to use these, navigate inside your project directory and enter the following command:
cargo doc --open
This will generate documentation for your Rust project in the form of HTML, JS, and CSS, and then open the corresponding files as a web page on your default browser.
Under the Crates
section on the left-hand side, click on cursor
, and you should be brought to the documentation which will explain how each method works.
Get started with a match
expression
We've seen how match
expressions can be used to extract the value of an enum, but they can also be used like switch
statements in C/C++.
Here's an example that you can run for yourself:
fn main() { // We can specify a byte by prefixing a char literal with `b` // Try changing the byte! let byte: u8 = b'+'; match byte { b'+' => { // all good! println!("Plus!"); } b'-' => { // all good! println!("Minus!"); } x => { // hey, this isn't right and we can't proceed properly from here println!("None of the expected bytes matched, found: {}", x); } } }
You may find this useful when determining which kind of frame is encoded based off the first byte read.
If you're ever at a point in code where you're thinking "hey, this isn't right and we can't proceed properly from here", than it's probably an error and you should add a variant for the particular scenario on the ParseError
type so you can return it.
Helpful conversions
Converting &[u8]
to Vec<u8>
You can use the .to_vec()
method to create an owned Vec<u8>
from a borrowed slice, &[u8]
.
#![allow(unused)] fn main() { let line: &[u8] = &[1, 2, 3]; let bulk: Vec<u8> = line.to_vec(); }
Converting &[u8]
to String
You can use the std::str::from_utf8
function to get from &[u8]
to &str
(without any copying!), and then use .to_string()
to get from &str
to String
.
fn main() -> Result<(), std::str::Utf8Error> { use std::str::from_utf8; let line: &[u8] = &[240, 159, 166, 128]; let borrowed_string: &str = from_utf8(line)?; let owned_string: String = borrowed_string.to_string(); println!("Secret message: {}", owned_string); Ok(()) }
Testing your code
In lab 1, we had the following at the top of our lib.rs
file:
#[cfg(test)]
mod tests;
This created a tests
module inside of our project, which was accessible in tests.rs
.
For this lab, however, we'll be writing tests at the bottom of src/frame.rs
.
Near the bottom, you'll find the following:
Filename: src/frame.rs
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_check() {
// Use the following syntax to get your `Cursor<&[u8]>` buffers.
let mut src = Cursor::new(":100\r\n".as_bytes());
// `.is_ok()` returns true if `Result` is the `Ok` variant.
assert!(Frame::check(&mut src).is_ok());
}
}
This is essentially equivalent to the above (with a sample test though), but the contents of the tests
module can go in the block here instead of in another file.
Write some tests to check that your code works!
One of the reasons we might want to write tests in the same file is if we're building up a larger framework and don't want test files everywhere, but want to keep our tests close to what we're testing for clarity.
Questionnaire
The last part of this assignment is to fill our several short response questions.
Since we'll be reusing this repository for the rest of the course, you'll find the questionnaire in questionnaire/lab3.md
this time.
In future labs, we'll push other questionnaires to this directory, so be sure to not add other files or move files around.
Feedback
Please make sure that each member fills out this short feedback form individually.
Submitting
Once you're finished, be sure to verify that:
cargo fmt
has been runcargo clippy
passescargo test
passes
Then you can push your changes with the usual: git add .
, git commit -m "your message"
, and git push
.
However, there's one final step: Git tags.
In the industry, tags are how software is officially released. Essentially, it creates an immutable copy of the repository and all the contents and makes it fancy so people can download and use it. We'll be using it to denote stages of completion on this project, so one for each of the remaining labs.
The benefit of this is that when you create a git tag, we can look at it to see your repository at that moment in time. This means that if you're working on lab 4 and your repository is in an incomplete state that potentially doesn't compile, we can still look back at the point where you created a tag for this lab and see your working code from that point in time. It will also help us know exactly what code you want to submit- that is, if you need an extension, we'll know not to grade your code until you create a git tag.
So, when you're sure you're completely finished with the lab, start by creating a tag:
git tag lab3
And then push it to your repository:
git push origin lab3
Congratulations on finishing!