Kanidm[8]

July 19

Monday TODO list:

  • Change timestamp format.
  • Add UUID’s to logs.
  • Do some restructuring.
  • Design and implement JSON format structure.

I got feedback over the weekend! First things first: William and James (@yaleman) both agreed that the timestamp format had to go. Fortunately, chrono had a built-in method for formatting as RFC3339, which they strongly suggested. I wasn’t personally a fan of this change because the format was much less human readable and probably should’ve been reserved for JSON formatting, but they probably had good reason for it so I was happy to oblige.

Next up was to add UUID’s to individual operations. I wasn’t particularly stoked about this change because it was going to add way more clutter, but again: they probably knew better than I did, and it would be a fun exercise. I didn’t know how to do it at first, but I knew where to look: AuditScope.

I found that the AuditScope type received a Uuid, and allowed the programmer to explicitly pass in an existing UUID (although mostly it was freshly generated). I wanted to allow this behavior, so I drafted up how I imagined it would look like:

// Get a `Uuid` from somewhere...
let my_uuid = Uuid::new_v4();

// Pass it into the span
info_span!(uuid = my_uuid, "status_handler")

In this example, I had the uuid field as an optional field. If left blank, the UUID would be randomly generated if the span was a root, otherwise it would be adopted from the parent.

I quickly encountered an issue with this: You can only pass values through as numbers or as fmt::Debug trait objects. This was really frustrating, because it meant that I had to convert it into a string or into individual numbers and split it between multiple fields. There was no other way to pass a value in through tracing’s existing framework. I decided to start with the simple solution of formatting it to a String to get testing working, and later transform it to use bit manipulation to make it only 2 64-bit numbers.

This meant that using a custom UUID would now look like this:

info_span!(uuid = fields::display(my_uuid.to_hyphenated()), "status_handler);

I actually realized that this wasn’t so bad because I could just have it format the hyphenated representation into a [u8; 36] which is the size of all UUID’s, meaning no stack allocations.

As I was doc diving, I came across an interesting implementation detail. tracing allows you to use a closure as a visitor for new spans and events, allowing me to eliminate some of the boilerplace of a new Visit type when detecting if spans had a provided UUID. This was what I came up with:

let mut uuid = None;
attrs.record(&mut |field: &Field, value: &dyn fmt::Debug| {
    if field.name() == "uuid" {
        let mut buf = String::with_capacity(36); // all UUID's format to 36 chars
        write!(&mut buf, "{:?}", value).expect("Write failed");
        uuid = Some(buf);
    }
});

More digging through docs revealed I could use the instrument proc-macro to add a uuid field in the attribute macro call. Which meant I could assign UUID’s as follows:

#[instrument]
pub async fn handle_request(&self, uuid: Uuid) -> bool {
    admin_info!("status handler complete");
}

However, this only works when the argument is called uuid and the type is Uuid, or something whose fmt::Debug implementation produces the proper hyphenated UUID format. This is all built into instrument, which is great because I don’t have to do anything, but also not great because I have no control over implementation details.

Overriding this is much more verbose, but not terrible:

#[instrument(fields(uuid = tracing::fields::debug(event.eventid)))]
pub async fn handle_request(&self, event: StatusRequestEvent) -> bool {
    admin_info!("status handler complete");
}

For this reason, it’s much better to just pass in a uuid argument that debugs to the proper format.

My day was derailed when I ran into a very frustrating issue. I designed my spans so that if they’re given a UUID, they take that, else if they have a parent, they don’t take one (because they just use their parents’), else they generate a fresh UUID because they need one. However, the is_root method always returns false, suggesting that none of my spans are roots (even for the very first span I create). Yet, the is_contextual method returns true, meaning that the parent of the span is the “Current”. So I tried to find what the current span was with lookup_current, which returned None. Oh boy.

Decided to call it a day here because I was frustrated.

July 20

Tuesday TODO list:

  • Continue implementing UUID support.
  • Pick up todos from Monday.

Since Monday ended with me not understanding why tracing didn’t behave like I thought it did, I took the only reasonable approach of returning to the docs. As it turns out, Attributes::is_contextual doesn’t always mean that the thread is in a span, in which case the new span will be the root. Even though Attributes::is_root would return false for the same span. Huh?

This didn’t turn out to be an issue, though, because all it told me was that I needed to find another solution.

I ended up just combining Context::lookup_current to determine if there was a current span, and then combining that with Option::is_none to get if this span was a root in it’s trace tree. Using this an an indicator of whether to generate a fresh UUID worked flawlessly!

I was still troubled by how syntactically bloated it felt to give a span a custom UUID from its parents. I spent some time creating wrapper macros, but decided that the occurrances where we would need this would be minimal enough to the point where it wasn’t worth the time, so I abandoned this idea.

I spent some time refactoring and cleaning up subscriber.rs, since it was a big jumble of code. I was able to minimize a lot of repetition, and also allowed for singular events without any parent spans to be logged instead of just getting ignored. This probably shouldn’t be abused, but it would be nice for debugging purposes.

Another executive decision that I decided on was to move the emojis from the left column into the span tree on the right side. Here’s an example of what I mean (with a UUID and RFC3339 time format):

00000000-0000-0000-0000-000000000000 2021-07-20T18:59:30.605201+00:00 TRACE    try_from_entry_ro [ 1ms | 28.209% / 100.000% ]
00000000-0000-0000-0000-000000000000 2021-07-20T18:59:30.605201+00:00 TRACE    ┕━ server::internal_search [ 895µs | 6.412% / 71.791% ]
00000000-0000-0000-0000-000000000000 2021-07-20T18:59:30.605201+00:00 INFO        ┝━ 💬 [request.info]: Some request info...
00000000-0000-0000-0000-000000000000 2021-07-20T18:59:30.605201+00:00 TRACE       ┕━ server::search [ 815µs | 0.538% / 65.379% ] 
00000000-0000-0000-0000-000000000000 2021-07-20T18:59:30.605201+00:00 TRACE          ┝━ be::search [ 535µs | 0.836% / 42.931% ]
00000000-0000-0000-0000-000000000000 2021-07-20T18:59:30.605201+00:00 TRACE          │  ┝━ be::search -> filter2idl [ 268µs | 18.743% / 21.500% ]
00000000-0000-0000-0000-000000000000 2021-07-20T18:59:30.605201+00:00 TRACE          │  │  ┝━ be::idl_arc_sqlite::get_idl [ 27µs | 2.221% ]
00000000-0000-0000-0000-000000000000 2021-07-20T18:59:30.605201+00:00 INFO           │  │  │  ┕━ 💬 [filter.info]: Some filter info...
00000000-0000-0000-0000-000000000000 2021-07-20T18:59:30.605201+00:00 TRACE          │  │  ┕━ be::idl_arc_sqlite::get_idl [ 6µs | 0.536% ]
00000000-0000-0000-0000-000000000000 2021-07-20T18:59:30.605201+00:00 ERROR          │  │     ┝━ 🚨 [admin.error]: An admin error occurred...
00000000-0000-0000-0000-000000000000 2021-07-20T18:59:30.605201+00:00 DEBUG          │  │     ┕━ 🐛 [_.debug]: An untagged debug log
00000000-0000-0000-0000-000000000000 2021-07-20T18:59:30.605201+00:00 TRACE          │  ┕━ be::idl_arc_sqlite::get_identry [ 256µs | 20.595% ]
00000000-0000-0000-0000-000000000000 2021-07-20T18:59:30.605201+00:00 INFO           │     ┝━ 🔐 [security.critical] A security critical log
00000000-0000-0000-0000-000000000000 2021-07-20T18:59:30.605201+00:00 INFO           │     ┕━ 🔓 [security.access] A security access log
00000000-0000-0000-0000-000000000000 2021-07-20T18:59:30.605201+00:00 TRACE          ┕━ server::search<filter_resolve> [ 273µs | 21.910% ]

It took me about 5 minutes to implement this, but the task took 15 minutes because I spent a lot of time admiring the pretty colors in VSCode. I was really satisfied with this because the colors really help distringuish events and spans. Time to add “UX Designer” to my resume :)

That concludes the suggested changes, so I pushed my changes to the repo there as a progress checkpoint.

After that, I decided to work on JSON formatting, which of course started with choosing how information should be represented. Here’s what I came up with (expanded for clarity):

Span

{
    "uuid": "00000000-0000-0000-0000-000000000000",
    "timestamp": "2021-07-20T18:59:30.605201+00:00",
    "level": "TRACE",
    "message": "try_from_entry_ro",
    "log-type": "span",
    "nanos-direct": 282,
    "nanos-total": 1000,
}

Event

{
    "uuid": "00000000-0000-0000-0000-000000000000",
    "timestamp": "2021-07-20T18:59:30.605201+00:00",
    "level": "INFO",
    "message": "Some request info...",
    "log-type": "event",
    "tag": "request.info",
    "spans": [
        "try_from_entry_ro",
        "server::internal_search",
    ]
}

The main fields like "uuid", "timestamp", "level", and "message" are all intuitive. The "log-type" field is present to help any parsing programs to easily run filters on the logs, which they can then use to determine if they should read the log as a span or as an event. Fields below that are specific to each, and are fairly self-explanatory.

Since I couldn’t have a cool trace tree with JSON formatting, each event needed to display which spans it belonged to. This could be determined at processing time.

July 21

Wednesday TODO list:

  • Code review!
  • Reimplement serde_json.

William was back at it again with a massive PR fresh from the oven, so Monday started with code review. One of the patterns that I noticed while reviewing was that a lot of the time when an Result was returned, we did:

res.map_err(|e| {
    error!("An error: {}", e);
    e
})

Since this pattern showed up a lot, I thought it would be a good idea to have the logging macros evaluate to their first argument after the string literal, or () if none is given. This would let us do:

res.map_err(|e| error!("An error: {}", e))

I wasn’t entirely sure how readable this was, but it seemed worth thinking about.

I kind of spent a lot of time today working on a personal project, so I didn’t get around to much Kanidm work. I was messing around with heavily abusing Rust’s insanely powerful type system to build computational graphs at compile time which compile down to essentially nothing. Content for a future blog post?

July 22

Thursday TODO list:

  • Actually reimplement serde_json.

I had discussed with William some details on tracing-tests on Tuesday night, and he strongly encouraged me to use serde_json for the JSON formatting. The first thing I had to do was finalize the fields that JSON logs should contain. I figured it should be about the same as the pretty logs, except events should hold the names of all the spans they’re nested in, and spans should hold their direct load and total load as an integer number of nanoseconds instead of a floating point percentage.

The most straightforward approach to this was to alter the fields in MyProcessedSpan to hold the integer nanoseconds instead of percentages, and just compute those during pretty formatting. Easy!

Implementing with serde_json was an absolute breeze. Since I’d done my initial implementation using it, I was able to add it in again very easily without any hiccups.

July 23

Friday TODO list:

  • Finish refactoring.
  • Start integrating with kanidmd.

The first thing I did was start cleaning up my repo for importing. One thing I wanted to do was abstract away the different event tags like RequestInfo, PerfTrace, SecurityAcces, and so on because they were subject to change. I figured a trait would be perfect for this, except I had no good way to pass them through the spans key-value boundry. I couldn’t use u64, because the inner u64 in std::any::TypeId is private, and they’re not promised to stay consisten and I had no way to cast them back. I then realized that instead of having a designated types for each event tag, I could just use &'static str’s. This seemed like a great idea in theory, except the only way to get it through the key-value barrier was as a &str without the 'static lifetime, meaning I would have to allocate a new String for every single log. Haha, no thank you! I though about having each event have a small buffer directly on the stack, but that also seemed sus. Either way, Kanidm uses an enum, and therefore it’s good enough for me.

Some more brainstorming yielded a great idea: I could make a trait that is intended to be implemented for an enum of all the tags a specific user might want to use, and that would allow my subscriber to be modular with different sets of event tags.

It took me about 30 minutes to implement because it started with one generic type… then two, three, and soon literally every type had a generic parameter. I was chasing around compiler errors for awhile, but I finally got it all working. This meant that all we have to do on Kanidm’s end is implement the EventTagSet trait on an enum, and we can just plug it into the subscriber and everything should work! Of course, we won’t have the available macros if the provided set isn’t used, but there’s not much I can do about that.

I spent a bunch of time cleaning up things, like making the process of receiving a bunch of logs through a channel much slicker. Now, the thread can just receive the package via a channel, and call one function to do all the processing and formatting to get a Vec<u8> that can be written to a file or stderr or whatever they choose. I also did some refactoring, decided to use a tree-related naming scheme because the, well… they’re logs! Haha. And also because the logs are organized in a tree format. I thought it was punny.

Now that I was mostly done, I decided to fix my alarm! macro. I realized my initial implementation was at complete the wrong level, and that it should be done within the subscriber.

Here’s what I wanted it to look like:

2021-07-20T18:59:30.605201+00:00 🚨 [ALARM] 🔹 server::internal_search 🔹 server::search: You should be shitting your pants right now! | status="crash"

It took me awhile to figure out at what level the log should be processed, but I ended up putting it after it was converted to the intermediate representation and before it’s added to the span tree.

I was going to start integrating it with memberof.rs today, but I got really intimidated because it calls so many macros and my editor can’t take me to the definitions of things contained in macros. This sucked, since my main learning tool is zooming around src code and seeing how things work, which my editor makes trivially easy. I decided to ask William for guidance on this issue and call it a day there.

Weekly Reflections

This week was leaps and bounds like the last two weeks. I’m happy I was able to have a nice mix of code review, implementing feedback, and being able to add little quality-of-life changes without needing instruction. I’m hoping that I’m able to continue this productive trajectory for the rest of the summer! Also, this blog post is super late because I spent most of the weekend working on my personal Rust project. I will definitely be writing about it some time in a future blog post :)

Written on July 26, 2021