Kanidm[5]

June 27

Monday TODO list:

Rebase tracing-logging #467 to master.
Respond to feedback on tracing-memberof #497.

The past weekend, William gave me some feedback on my PRs, including suggesting rebasing tracing-logging #467 to master. I wasn’t quite ready to have my changes merged yet, but I’d never done a rebase before and decided that it wouldn’t hurt. I ended up spending about an hour trying to figure it out, but even when I resolved all the conflicts, the files still weren’t correct because some lines weren’t getting updates, causing me to get the red squigglies. I didn’t want to waste more time, and since it wasn’t a top priority, I ended up aborting the resolved conflicts and leaving a comment on the PR.

In my tracing-memberof PR, William suggested that I implement my own Subscriber. However, I ran into a big problem: there were a lot of methods that called for updating the internal state, but accepted self immutably. What the heck? It mentioned passing Visitor objects that can collect information, but that didn’t change the fact the self was immutable. It was also really frustrating because the documentation literally said to “update any internal state”, suggesting that there’s not some other way that I was missing.

June 28

Tuesday TODO list:

Review GitHub comments.
Continue thinking about implementing my own Subscriber.

An idea that I had before I started work was that maybe Subscribers always take themselves immutably because they are designed to be concurrent, and therefore are intended to use some alternate method of synchronizing logs. The easiest way to do this is with an Arc<Mutex<_>>, but I figured there had to be a better way.

I started off by reviewing William’s latest PR, Add statistical analysis to indexes #505. It took me 2.5 hours to review, and I thought it was really cool how we used statistics to create diagnostics that helped us optimize the code. Maybe double majoring with statistics will be useful! I’m still very impressed with the amount of code that he’s able to produce. It seems like he’s pumping out massive PRs every two weeks or so.

After code review, I continued brainstorming for how to implement my own Subscriber. I initially wanted to create a custom Layer objects that wrapped the existing FmtSubscriber implementation. This would provide the option to have the default Subscriber just shoot everything to stderr or a file normally, and have filters choose when and how to send information to the wrapped Subscriber. I wasn’t entirely sure if this was how it worked, but the documentation was very confusing and there wasn’t any big-picture diagrams to explain anything. I ended up pretty frustrated, but I felt like I was making some progress at least.

Ugh. I tried for another hour to figure things out. Everything in tracing_subscriber is so convoluted. Everything is wrapped in 20 layers of traits and wrapper structs. There are nested on nested on nested generics. There’s a Layer struct, a Layer trait, and another Layered struct. Some of them implemented tracing::subscriber::Subscriber, some of them tracing_subscriber::Subscribe. Who the hell designed this? And worse, who decided “Ah yes, lets publicize this and not give any diagrams for how anything works”? It was impossible to tell if I was doing anything right, and worse, they make no effort to describe their thought process. I was having doubts about trying to tackle something this big. I wasn’t even sure if Kanidm would benefit that much from using tracing. All very frustrating.

June 30

Wednesday TODO list:

Review responses to my questions on Williams PR #505.
Find something else to work on to take a break from tracing.

I reviewed William’s responses to the slopeyness magic, and I think I really have an understanding for what it’s used for now. Essentially, outsiders can send Kanidm filters to query for entries in the db. Sometimes these filters have a bunch of parameters, and we find the results of the query by first finding all the entries that match the first parameter, and then looking through those for the ones that match the second parameter, and so on. The issue is that if we start with a very broad parameter, the number of entries stays large initially, making the filter much slower. However, if we sort the parameters (having no impact on the final result because it has to do them all anyways), we can organize them in a way such that the most narrow filters are applied first, therefore cutting down the amount of entries we have to search through for more broad filters.

I decided to say “screw it” and continue working on tracing afterwords, but with a much more loose grip on what I hoped to accomplish. William pointed out that I’m going into unknown territory and therefore shouldn’t feel bad about not making hard progress each day, as long as I was learning and progressing towards deciding whether or not tracing would be a good idea.

With this in mind, I did a little more digging through the docs and it looked like I needed to use a Registry. The source code showed fields containing what looked like threading behavior and a RefCell, which definitely seemed like the right direction for storing span information.

After some more research, I found that I could also store the logs contained in a span inside of the Registry using extensions. It said it can store user-defined data, so I was hoping I could have it store some sort of Vec of logs, but I still had some more research to do on that.

The main point was that all the interior mutability and span storage behavior seemed to already exist, meaning my job was significantly easier. All I had to do was figure out how to piece everything together. My main idea at the time was to have Vec of logs in some form (String? Some other type?) as an extension of each span in the Registry, and then when try_close was called when a span ended, all the logs would get appended to that spans parent, or if it had none, to stderr or some preconfigured file.

This would do allow us to do a few things:

All activity within a span would get logged at the the point in time at which it closed.
Things would work with async, unlike AuditScope.

Something annoying that I kept running into is that the links in the docs are consistently broken. This was particularly annoying when it mentioned something I thought was going to be useful, but then end up having to search for it through the search bar.

I decided to call it a day there because my brain was fried. However, I finally felt like I was making it over the roadblock I’d been stuck on. I just had to understand how to piece all the provided parts together.

July 1

Thursday TODO list:

Respond to PR comments.
Research Registry type and Layer trait more, work on basic implementation.

Thursday started off by reviewing William’s clarifying responses on my comments on #505. Although I was getting a stronger understanding, there were definitely aspects that I needed to revisit later with fresh eyes. I generally had a good understanding of what the slopyness factor does, but I was still shakey on how filtering as a whole works.

Ugh. I was still struggling to figure out how anything pieced together. Ultimately, I needed to end up with a Subscriber. I figured the way to do this was by using a Layered object composed of a type that implemented Layer (which wraps a Subscriber), and Subscriber, although I wasn’t sure why it had two Subscribers.

More docs digging revealed that Registrys are meant to get wrapped in Layers. I still didn’t know what the other wrapped Subscriber was used for. Generating Ids?

I found that a Registrys extensions are implemented in a very cool way. It stores a mapping where the key is an actual type, which I would guess make lookups instant because they’re done at compile time? Not sure, don’t quote me on it.

By the end of the day, I was able to create a basic type containing a Layered struct, which wrapped my own custom Layer wrapping a Registry. Pushed it to GitHub and hoping to get feedback tomorrow!

July 2

Friday TODO list:

Get formatting and timestamps working on KaniSubscriber.
Add custom macros for KaniSubscriber that wrap tracing macros.

Friday began with the usual docs digging. I was working on getting timing to work by having a custom timed field. This was going to get wrapped in a macro:

// `info_timespan` is custom macro defined within Kanidm...
info_timespan!("Performance segment");
// ... which desugars to
tracing::info_span!("Performance segment", timed = tracing::field::Empty);

Essentially, the timed field will be a flag to tell the subscriber whether or not to time the duration of the span. I also decided to have it be a “present or not present” flag and not a boolean flag because a) less memory I guess (?), and b) finding if a field exists is somehow easier than finding out the associated value, and I was tired of reading the docs.

I was also annoyed that fields had to have values (or tracing::field::Empty), since it meant everything was more verbose. I spent a good 15 minutes before I figured this out from the docs.

Another really frustrating issue I came across was that the on_close and on_exit methods were called super ambiguously. on_exit is supposed to be called when the span is exited, which can happen multiple times. on_close is supposed to be called when the span is dropped. However, I found that on_close was never being called (I even unconditionally panicd inside and… no panic), so I put the closing behavior in on_exit. However, this caused the issue of allowing logs to be written before the span is done producing, which I was trying to avoid. Definitely an area for further research.

I dug through the source code for how FmtSubscriber handled this, and it seemed as though the trait interface behaved exactly how I expected it should, so I really didn’t understand why my code wasn’t consistent with that.

On the other hand, I was able to get basic timing working, which I accomplished by loosely copying the source code for FmtSubscriber.

Weekly Reflections

Even though I didn’t produce the most lines of code this week, I certainly believe my Rust abilities improved immensely. I think I was already fairly competent writing Rust code by myself, and was already improving at reading others’ code through code reviews. However, I took a big leap forward this week by combining the two, and forcing myself to integrate my own code with the internals of existing libraries.

I think forcing yourself to see how others think in code is incredibly valuable, especially when you disagree, but also know their code is used and trusted by millions of developers. The whole process reminded me of when I attempted to first learn Rust, and could barely get anything past the compiler. I constantly thought “This can’t be right. Whoever designed this is an idiot!” But slowly, I began to understand and gain an immense appreciation for the genius design choices taken.

I’ve attempted to interface with the internals of serde before and ultimately gave up. My experience reading the docs for tracing felt similarly frustraing, so I’m really happy I powered through and am making good progress now. Next week, I hope to get a fully functioning prototype I can start implementing in more areas of the code base!

Written on July 3, 2021