Kanidm[11]

August 17

Monday to-do list:

Do code reviews.
Rebase PR and prepare for merging.
Open new branch to continue integration.
Open new branch to do refactoring changes suggested.

Monday started with some repetitive code reviews- there were a lot of changes that didn’t seem too tied together, so I felt like I had to do a lot of context switching. I understood what William meant when he said that unrelated changes shouldn’t be in a PR because they just cluttered everything…

Afterwards, I moved onto rebasing my PR onto master, and resolving the conflict went quickly; I just accepted all the incoming changes, and everything still compiled. Great! However, GitHub was giving me red flags because there were test fails, which was something that I’d totally forgotten to check.

After logging stdout to a file for the tests, I found that there were panics when calling tracing_tree::operation_id, meaning the global subscriber wasn’t set for those tests. Since the full integration would involve making all the tests set the global subscriber and this was an initial implementation, I ended up commenting it out to make those tests pass.

I wasn’t out of the deep yet, though: I still had another failure case where tests were trying to set the global subscriber multiple times. I considered making the error recoverable, but decided that the error should never happen in fully finished code, and adding error handling would just allow for sloppy code down the road.

I spent half an hour brainstorming solutions, but couldn’t come up with anything I was happy about so I ended up explaining my thoughts in a comment on the PR, which is pasted below.

This isn’t ready for merging yet, as there are a bunch of unit tests I’m still struggling to make pass.

The issue is that global subscriber lasts between tests, so I can’t just reassign it between tests (which I feel like I should be able to). My idea to resolve this was to just ignore errors from trying to reassign it, but this means that once the first test finishes, it will drop its processing thread, the only receiver of the global subscriber, therefore disabling the global subscriber from sending any more logs without panicking.

Ideas I’ve considered:

Find a way for the current thread to borrow and use the receiver of the global subscriber’s sender. Unfortunately, this would require an Arc<Mutex<UnboundReceiver<_>>> which I’d really like to avoid.

Give the global subscriber a new sender for when have a fresh thread. This doesn’t work because a) multiple threads? b) I only have immutable access to the global subscriber once set by using downcast_ref.

Have a global formatter/processor to avoid the sender/receiver ever being dropped. I don’t think you can have a 'static thread though, especially on a tokio runtime.

Manually flush the threads like AuditScope. This also doesn’t work because I can’t mutably access the global subscriber.

I’ll likely go for the first option because the mutex will probably be used like once in practice, but will make the unit tests pass. Very annoying :(

My main concern is whether a solution is necessary for this problem outside of a testing environment. Won’t we only have to set a global subscriber once in practice?

August 18

Tuesday to-do list:

GSoC final evaluation.
Review comments on GitHub.
Make tracing work for test cases.
Continue tracing integration.

I intended to start of the day by doing the final evaluation, but it required compiling all my work into a blog post which I hadn’t prepared at the time.

Moving onto GitHub comments, William implemented my changes, but not much else had changed. In the span of 15 minutes, I had already checked off half my to-do list!

I switched to researching solutions for the tests issue I was facing, and came across test-env-log, a crate for automatically initializing a tracing subscriber in tests. I thought this was going to be great! However, looking into the source code revealed that it just set a FmtSubscriber to the global default within the proc macro, yielding the library unusable for Kanidm. I also found that it just ignores the error case, meaning it was no better than anything I could write by hand in five minutes.

I came up with an amazing idea while I was brainstorming. The main issue I aimed to resolve was to keep the same processing thread throughout all the tests. Seemed… not feasible without hacky magic. My amazing idea? Give up. It wasn’t gonna happen, and I needed a different approach. The easiest solution was to completely throw out the processing thread, and just let the current working thread do important stuff and also process logs. This was a good compromise because in tests, performance only matter in sections where we’re tracing the execution of an event, and there’s nothing important happening at the top level where the logs get processed.

So now I had two scenarios: in tests, I wanted to process logs on the current thread, and in release, I wanted to process logs on a processing thread. I knew the functionality wouldn’t need to be interchangable at runtime, so I wanted two different Subscriber’s: one for tests, one for release.

Why write two types that are almost identical except one part? It was time to abstract with generics! Everything up until the processing step would be the same for tests and release, so I decided to make a Processor trait to take care of the final step. It would be implemented by two types: TestProcessor and ExportProcessor. The TestProcessor would take unprocessed logs, and just process them in the current thread. The ExportProcessor would instead send the unprocessed logs through an UnboundedSender<_>. Their declarations ended up very clean and self-explanatory:

pub trait Processor: 'static {
    fn process(&self, preprocessed: TreePreProcessed);
}

pub struct ExportProcessor {
    sender: UnboundedSender<TreePreProcessed>,
}

pub struct TestProcessor {}

This meant adding another generic to TreeSubscriber, which I wasn’t happy about. It was at this time that I made the executive decision to hard code my EventTagSet trait into the subscriber, because I didn’t think we would ever care about using other event tags that wouldn’t just extend the current one we were using. So I ended up with just one generic, the Processor, which I was very happy about.

Now that TreeSubscriber was configurable, I made two top-level functions: tracing_tree::test_init and tracing_tree::main_init:

tracing_tree::test_init attempts to set the global subscriber to one with a TestProcessor, and ignores failure. This meant it could be repeatedly called for each test to ensure that the global subscriber is initialized.

tracing_tree::main_init sets the global subscriber to use an ExportProcessor, and panics if it’s already set. Additionally, it requires a tokio runtime to be active so it can automatically get the processor running in a new thread. Since it’s only intended to be called under #[tokio::main] functions, this is essentially handled under the hood already.

These two functions made it super easy for me to get all the test cases passing, since like how env_logger is setup, I just had to drop in a one-liner at the beginning of each test:

let _ = tracing_tree::test_init();

During the process of committing my changes, I recalled how I commented out a call to tracing_tree::operation_id on Monday, but now I was pretty confident it should work because all the tests now had an initialized global subscriber active. Uncommenting this resulted in a panic when unwrapping the result of operation_id, which I figured was caused by a test somehow calling it without wrapping it in a span first. Since this was caused by the lack of full integration, I just removed the unwrap part and left an explanatory comment next to it.

Overall, I was super satisfied with my solution! Thanks to generics, I was able to find a zero-cost solution that allowed me to change very little about how my code worked, while also elegantly finding a solution to resolve all the unit test failures.

As I was looking over my nearly-finished PR in all its glory, I made a horrifying discovery: JSON logs didn’t track their fields. Which was very important considering almost all logs had fields, and JSON was a primary motivation for redoing logging. This only took me about ten minutes to resolve, because I had to take some extra steps to make serde happy about having a mapping nested inside of a struct.

I was then able to push and pass all the test cases. Hooray for a successful day!

August 18

Wednesday to-do list:

Finalize touches on PR and rebase.
Continue integration on a new branch.
Enjoy the rest of my day because I worked 9 hours Tuesday.

Wednesday went off on a rocky start, because I completely messed up the rebase process. I accidentally messed up conflict resolution, tried to abort (without looking up the git commands beforehand), and suddenly everything was “merged” even though it definitely wasn’t. I’m not terrible at git, but it was completely going against my intuition (saying “Already merged” when it wasn’t), so I sent an SOS to William.

The weird thing was that it still compiled, the tests still passed, and now it said there were no conflicts and it was ready to merge. I was dubious of these claims. Since my work was going to branch off of this kinda essential step, I decided to pursue other non-GSoC things the rest of the day to balance out the work I put in on Tuesday. I figured if William was able to merge it (like GitHub claimed possible), I could continue the next day.

August 19

Thursday to-do list:

Enjoy some relaxing refactoring.
Maybe do more tracing integration.

Since my PR merging issues still weren’t resolved, I decided to enjoy my second-to-last day doing some good ‘ol cleanup in a new branch.

One of the patterns that bothered me most throughout the code base was the use of Iterator::fold, particularly because they could almost always be replaced by a more idiomatic all, any, or find_map. So I spent a good chunk of the morning in a fresh branch changing these occurrances, and making sure all the tests still passed throughout the process. Although not particularly impactful, it was incredibly satisfying!

Even after a bunch of that, there was still a pattern that really bothered me: “on error, log and then return the same error”. The pattern looks like this:

some_fn()
	.map_err(|e| {
        error!(?e, "an error occurred");
        e
    })

That’s like, 3 whole lines of code wasted! What if we could just infer that the same error is returned and only call the macro to both log and pass down the error?

This reminded me of a blog post that I read about on Tuesday, which allows you to to combine functionality like the Default trait with macros to make their return types context-dependent. Amazing!

I decided to explore this in a new Rust project on my a local computer.

After a few hours of research, I discovered that this wasn’t possible. I won’t bother explaining the technique (that’s in the blog post), but I will explain why it didn’t work.

The issue was that type-infering traits require their context to be expecting a concrete type, similar to how Default::default can only be placed where the expected type is known. However, Iterator::map expects an object that is FnOnce(T) -> T, which isn’t a concrete type. Since many types (aka closures) can implement this, having a type-infering trait (wrapped in a macro) wouldn’t work, because it wouldn’t know what type to infer to.

Furthermore, like Default::default, the trait needed a method that returned Self, except the type of a closure can’t be known before compilation time. In fact, the only way to return an (unboxed) closure is with impl Trait (Trait being Fn, FnMut, or FnOnce). This made it impossible to have the return type be Self. I also couldn’t put impl Trait as the return type for a trait method, as this wasn’t a supported feature. I had the idea to not use Rust closures, but make my own “callable” type, but implementing FnOnce by hand required experimental featues which I wasn’t willing to dabble in. I also considered Boxing the closures as trait objects, but that was more hassle than I figured the whole idea was worth.

Back to work though. I found another bug when running tests. It turns out that I changed the label of the field that contains the event tag number from event_tag to event_tag_id, and I’d forgotten to reflect that in the subscriber so the event tag wasn’t being parsed properly. This was causing our custom event tags (like filter.error) to not be properly displayed, and instead showed up as a field like event_tag: 10. Finally, I found one last copy-paste error in adding a log, which I resolved. This probably will help avoid a very confused admin in the future :)

August 20

Friday to-do list:

Fix weird commits on idiomatic-refactoring PR.
Start writing summary blog post.

Good news: William provided instructions on how to fix my screwed up rebase.

Bad news: it didn’t work, and I had to delete my project directory and clone it again. Big sad.

So, my last day started with very little ability to write productive code. I ended up spending the day compiling all my contributions and work into a work product blog post so I could officially pass GSoC (woohoo), but didn’t end up writing any new code or progressing on the tracing integration.

Project Reflections

I’m thrilled with how this project worked out, and I want to take time here to reflect on my initial goals coming into the project.

Async work schedule

I loved working asynchronously. I loved the flexibility of working whenever, since it allowed me to organize my work around when I felt most productive. I was spot on with my prediction that async would help me be a more driven problem solver, because problems like I had on Tuesday were super common. I think this extra space really drove me to expand my growth mindset. I really embraced the idea that even if I’m still new, I’m perfectly capable of solving real world problems without needing to ask for help all the time. This is one aspect that I’m extremely proud of.

Expand Rust knowledge

I came into the project thinking I was good at programming. And I was- in an isolated box. Contributing to Kanidm gave me so much experience with interfacing with large libraries. Not only did I pick up Rust techniques from serde, tracing, and `tide, but I also became vastly better at understanding design decisions in complex, flexible libraries. I frequently hit roadblocks where I thought the design choices were poor, only to have an epipheny the next day and recognize the genius behind it.

Becoming a better open source contributor

One of my regrets is never fully attempting to understand exactly how Kanidm works / what it does. Although I asked a lot of clarifying questions through direct messsages and code reviews, I was never really able to put everything together.

Speaking of code reviews, I think this is an area where I really shined. I really enjoy just looking at Rust code, and this obsession allowed me to perform really thorough reviews by finding pieces of logic to clean up, and suggesting functions to simplify procedures. One of the issues I still face is that sometimes I see things unrelated to the PR that I think should be changed, and this clutters the review. Not the worst thing, but a detraction nevertheless.

Finally, I’m incredibly proud of the work that I did with creating a suitable interface for tracing and Kanidm. I know that my contribution will increase the code quality and reduce clutter in nearly every part of the project, and will probably be used for the foreseeable future. I was able to implement tons of feedback, allowing me to feel like my work mattered to the project and not for my own personal satisfaction.

Conclusion

I’m incredibly proud of the work that I’ve done this summer, and grateful for the amazing mentorship I’ve received as well. Google Summer of Code has been such a priviledge, and I know it’s helped me grow immensely as a developer and an open source contributor. Thanks for everything!

Written on August 25, 2021