2025

2025 was an exciting, difficult, stressful, and very mixed year for me, and I’m really uncertain how to sum it up.

On the positive side:

  • I had a very successful professional year, including several big projects delivered and a nice promotion
  • More importantly, I also felt like my team really got a handle on our overall workflows and processes, getting into a groove rather than improvising (read: panicking) about everything
  • I’ve learned a ton about some technical areas I had never had the opportunity to dig into before, both in my work life and general interests. (So many good books!)
  • I kicked off a new D&D campaign with a group of friends that has been a great source of enjoyment and fun
  • I got to travel and visit with friends I hadn’t seen in a long time, at multiple points in the year
  • We adopted a puppy!
Maddie likes to jump directly at any camera pointed at her, making photos difficult

On the other hand…

  • It was a very rough year on the family health side, with both my mother and my partner struggling with major challenges, as well as a constellation of minor issues such that I never felt like I could really relax
  • While I felt very successful professionally, work was also extremely busy, and I spent basically the whole year feeling like I didn’t have enough hours in the day
  • The creeping fascism in the United States made my family substantially and materially less safe, as well as adding an undercurrent of dread to the whole year
  • Puppies are adorable but I haven’t had a decent night’s sleep in a month!

Overall, if I had to sum up the year in a word, it would be “exhausting”. I genuinely feel like I’ve been running as fast as I can all year, and there are whole months I can hardly remember except as a blur.

I’d love to say I plan to slow down in 2026, and I do hope to make some efforts in that direction… but realistically none of the complicated areas in my work or my life are likely to change, and the world as a whole just looks more bleak. So a lot of my plans for next year look more like managing the chaos more effectively, rather than expecting it to go away.

On Friday deploys

This post from Charity Majors on Friday deploys is well worth reading.

In the past I’ve seen her comment on how deployments should be carried out fearlessly regardless of when, and I’ve often felt like saying “yeah, well, …”. Because of course I agree with that as a goal, but many real-world orgs and conditions make it challenging.

This most recent post talks about the situations where deploy freezes can make sense, even if they’re not ideal. In particular, I like the point that what really needs to be frozen is not deploys, but merges:

To a developer, ideally, the act of merging their changes back to main and those changes being deployed to production should feel like one singular atomic action, the faster the better, the less variance the better. You merge, it goes right out. You don’t want it to go out, you better not merge.

The worst of both worlds is when you let devs keep merging diffs, checking items off their todo lists, closing out tasks, for days or weeks. All these changes build up like a snowdrift over a pile of grenades. You aren’t going to find the grenades til you plow into the snowdrift on January 5th, and then you’ll find them with your face. Congrats!

Why generic software design advice is often useless

In You can’t design software you don’t work on, Sean Goedecke discusses why generic advice on the design of software systems is often unhelpful.

When you’re doing real work, concrete factors dominate generic factors. Having a clear understanding of what the code looks like right now is far, far more important than having a good grasp on general design patterns or principles.

This tracks with my experience not just of software systems, but also of systems with a hardware component (e.g. ML training clusters) or a facility component (e.g. datacenters). The specifics of your system absolutely dominate any general design guidance.

As the manager of a team that publishes reference architectures, I do think that it’s helpful to clearly understand where your specific design differs from generic advice. If you’re going off the beaten path, you should know you’re doing that! And be able to plan for any additional validation involved in doing that.

But relatedly, this is part of why I think that any generic advice should be based on some actually existing system. If you are telling someone they should follow a given principle, you should be able to point to an implementation that does follow that principle.

Or else you’re just speculating into the void. Which admittedly can be fun but is not nearly as valuable as speaking from experience.

Large software systems

In Nobody understands how large software products work, Sean Goedecke makes a number of good points about how difficult it is to really grasp large software systems.

In particular, some features impact every part of the system in unforeseen ways:

Why are these features complicated? Because they affect every single other feature you build. If you add organizations and policy controls, you must build a policy control for every new feature you add. If you localize your product, you must include translations for every new feature. And so on. Eventually you’re in a position where you’re trying to figure out whether a self-hosted enterprise customer in the EU is entitled to access a particular feature, and nobody knows – you have to go and read through the code or do some experimenting to figure it out.

Sean also points out that eventually the code itself has to be the source of truth, and debugging requires deep investigation of the continually-changing system.

I’ve seen this happen in a bunch of different orgs, and it does seem to be true, especially for products with a large number of collaborating teams. I would add that in addition to the code itself, you often need to have conversations with the relevant teams to discern intent and history. Documentation only goes so far; eventually you need to talk to people.

The trap of prioritizing impact

(I wrote this originally as a comment in RLS in response to a staff-level engineer who was frustrated at how little they got to code anymore, and it resonated with enough folks that maybe it’s worth sharing here!)

There’s a trap I’ve seen a lot of staff+ folks fall into where they over-prioritize the idea that they should always be doing “the right, most effective thing for the company”. When I see engineers complain that they don’t get to code enough, I often suspect they’ve fallen prey to this.

I say that’s a trap! because I see people do this at the expense of their own job satisfaction and growth, which is bad for both them and (eventually) for the company which is likely to lose them.

I don’t blame people for falling into this trap; it’s what we’re rewarded for. I’ve fallen into it! I have stopped doing technical work I cared about, prioritized #impact, and fought fires wherever they arose. I have spent all my time mentoring and teaching and none coding. The result was often grateful colleagues, but also burnout and leaving jobs I otherwise liked.

Whereas when I’ve allowed myself to be like 30% selfish — picking some of my work because it was fun and technical, even when doing so was not the “most impactful” thing I could do — I was happier, learned more, and stayed in roles longer.

An example: I worked on a team that was doing capacity planning poorly and was buying too much hardware. (On-prem, physical hardware.) I could have solved the problem with a spreadsheet, but that was boring and made my soul hurt.

What I did instead was dig into how our container scheduling platform worked, and wrote a nifty little CLI tool that would look at the team’s configured workloads and spit out a capacity requirement calculation. It took about three times as long as the spreadsheet would have, but it was fun and accomplished the same goal and gave me some experience in the container platform. And it wasn’t that much of a time sink.

Was that better for the company? No idea. I hope it was — I hear the tool is still maintained and no one has replaced it with a spreadsheet yet! But that’s a happy accident.

Was it better for me? Absolutely! It was a bit selfish, but it made an otherwise tedious task more fun and I learned some useful tricks.

So — if you wish you had more time to code… go code a bit more. Don’t let the idea of being more effective guilt you into giving it up. Your career is your career and you should enjoy it.

Consider your (multiple!) audiences

I’ve been spending a lot of time helping colleagues with communications lately, and one of the refrains I keep repeating is “consider your audience!”

For any given project you’ve worked on, you may need to be able to talk about your work in many different formats. Non-exhaustively, these might include:

  • Working documentation for colleagues within your team, familiar with all the context and able to find and integrate other documentation
  • A detailed report for other experts outside your team, showing all your work
  • A short technical report, also for other experts, but without all the steps shown
  • A one-page executive summary, geared towards decision-makers who are in that fuzzy “informed but not fully expert” zone
  • A less-technical essay in more informal language, i.e. the “blog post” format
  • A single paragraph summarizing key results
  • A 30-60 minute presentation
  • A 5-minute presentation
  • A single bullet point that can go in your resume

Each of these formats is pitched at different audiences — people who have different levels of background knowledge, need different information about the project, and who are willing to spend different amounts of time reading your work. Some formats might work for multiple audiences, but a lot of the time you need to use a different format when the audience changes. IMO there is no such thing as a format that works for everyone!

A common trap I see engineers fall into is thinking that the first version, the detailed technical report, is the only document they need, and that all the others can be produced by just copy-pasting from it. To some extent that can be a starting point, but each of these different communication formats does need to be tailored to the people you expect to read it!

I also see folks forget that some audiences will not have time for a detailed report. If you’re communicating with a less expert audience, they are almost always going to be unwilling or unable to read a 20-page doc… or even a 5-page doc. These audiences need their own format because you need to be able to provide information at the right pace.

Over time, and especially as you get more senior, you will likely need to be at least reasonably good at all of these. In my current role, managing a fairly high-level engineering team, I often need to be able to write all of these for any given project.

Because it’s 2025, many folks will suggest “just use an LLM to translate between formats!” That’s not necessarily a bad approach, at least when you’re going from a more detailed document to a less detailed one. I’ve seen LLMs produce some decent executive summaries. But it’s really important to remember that LLMs are just text generators, not experts. You are the one who actually understands your project!

LLMs will also cheerfully invent information, so you can’t trust them to fill in details starting from a summary. (This seems like it should be obvious, but I have absolutely seen people try to use an LLM to produce a detailed doc from a summary — that direction doesn’t work!)

The HPC cluster as a reflection of values

Yesterday while I was cooking dinner, I happened to re-watch Bryan Cantrill’s talk on “Platform as a Reflection of Values”. (I watch a lot of tech talks while cooking or baking: I often have trouble focusing on a video unless I’m doing something with my hands, but if I know a recipe well I can often make it on autopilot.)

If you haven’t watched this talk before, I encourage checking it out. Cantrill gave it in part to talk about why the node.js community and Joyent didn’t work well together, but I thought he had some good insights into how values get built into a technical artifact itself, as well as how the community around those artifacts will prioritize certain values.

While I was watching the talk (and chopping some vegetables), I started thinking about what values are most important in the “HPC cluster platform”.


The Practicing Stoic, by Ward Farnsworth

Over the years I’ve read a number of different books on Stoic philosophy, including some of the “modern Stoic influencers” like Ryan Holiday as well as a few translations of older philosophers like Marcus Aurelius. While I’d hardly call myself a follower of the philosophy, I do think it includes some helpful ideas, and it’s occasionally been a useful lens for working through some problem I’ve been facing.

I struggled with both sets of writing, however, for different reasons. The modern writers often made me roll my eyes, clearly pitching at entrepreneurs and CEOs and billing an ancient philosophy as a life hack. The work of the ancients I found more interesting, but difficult to contextualize and navigate.


first dog walk of 2024