Engineering management as a distributed system

“My utilization is too high! My latency is increasing, I’ve got too many cache misses, my search results aren’t relevant, I’m way above the threshold and all my alarms are going off, especially in the middle of the night.”

Ever feel this way as an engineering manager – that your compute and storage resources aren’t cutting it, but you haven’t figured out how to turn on auto-scaling? I feel it, and I hear it from so many of my fellow eng managers.

It occurred to me that a few distributed systems analogies might help me think about this…

Front end, user interface: meetings, meetings, meetings
API layer: email, Slack, text, DMs, code review, Google Docs comments
Database/storage layer: paper notebooks, Evernote, Gmail, Google Docs, spreadsheets, iOS Notes
Compute layer: multitasking/multithreading, background processes; certain processes require all the CPU, certain processes cannot be multithreaded (see the book Thinking, Fast and Slow by Daniel Kahneman for more on System 1 and System 2 thinking)
Network: packet loss through unread messages, incomplete todo items, missed deadlines, poor memory
Firewall: executive assistant arranging and declining meetings, deflecting unnecessary requests and email
Cache: LRU caching, in that most recently used items are fast to access, older items need to be fetched from storage (see Database/storage layer)
Latency: length of time it takes to respond to email, Slack, text, DMs, code review, Google Docs comments (and hey, we probably need to define some SLAs here)
Availability: Time in the office = 365/24/7 – (PTO + nights/weekends + training); PTO = system maintenance, training = system upgrade.
DDoS attacks: relentless recruiters, sales people
Search: complexity increases with the number of Database/storage layer systems employed
Monitoring, observability: health checkups, heart rate monitoring, blood tests
Power systems: sleep, food, exercise
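
For the curious, the Availability line above can be turned into a quick back-of-the-envelope calculation. This is only a sketch: the function name `availability` and every input number below are hypothetical, picked for illustration and not measured from anyone's actual calendar.

```python
HOURS_PER_YEAR = 365 * 24  # the "365/24/7" ceiling, in hours


def availability(pto_hours: float, off_hours: float, training_hours: float) -> float:
    """Return the fraction of the year spent "in the office".

    Implements: time in office = all hours - (PTO + nights/weekends + training).
    """
    downtime = pto_hours + off_hours + training_hours
    if downtime < 0 or downtime > HOURS_PER_YEAR:
        raise ValueError("downtime must be between 0 and the hours in a year")
    return (HOURS_PER_YEAR - downtime) / HOURS_PER_YEAR


# Hypothetical inputs (assumptions, not data):
#   160 h of PTO (system maintenance), 6680 h of nights/weekends,
#   40 h of training (system upgrade).
example = availability(pto_hours=160, off_hours=6680, training_hours=40)
print(f"Manager availability: {example:.1%}")
```

Even with made-up numbers, the point stands: no manager is a "five nines" service, so the rest of the system has to be designed around planned downtime.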

I’m sure my fellow EMs and engineers can think of lots more parallels. So how might these concepts help me design a more robust management “service”? What do I need to deprecate? Where do I need more resiliency? What do I need to upgrade? Can I do some performance tuning? Hm, maybe what I need is an embedded management SRE!

Photo by Thomas Jensen on Unsplash
