My name is Chris Lee. I’m an Infrastructure Automation Engineer on the IT Operations team at LeanKit. A while ago, I wrote about how our Ops team uses LeanKit to visualize our work. Today I’ll share the most meaningful IT Ops Lean metrics from our team and how we use LeanKit to improve the way we work.
Metrics are not a topic that stirs much excitement in the hearts of most IT Operations engineers, but this isn’t because metrics aren’t important. In fact, it’s quite the opposite. The problem, as we see it, is which metrics are being looked at and how those metrics are being used.
The type of metrics that we brag about in Ops are measurable things, such as server uptime, bandwidth utilization or network stats. When these metrics aren’t up to par, we can take a technical approach to the problem and find out what we need to do to patch up the “leak” in our boat.
However, focusing our efforts on improving these metrics might not provide us with the insights we need to find sustainable, long-term solutions to prevent any leaks from occurring. Here are three less glamorous — but far more insightful — metrics to help us identify and solve our real problems, so we can build a better, stronger boat.
1. Unplanned Work
The nature of IT Operations work is tumultuous at best. Even at those times when all the fires are contained, the pitiless glow of the monitoring dashboards looms over us, with every little blip a possible cause for alarm. While we would love to be able to work on our planned project work for the product development roadmap 100% of the time, unplanned work is sure to pop up at the most inconvenient times.
We want to draw attention to any unplanned work that we do, so we created a Card Type on our LeanKit board named “Unplanned Work” and made it a noticeable bright green color. We don’t formally track this as a workflow metric; we simply pay attention to when our board starts turning bright green. At the end of every week, we have a retrospective, in which we look at our “Done” lane, discuss the unplanned work cards that were completed, and then decide how much effort should be put into creating planned projects to fix any of them.
2. Workload Distribution
In a perfect world, a team’s work would be distributed evenly among team members. Unfortunately, this is not always the case. Our team is a mix of specialties and knowledge, and our goal is to share the knowledge so it doesn’t exist in a silo. We don’t want one person to be stuck doing the lion’s share of the work.
We use the Distribution reports to find out if any team member or subteam is being overburdened. In a recent example, we discovered that one of our smaller subteams was being assigned to 20% of all of our cards, even though they make up just 10% of our team. The Distribution report in LeanKit helps us identify these kinds of issues so we can prevent frustration and burnout on our team.
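To make the arithmetic behind that comparison concrete, here is a minimal Python sketch. It is not how LeanKit computes its Distribution report; the card records, the "subteam" field, and the headcounts are hypothetical stand-ins chosen to mirror the 20% versus 10% example above. It divides each subteam's share of cards by its share of headcount, so a value well above 1.0 suggests that subteam is carrying more than its share.

```python
from collections import Counter


def load_ratio(cards, team_sizes):
    """Return {subteam: share of cards / share of headcount}.

    A ratio near 1.0 means the subteam's card load matches its size;
    a ratio of 2.0 means it is assigned twice its proportional share.
    """
    if not cards:
        raise ValueError("need at least one card")
    total_cards = len(cards)
    total_people = sum(team_sizes.values())
    counts = Counter(card["subteam"] for card in cards)

    ratios = {}
    for subteam, size in team_sizes.items():
        card_share = counts[subteam] / total_cards
        head_share = size / total_people
        ratios[subteam] = card_share / head_share
    return ratios


# Hypothetical data: one subteam is 10% of the people but has 20% of the cards.
cards = [{"subteam": "dba"}] * 2 + [{"subteam": "everyone else"}] * 8
team_sizes = {"dba": 1, "everyone else": 9}

ratios = load_ratio(cards, team_sizes)
for subteam, ratio in ratios.items():
    print(f"{subteam}: {ratio:.2f}x proportional load")
```

Running the same calculation on only the cards of one type (for example, “Unplanned Work”) would show whether the imbalance is worse for a particular kind of work.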
3. Speed (Cycle/Lead Time and Process Control)
In the end, it all comes down to cycle time: How long does it take a card to move from “Doing” to “Done”? We typically break our tasks up into small units of work that can be completed in a day or two. If the work requires more than a few days to complete, we break it up into smaller tasks and create a card for each task.
We keep an eye on our cycle time by using the Speed reports, which give us an overall picture of our workflow. For our team, “Unplanned Work” cards typically take less than a day, with other cards such as “Defects” and task cards typically taking two or three days, depending on the complexity of the work.
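For readers who want to see what a cycle-time number boils down to, here is a small Python sketch. It assumes each card carries the time it entered “Doing” and the time it reached “Done”; the field names, timestamps, and card list are made up for illustration and are not LeanKit's data format or its Speed report logic. It reports the median cycle time in days for each card type.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

SECONDS_PER_DAY = 86400


def median_cycle_days(cards):
    """Return {card type: median days from 'Doing' to 'Done'}."""
    durations = defaultdict(list)
    for card in cards:
        started = datetime.fromisoformat(card["started"])
        finished = datetime.fromisoformat(card["finished"])
        days = (finished - started).total_seconds() / SECONDS_PER_DAY
        durations[card["type"]].append(days)
    return {card_type: median(days) for card_type, days in durations.items()}


# Hypothetical cards; the timestamps are illustrative only.
cards = [
    {"type": "Unplanned Work", "started": "2016-03-01T09:00", "finished": "2016-03-01T15:00"},
    {"type": "Unplanned Work", "started": "2016-03-02T10:00", "finished": "2016-03-02T22:00"},
    {"type": "Defect", "started": "2016-03-01T09:00", "finished": "2016-03-03T09:00"},
    {"type": "Task", "started": "2016-03-01T09:00", "finished": "2016-03-04T09:00"},
]

cycle_times = median_cycle_days(cards)
for card_type, days in cycle_times.items():
    print(f"{card_type}: {days:.2f} days")
```

The median is used here rather than the mean so that one unusually long card does not dominate the picture; that is a choice made for this sketch, not a claim about the report itself.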
Measuring how fast we complete our cards helps us better understand our workflow and make plans for future work.
How Lean Metrics Help IT Ops
Using LeanKit helps our team take a holistic and data-driven approach to improving our processes, so we can actually stop the leaks and focus on building a better boat.
Recently, we had a flood of unplanned work come in that forced us to bring our planned work to a halt. The work involved everybody on the team, but it seemed to focus on one of our subteams in particular: Database Administration. By looking at Distribution reports, we saw that they were being assigned to more than 20% of the cards for our entire team, even though, as mentioned above, they make up only a tenth of our team. When we filtered by the “Unplanned Work” card type, we saw that their percentage was closer to 30%.
We were aware that a majority of the cards were database-related, but since we were all swarming on the problem we couldn’t tell definitively just how badly our DBAs were getting work piled on them.
We saw on our Speed report that our cycle time on planned work was slowing down drastically. Looking at that information in the context of our Distribution report, we were able to identify the underlying problem: We needed more people to handle the amount of work coming in.
This story has a happy ending: In the end, we used this data to demonstrate the need to expand our DBA subteam. Without having supporting data, we wouldn’t have known definitively where the root of the problem was, or that we needed more hands on deck.
Building a Better Boat
It’s easy to measure the things that make us look good (like server uptime or bandwidth utilization), but measuring those things won’t help us improve or prevent fires from popping up. Sometimes, the solution to our team’s biggest frustrations is right in front of our eyes; we just have to know what to measure.
LeanKit gives us actionable insights that enable us to find permanent, sustainable solutions to our most pressing problems. We’re always trying new ways to improve our workflow because, let’s be honest, who doesn’t want to make their job easier? Tracking these metrics allows us to spend less time bailing out water — and more time building a better boat. That’s something we can all get on board with!