My name is Chris Lee, and I am an IT Operations Engineer at LeanKit. I spend my time here automating deployments of infrastructure and researching new cloud-based technology.
In IT Operations, things change, and they change a lot. Last week’s priority is now on the back burner because of a shiny new priority that needed to be done yesterday. This typically creates an environment of “continuous confusion”, in which we constantly shift our priorities to meet the most urgent needs of the business.
To function successfully, we have to learn to adapt and pivot on a dime. LeanKit makes these pivots easier, turning this “continuous confusion” into “continuous improvement” by synchronizing work across the team, reducing work into smaller batch sizes and highlighting bottlenecks.
In this post, I’ll explain how LeanKit IT Ops uses LeanKit — giving a background on how we structure our team, break our work down, and use LeanKit to visualize our work and continuously improve our process.
Like many other IT Operations teams, our work is a combination of planned and unplanned work.
Planned work is project work that is described in an A3, a minimally marketable feature. That work is represented on LeanKit’s Roadmap Board, to which Product Managers and Executives add projects during our planning sessions. Each project is then broken down into DIV cards, units of “Deployed, Iterative, Valuable” work. Each DIV is linked to multiple Task cards on the IT Operations board using LeanKit’s Multi-Team Work Distribution feature. By design, a DIV should be a few days’ worth of work. The task cards are the individual pieces of work that we work on.
For example, an A3 can be spinning up a new developer environment in a new datacenter, which requires a “full stack” deployment including web and service servers, databases, virtual network hardware and more. In this case, our customer is our development team, and the value is that they have a new environment where they can do their work.
The DIVs in this case are chunks of work such as a new Web server, a Database server, Firewalls and Load Balancers, etc. Each of these DIVs has multiple tasks involved, such as setting configs, creating new scripts or updating existing ones, creating new databases, etc. These tasks are the individual cards on the IT Operations board, and each can have its own sub-tasks as needed.
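To make the hierarchy concrete, here is a rough sketch of how an A3 breaks down into DIVs and task cards. It is only an illustration: the class names, fields, and the `task_cards` helper are made up for this post and are not LeanKit’s data model or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """A single card on the IT Operations board (hypothetical model)."""
    title: str
    subtasks: List[str] = field(default_factory=list)


@dataclass
class Div:
    """A "Deployed, Iterative, Valuable" unit, meant to be a few days of work."""
    title: str
    tasks: List[Task] = field(default_factory=list)


@dataclass
class A3:
    """A project that lives on the Roadmap board."""
    title: str
    customer: str
    divs: List[Div] = field(default_factory=list)

    def task_cards(self) -> List[str]:
        """Flatten the project into the task card titles that land on the team board."""
        return [task.title for div in self.divs for task in div.tasks]


# Example data based on the developer environment project described above.
project = A3(
    title="New developer environment in a new datacenter",
    customer="Development team",
    divs=[
        Div("Web server", [Task("Set configs"), Task("Update deployment scripts")]),
        Div("Database server", [Task("Create new databases")]),
    ],
)

for card_title in project.task_cards():
    print(card_title)
```

The point of the shape is that only the leaves (the task cards) are worked on directly, while the A3 and DIV levels exist to keep the work tied back to customer value.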
Unplanned work is any work that is not specified in an A3 as project work, and comes from a wide variety of sources. Fires are unexpected network outages or server downtime which, depending on severity, can be a “stop the line” issue. Private Cloud deployments and squad requests such as a new dev environment, new production web server, or database optimization are other common sources of unplanned work. Visualizing all of our work on our single board helps us manage unplanned work by setting realistic expectations and highlighting any bottlenecks.
One of the greatest challenges with unplanned work is that you have to reserve a certain amount of overhead time, because you never know what to expect. This overhead needs to be constantly monitored: it takes time away from planned work, so any overhead that is reserved but not used is wasted.
We use LeanKit board metrics to analyze how we spend our time across different types of work, which helps us adjust future priorities accordingly. Generally, we tend not to have enough overhead time allotted for unplanned work because of the amount of planned work we have.
Since I joined the team, our IT Operations team has more than tripled in size, and we have broken the team down into two distinct functions. Platform Operations covers building, automating and maintaining LeanKit’s cloud-based infrastructure, while Application Operations builds the tools to automate the deployments of our applications onto that infrastructure. We also have a Database Administrator and a Security Expert who are part of the Platform Operations team but operate independently, as their work is involved in every aspect of the application.
On top of all of this, each feature squad (web, mobile, business enablement, analytics) has an embedded Operations person who maintains their squad’s deployments and acts as a liaison between their feature squad and IT Operations for any squad work requests. Our Product Manager helps us stay organized and sane.
With our current team structure, we decided to keep all of our work on one board rather than splitting it across multiple boards, because much of our work overlaps: members of Platform Operations, Application Operations, Database and Security Operations work together on various tasks.
To stay on top of shifting priorities, the only lane that we pull work from in our backlog is our “Prioritized” lane. This means that any time we pull work from this lane, we know that it aligns with our current priorities and should not be interrupted by anything that is not “stop the line”. We discuss our backlog during every stand-up, making sure that the highest priority items are making it into the “Prioritized” lane.
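As a small sketch of that rule, the function below only ever pulls from the “Prioritized” lane, no matter what else is sitting in the backlog. The “New Requests” lane, the card titles, and the `pull_next` function are invented for the example; only “Prioritized” comes from our board.

```python
from typing import Dict, List, Optional


def pull_next(backlog: Dict[str, List[str]], lane: str = "Prioritized") -> Optional[str]:
    """Pull the top card from the one backlog lane we allow ourselves to pull from."""
    cards = backlog.get(lane, [])
    # Return None when the lane is empty instead of reaching into other lanes.
    return cards.pop(0) if cards else None


# Hypothetical backlog: lane name -> cards, highest priority first.
backlog = {
    "New Requests": ["Database optimization for analytics squad"],
    "Prioritized": ["Patch load balancer", "New production web server"],
}

first_card = pull_next(backlog)
second_card = pull_next(backlog)
third_card = pull_next(backlog)  # "Prioritized" is empty by now
```

When the lane runs dry, the function returns nothing rather than grabbing an unprioritized request, which is the cue to revisit the backlog at the next stand-up.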
Simplifying Our Board Layout
Some of our work, like a new Puppet module or DSC configuration, may follow a development workflow of branching the repository, validating and creating a pull request. But this is very different from the steps required to deliver on a security fix, a new private cloud deployment or a database migration. Since we need to be able to reflect so many different types of work on a single board, we use a very simple template.
At its core, work that is in progress can be broken down into “Doing” and “Validation”. We built queue lanes into our workflow for two reasons: to encourage a pull system, and to be able to readily identify bottlenecks. Our Lean/Agile coach Tommy Norman wrote a great post explaining this concept.
Using Queue Lanes to Encourage a Pull System
For our Validation step, we have two lanes: the queue lane (“Validation Ready”) and the doing lane (“Validation Doing”). This means that when our work on a card is done, we move it into the “Validation Ready” lane. Whenever the person responsible for its validation is ready to start validating it, they pull it into the “Validation Doing” lane. When they are done, they move it into “Finished”.
We use a “Watching” lane to visualize any work that has been delegated to another team, but that we want to keep our eyes on, like a security fix. This “Watching” lane does not affect our WIP limit, as we are not the ones doing the work on it.
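Here is a minimal sketch of how those lanes fit together, with the “Watching” lane left out of the WIP count. The lane names come from our board; the `Card` class, the two functions, and the choice to count “Validation Ready” toward WIP are assumptions made for the illustration, not LeanKit behavior.

```python
from dataclasses import dataclass
from typing import Iterable

# Main flow for in-progress work, in order.
FLOW = ["Doing", "Validation Ready", "Validation Doing", "Finished"]
# Assumption: every in-progress lane counts toward WIP; "Watching" and "Finished" do not.
WIP_LANES = {"Doing", "Validation Ready", "Validation Doing"}


@dataclass
class Card:
    title: str
    lane: str = "Doing"


def advance(card: Card) -> Card:
    """Move a card one step along the flow; a Finished card stays where it is."""
    if card.lane not in FLOW:
        raise ValueError(f"{card.lane!r} is not part of the main flow")
    index = FLOW.index(card.lane)
    if index < len(FLOW) - 1:
        card.lane = FLOW[index + 1]
    return card


def wip_count(cards: Iterable[Card]) -> int:
    """Count only the cards the team is actually working on."""
    return sum(1 for card in cards if card.lane in WIP_LANES)


# Hypothetical cards for the example.
cards = [
    Card("New Puppet module"),
    Card("Database migration", lane="Validation Ready"),
    Card("Vendor security fix", lane="Watching"),
]
current_wip = wip_count(cards)  # the "Watching" card is not counted
```

A pile-up of cards in “Validation Ready” shows up immediately in a model like this, which is exactly the bottleneck signal the queue lane is meant to give us.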
Visualizing “Stop the Line” Work
Any work that is marked “Stop the Line”, such as a server or load balancer outage, is placed into the Expedite lane. We assign to this card every user who needs to stop their current work to complete the expedited task. To reinforce this, we block any other active cards assigned to them until the task is completed. When you block a card in LeanKit, you are asked to provide a reason for the blocker, so your team understands the purpose of the delay. We use the title of the “Stop the Line” card as the explanation for the blocker.
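The blocking convention can be sketched in a few lines. This is an illustration of the team habit, not a call into LeanKit: the `Card` class, the `stop_the_line` and `resolve` functions, the user names, and the idea that “active” means “not Finished” are all assumptions for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Card:
    title: str
    assignees: Set[str] = field(default_factory=set)
    lane: str = "Doing"
    blocked_reason: Optional[str] = None


def stop_the_line(expedite: Card, responders: Set[str], board: List[Card]) -> None:
    """Expedite a card and block the responders' other active cards."""
    expedite.lane = "Expedite"
    expedite.assignees |= responders
    for card in board:
        is_active = card is not expedite and card.lane != "Finished"
        if is_active and card.assignees & responders:
            # The expedited card's title doubles as the block reason.
            card.blocked_reason = expedite.title


def resolve(expedite: Card, board: List[Card]) -> None:
    """Finish the expedited card and unblock everything it was holding up."""
    expedite.lane = "Finished"
    for card in board:
        if card.blocked_reason == expedite.title:
            card.blocked_reason = None


# Hypothetical board state for the example.
outage = Card("Load balancer outage", lane="Prioritized")
board = [
    outage,
    Card("New Puppet module", {"engineer_a"}),
    Card("Database migration", {"engineer_b"}),
    Card("Old security fix", {"engineer_a"}, lane="Finished"),
]
stop_the_line(outage, {"engineer_a"}, board)
```

Anyone looking at the blocked “New Puppet module” card sees “Load balancer outage” as the reason, so nobody has to ask why it stalled.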
To organize the cards, we use a variety of LeanKit features, such as card types, custom icons, user assignments, tags, and dates. Every person who touches a card is assigned to that card, including the person who validates it. We try to give estimates of when cards will be completed (by assigning a planned finish date), particularly for unplanned work such as squad requests or private cloud deployments. We use different custom icons to represent different types of work, such as security or database work. Finally, we use tags as often as possible, which makes it easier to filter the board and quickly find what we need.
There are two quotes that come to mind when I think about how we manage IT Operations work: Laurence J. Peter says, “If you don’t know where you are going, you will probably end up somewhere else.” However, Douglas Adams says, “I may not have gone where I intended to go, but I think I have ended up where I needed to be.” In any line of work, it is important to always have clear priorities and a plan. But it is equally important to be able to pivot, adjust, and shift priorities as needed, based on the needs of the business.
The only constant in IT Operations is chaos, which creates an environment of “continuous confusion”. Using LeanKit helps our team minimize this chaos, giving us a systematic way to visualize and prioritize our work, and to shift priorities as needed. Instead of haphazardly reacting to shifting priorities and unplanned work, we are able to respond, learn, and adapt our system. This helps us shift from a state of “continuous confusion” to one of “continuous improvement”.