The Experiment Loop

Not all technology projects fit neatly into the standard “Lean, Agile, Scrum” way of iterating through work. All three assume that you have an end user who will interact with the software once development is done. So, in order to flesh out the details of a user interface, product and the team need to work together to produce independent, testable, negotiable user stories.

But what happens when there is no user interface? What happens when the primary goal of a team is to keep systems from falling over every time there is heavy load? What happens when changes happen faster than you can write a user story? There is valuable and necessary work here… after all, what good is software when systems are down?

Meet the Experiment Loop… created to help a team of individuals rapidly solve critical business failures without the overhead of Agile practices. In this case, the team is still iterating… but only through short time-boxed feedback loops.

What is the problem we are trying to solve?

Company A struggled during high-load events, when errors, alerts, failures, and changes came so fast that developers were completely overwhelmed and trapped in a cycle of bandaging problems instead of finding solutions for root issues. To address this, a team of multi-skilled people was quickly mobilized to identify and fix issues… with an end goal of building out long-term solutions.

Let’s call this team the ACTION Team: formed to fix system stability, knowing that once they achieved their goal they would disband.

Identify and prioritize the problem

The ACTION team started with the question: what is the problem we are trying to solve? It helped them focus on identifying issues so they had a place to start looking for solutions. We set up two daily stand-ups (morning and afternoon) and an initial problem identification session. In the initial session, developers quickly identified problems, wrote them on sticky notes, and added them to a “bucket/backlog” on the wall. From then on, every time a problem was identified, the team added it to the bucket of work.

In the first stand-up, ACTION quickly identified the top problem: e.g. system x fell over in production.

Form a Hypothesis

Next, the team came up with a hypothesis as to why this was happening: e.g. we think the system is falling over because our bus is overloaded and we need to add caching.

The team didn’t know for sure that this was why the system was falling over; it could have been one of many reasons, but it was a good place to start. It was an assumption, and they now needed to validate it.

Prove or disprove the hypothesis

The team’s solution was to add caching to the test environment and test whether this solved the problem. With this task in hand, two members of the team agreed to pair to prove the hypothesis. This was time-boxed to the time between the two stand-ups, when the entire team would meet and discuss the results:

  • if the solution proved their hypothesis, the pair would implement in Production
  • if the solution disproved their hypothesis, the team would come up with a different hypothesis and attempt to prove/disprove
  • if the pair needed more time, the team would discuss the value of continuing the work
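The three outcomes above can be read as a small decision rule applied at each stand-up. The following is a minimal sketch of that rule, not the team’s actual tooling: the names `Outcome`, `NextStep`, and `decide_next_step` are hypothetical, and the `REPRIORITIZE` branch (what happens when the team decides the work is not worth continuing) is an assumption, since the post only says the team would discuss the value of continuing.

```python
from enum import Enum


class Outcome(Enum):
    """Result a pair reports at the stand-up that closes the time box."""
    PROVED = "proved"
    DISPROVED = "disproved"
    NEEDS_MORE_TIME = "needs more time"


class NextStep(Enum):
    """What the team does next (names are illustrative only)."""
    RELEASE = "implement in Production"
    NEW_HYPOTHESIS = "form a different hypothesis"
    CONTINUE = "keep working on the experiment"
    # Assumption: the post does not say what happens if the team decides
    # the work is not worth continuing; here it goes back to the bucket.
    REPRIORITIZE = "return the problem to the bucket"


def decide_next_step(outcome: Outcome, worth_continuing: bool = False) -> NextStep:
    """Map a stand-up result to the next step in the loop.

    `worth_continuing` stands in for the team's discussion about the
    value of spending more time; it only matters for NEEDS_MORE_TIME.
    """
    if outcome is Outcome.PROVED:
        return NextStep.RELEASE
    if outcome is Outcome.DISPROVED:
        return NextStep.NEW_HYPOTHESIS
    return NextStep.CONTINUE if worth_continuing else NextStep.REPRIORITIZE
```

For example, `decide_next_step(Outcome.DISPROVED)` yields `NextStep.NEW_HYPOTHESIS`, which sends the team back to the “Form a Hypothesis” step for the same problem.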

Re-evaluate

An important part of this cycle was to continually re-evaluate problems, priorities, and how the team worked together. If something did not work, the team changed it. If a different problem became a priority, it was brought to the top of the list. Change happened rapidly… sometimes within hours.

The Experiment Loop itself did not change. This became the order that team members clung to in the sea of chaos. And, not only did it help the team refine and focus on their priorities, it became a symbol of their rapid successes. We created a kanban wall with swim lanes that matched each of the stops in the cycle: Next, Prove, Release, and Done.

The findings… as the work unfolded

The team used this model exclusively for 1 month, and as they incrementally improved the stability of the system, the work began to slow down. This allowed them to move to a single daily stand-up and be more selective and deliberate in their prioritization.

At the 2-month point, we paused to re-evaluate the high level goals, define success, and come up with key functionality that defined this success. It was only at this point that the team fully understood what success looked like and they were able to call out things that would help them achieve this success. These entered the backlog. The team began to share their learnings with other teams.

3 months into the project, there was a mix of long pulls and short fixes in the queue. The team evolved their kanban wall to include a product piece and a testing piece for long pull items (e.g. a monitoring dashboard). By this point, the Experiment Loop was deeply entrenched in the team mindset. ACTION also had a rough understanding of the scope of their work and was able to determine when they’d be done.

By 6 months, the team had achieved all their goals, the systems were solid, their definition of success had been met, and the team disbanded. They held a final retrospective and shared their learnings with the rest of the organization. The learnings are not rocket science and could easily be applied to any project at any point in time:

  • Start… even if you don’t understand the work. By starting you get a better understanding of the work and can begin to shape a plan.
  • Adapt… as you get a better understanding of the work, you will need to adapt and further home in on what provides value.
  • Constantly question value… many things that were considered valuable in the beginning were red herrings that got in the way of real value. It was only through constant evaluation of value that the team was able to get to the heart of the problem.
  • Define success early… this helps you determine when you are done. The team could have easily been stuck in a rut of bandaging problems, but by defining success early, they were able to drive towards their goal without getting stuck.
