Peacetime vs. Wartime in DevOps: Lessons from the Fire Department

Guest blog post by Ron Vidal, Rob Schnepp, and Chris Hawley of Blackrock 3 Partners LLC. Blackrock 3 Partners are experts in Incident Management, combining decades of experience in the fire service, law enforcement and anti-terrorism managing large-scale public safety emergencies with decades of experience in managing web operations, critical infrastructure, capital markets and M&A activities for international broadband network operators and high performance computing companies. 

It’s a quiet, sunny day, like so many other quiet, sunny Peacetime days…

Then, somebody sees flames and smoke pouring out of a building… Wartime… 

Call 911… Help is on the way!

In seconds, the 911 Communications Center dispatches fire engines, ambulances, rescue companies and Incident Commanders to the location of the emergency. In about 4 minutes, the first 25 trained firefighters arrive on scene with their specialized vehicles, tools and skills. The Incident Commander establishes command, sizes up the situation, sets tactical objectives, begins operations, evaluates resources needed and organizes the effective resolution of the emergency.

Does this sound like how your DevOps team responds to high severity incidents?

If you have Operations, then you will also have Emergency Operations. Like an alarm sounding at a fire station, immediately launching firefighters into action, DevOps teams must respond with the same level of urgency to resolve their emergencies. In both cases, the clock is ticking. The problem is unlikely to get better until the right resources are dispatched and respond at the right time, working under a leader making the right decisions, all within an organizational framework.

The Shift from Peacetime to Wartime

Peacetime is the mode of operation that occurs during the normal day-to-day activities of any IT organization. Developers write code. Operations keeps the infrastructure running. Business as usual. In other words, a perfect Peacetime day.

Here’s what a typical Peacetime organization chart looks like:

[Image: Peacetime org chart]

Wartime is the mode of operation that occurs when systems are NOT normal. Operations has declared a SEV level event and initiated an Incident conference bridge. On-call subject matter expert engineers are querying alerts and looking at performance data. Customers are out of service. Business is NOT usual. We have a serious problem and it needs to be fixed, right now. In other words, a Wartime Incident.

In the Fire Department, the shift from Peacetime to Wartime occurs when an emergency is reported. Wartime is different. People behave differently. Their language and methods of communication are different. Conversations are typically much shorter, more direct and aimed at problem solving on a compressed timeframe. To the uninitiated it sounds abrupt. It sounds choppy. It sounds sterile. And it should.

As 60 Minutes reported in its March 17, 2013 interview with Jack Dorsey, “Young Jack was intrigued by the messages he heard coming out of the St. Louis emergency dispatch center. At home he listened to it all on a police scanner. And he was struck by the fact that everyone talked in short bursts of sound – a system of communication that later inspired him to invent Twitter.”

In Wartime, the Incident Commander is thinking faster than the emergency is unfolding. To do that, communications must be direct, crisp and clean, like “Voice Twitter”.

Understanding Wartime Communication

The Wartime organizational chart will look different than the Peacetime organizational chart. In fact, the CEO (Peacetime leader) is exactly the wrong person to lead the emergency (Wartime) response, because someone still has to run the business and the unaffected parts of the organization. Roles and responsibilities, chain of command, and the assignment of tasks in Wartime will be very different than they are in Peacetime.

Here’s what a Wartime organization looks like:

[Image: Wartime org chart]

Here’s a comparison of how a Fire Department and DevOps respond to an emergency:

Step | Fire Department                        | DevOps
1    | 911 Call                               | Alert Notification
2    | Radio Dispatch                         | Notify & Assemble Technical Resources
3    | Size-Up                                | Declare Severity Level
4    | Tactical Radio Communications Channels | Tactical Communications Channels
5    | Establish Command                      | Initiate Conference Bridge
6    | Set Tactical Objectives                | Make a Plan
7    | Put Out The Fire                       | Fix Systems
8    | Dissolve Command                       | Return to Normal Operations

In short, solving Wartime problems requires a Wartime mentality, and a defined process for incident management. Fire Departments have developed a system that has been in use for over 40 years and has managed tens of millions of Wartime incidents. Without a doubt, DevOps teams are emergency response organizations much like Fire Departments.
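
As a rough illustration only, the DevOps column of the comparison above could be modeled as an ordered checklist that also tracks whether the organization is in Peacetime or Wartime. This is a hypothetical sketch, not part of any Blackrock 3 or vendor tooling: the `Step` and `Incident` names are invented here, and the strict one-step-at-a-time ordering is a simplifying assumption (in a real incident several steps may overlap).

```python
from enum import Enum


class Step(Enum):
    """DevOps steps from the comparison, numbered as the post lists them."""
    ALERT_NOTIFICATION = 1
    NOTIFY_AND_ASSEMBLE = 2
    DECLARE_SEVERITY = 3
    TACTICAL_CHANNELS = 4
    INITIATE_BRIDGE = 5
    MAKE_PLAN = 6
    FIX_SYSTEMS = 7
    RETURN_TO_NORMAL = 8


class Incident:
    """Hypothetical incident record that moves forward one step at a time."""

    def __init__(self, summary):
        self.summary = summary
        self.completed = []  # steps finished so far, in order

    @property
    def mode(self):
        # Assumption: Wartime starts with the first alert and ends
        # once normal operations have been restored.
        if not self.completed or self.completed[-1] is Step.RETURN_TO_NORMAL:
            return "peacetime"
        return "wartime"

    def advance(self, step):
        """Record the next step, rejecting anything out of order."""
        if len(self.completed) == len(Step):
            raise ValueError("incident is already closed")
        expected = Step(len(self.completed) + 1)
        if step is not expected:
            raise ValueError(f"expected {expected.name}, got {step.name}")
        self.completed.append(step)


incident = Incident("customers cannot log in")
print(incident.mode)  # peacetime
incident.advance(Step.ALERT_NOTIFICATION)
print(incident.mode)  # wartime
```

Rejecting out-of-order steps is one way to make the process explicit; a team that prefers more flexibility could log a warning instead of raising an error.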

When an emergency is reported, it is a clear signal to all that the organization has shifted from Peacetime to Wartime. All responders must understand and accept the sense of urgency and accountability that comes with the shift to Wartime and perform their assigned tasks accordingly.

Every emergency (Wartime) response absolutely depends on robust communications for the entire incident, from alert notifications to dispatch to dedicated tactical channels to escalation to resource accountability. Peacetime has a clutter of multiple communications systems that distract Wartime responders. Wartime demands clear communications for every incident and every emergency responder, and a centralized incident management system will help cut through the noise. Each element of the emergency (Wartime) response depends on effective, reliable communications. Without rock solid communications, Incident Management fails.

So, next time you see a fire engine speeding to an emergency Code 3 (lights and siren), just remember that they are in Wartime and using the same tools and systems that you can use to manage your DevOps fires.

For more information on Blackrock 3 Partners LLC, please visit www.blackrock3.com.

  • Jacob Dawson

    I like this read on the subject, and I think it’s a pretty good start. I do SAR in my free time, and from that, I’d say this idea would benefit from a read of the FEMA NIMS (National Incident Management System) courses, primarily ICS (Incident Command System) 100 and 700, and probably 200 and 800 after that. 100 and 700 are standard for entry level SAR volunteers, and we quickly get into the 200 and 800 material with a little more training (all four are standard for EMT-Basics, too). These courses break down the system that First Responders use to manage incidents from a single-patient ambulance call to a multi-county disaster and get into the details of how it works. While obviously our technical emergencies don’t always develop in the same way, a thorough understanding of the system would provide further insights into how to adapt it to an organization’s response to emergencies. There’s more to handling these incidents than just how you communicate, and the organizational structure used to manage life-and-death emergencies has its value in managing our technical emergencies, as well.

    • Ron Vidal – Blackrock 3

      Hi Jacob,

      Happy New Year!

      Thanks for the comment!

      You certainly know your ICS and we agree that adding components of ICS in future guest blog posts will be helpful to the community. The Blackrock Partners are experienced in the full complement of ICS classes as well as being longtime practitioners of ICS on large scale public safety incidents. Our goal with this first post is to introduce the topic and plant the seed of understanding that a shift in perception and organizational culture in times of emergency is the first step to solving it. The peacetime-wartime shift we propose is a beginning point. It’s a place to stop and shift the approach of the company and its responders to a different tempo, a different org chart and a distinct structure. As you know ICS is flexible and adaptable to any type of incident or operating environment and we’ve adhered to the core ICS principles while customizing them to the tech industry.