How to Conduct a Blameless Security Post-Mortem
— November 18, 2016
When someone in your company clicks on a bad link, it can spell bad news. But you know what’s worse? Them never telling you.
When employees are afraid to come forward about a mistake they’ve made (or think they’ve made), it makes security responders’ jobs that much more difficult.
Unfortunately, this kind of negative atmosphere is a reality at many companies. The good news is the culture can be improved, and one way of doing this is by conducting blameless security post-mortems. I spoke about this in my DevOpsDays Austin talk in May 2015. You need your whole team to be security ambassadors (not roadblocks), and blameless security post-mortems can help enable this.
Below, we’ll explore what a blameless post-mortem is and how it applies to security.
(Note: Much of what we cover can also be applied to Agile retrospectives, which are held when an iteration is reviewed and recommendations are put forward for making improvements.)
What is a Blameless Post-Mortem?
A post-mortem is held after an incident has taken place (in this case, a security breach of some type). The security team sits down with the rest of the organization (or the affected team), talks through what happened, identifies causes and lessons learned, and decides how to move forward. The key to an effective post-mortem is doing this in a way that does not place blame on your employees. After all, it’s the hackers or bad guys who perpetrated the attack who are ultimately at fault.
As Jason Hand at VictorOps recounts from the David Zwieback book on post-mortems, “Your organization must continually affirm that individuals are NEVER the ‘root cause’ of outages.” The same is true for security breaches.
Not only will this avoid alienating team members, which can lead to them being reticent about bringing forth information in the future, but it can also help ensure that investigations into security incidents actually uncover the root cause (not just who clicked the link). If Dave in Accounting voluntarily reports that he succumbed to a phishing attempt, then you can focus on where the email came from, what it was trying to accomplish, and how to educate everyone in the company so they can avoid this problem in the future.
A blameless approach encourages communication, collaboration, and cohesion between security and other teams across your organization. It also shortens response time, because incidents get reported sooner, and faster response is one of the most direct ways to improve your security operations.
Below is our advice for conducting a successful security post-mortem.
How to Conduct a Blameless Security Post-Mortem
A blameless security post-mortem has six key steps:
1. Do Your Homework
Before the post-mortem takes place, make sure you take time to understand exactly what happened and figure out how to explain it to your team in appropriate terms. If there was a phishing attack that succeeded when an employee clicked on a bad link, talk about what phishing attacks look like and signs to look out for (like misspelled domain names). If there was a larger systemic or organizational failure that ultimately led to a specific incident, make sure you have a clear picture of the problem (not just one person’s role in it) before you sit the team down to talk about it.
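If it helps to make "misspelled domain names" concrete for the team, a short script can show how close a lookalike domain is to a real one. What follows is a minimal sketch, not a production phishing detector: the `TRUSTED_DOMAINS` list, the `looks_like_spoof` helper, and the 0.8 threshold are all assumptions made up for illustration, and the comparison uses only Python's standard `difflib` module.

```python
from difflib import SequenceMatcher

# Hypothetical allowlist: replace with the domains your company actually uses.
TRUSTED_DOMAINS = {"example.com", "example-payroll.com"}


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0 and 1 for two strings."""
    return SequenceMatcher(None, a, b).ratio()


def looks_like_spoof(domain: str, threshold: float = 0.8) -> bool:
    """Flag a domain that is close to, but not exactly, a trusted domain.

    The threshold is an illustrative guess, not a tuned value.
    """
    domain = domain.strip().lower()
    if domain in TRUSTED_DOMAINS:
        return False  # exact match with a trusted domain
    return any(similarity(domain, trusted) >= threshold
               for trusted in TRUSTED_DOMAINS)


if __name__ == "__main__":
    for candidate in ("example.com", "examp1e.com", "unrelated.org"):
        print(candidate, looks_like_spoof(candidate))
```

Here `examp1e.com` (with the digit 1 in place of the letter l) is flagged, while the real `example.com` and an unrelated domain are not. Walking through an example like this in the post-mortem keeps the conversation on what the attack looked like rather than on who clicked.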
2. Focus on the What (Not the Who)
When you do sit down, focus on what happened and not on who caused it to happen. In many cases, there’s not just one person involved, although someone may have been the “straw that broke the camel’s back.” Regardless, you want to ensure that neither the people running the post-mortem nor any other employees point the finger at a specific person. The most important thing is not who did it; it’s understanding what happened so everyone can learn from it. Focusing on what, not who, should neutralize emotions and eliminate blame, allowing you to deal with the facts logically.
3. Discuss How to Prevent Problems in the Future
Of course, the most important part of a security post-mortem is making sure the problem doesn’t happen again. In some cases, this may be a matter of increasing employee education and training to make sure that everyone understands what they need to look out for in the future.
In other cases, there is a larger organizational issue — a broken process, misused tool, or misunderstood directives. The post-mortem is a good time to begin the correction process. For example, if a flawed process was ultimately to blame, have an open discussion about how that process needs to be amended and solicit input from everyone on next steps. This way, after the post-mortem, you can form a plan of attack. Keeping everyone focused on prevention (or improvement) will go a long way toward reducing blame.
4. Keep the Door Open
Make it clear that team members can always come and talk to the security team if they aren’t sure whether something is safe, or if they think they’ve already done something that will compromise security. Keeping the door open is the key to making sure they come to you when something happens. On some teams, group chat like Slack or HipChat can help streamline communications, so consider creating a channel where people are encouraged to post about anything that strikes them as suspicious, and have a security team member respond in a timely fashion.
5. Handle Performance Issues Separately
PagerDuty makes an excellent point in their blog post on post-mortems. Sometimes a specific person has made repeated, ongoing mistakes that signal poor performance or other problems that may need to be corrected via human resources (not the security team). But make absolutely sure to keep personnel issues out of the post-mortem. This way you’ll avoid legal issues, possible breaches of your HR policy, and, for our present purposes, muddying up the post-mortem with issues that do not help identify the root cause, outline corrective action, or point the way to improved results. And by doing so, you’ll also help to foster a culture of trust, constructive communication, and transparency.
6. Always Focus on Lessons Learned
Security incidents happen. That’s the nature of modern, technology-driven businesses.
The bottom line is that you should use security incidents as a learning experience, not as a forum for criticizing an employee who made a mistake or fell victim to an attack. People should never be afraid to approach the security team; they should always see you as the path to fixing a mistake and understanding what went wrong.
To make the most of a post-mortem, focus on what the entire organization can learn from any given incident. Put a process in place for communicating, as appropriate, to broader groups within your company. Educate your employees in order to strengthen their knowledge and improve their habits.
Communication and trust are core. A culture that is approachable and nonjudgmental about security will, we believe, reduce the number of security incidents, improve the response to incidents that do occur, and strengthen your organization’s overall security posture from the ground up.