Incident Post-Mortems That Actually Change Behavior

The blameless post-mortem doctrine came out of a real insight: most serious incidents have systemic causes, and if you treat them as individual failures you’ll get defensiveness, cover-ups, and a culture where people are disincentivized to be honest about what they know. That insight is correct. But somewhere in the translation from insight to practice, “blameless” became “consequence-free,” and “systemic” became an excuse to never name anything specifically.

I’ve sat through dozens of post-mortems over the years. The ones that changed how the organization operated all shared a specific property: they named the decision that created the conditions for the incident, they identified who had the authority to make a different decision, and they assessed whether that person had the information they needed to decide differently. That’s not blame. It’s accountability mapping. And most organizations are deeply uncomfortable with it.

The accountability-mapping question is: given what was known at the time, was the decision reasonable? If yes, the fix is informational — the person needed different inputs. If no, the fix is process or culture — something allowed a bad decision to persist without correction. Either way, you have a specific thing to change rather than a general aspiration toward “better communication.”

The action item trap

The most reliable sign that a post-mortem won’t change anything is an action item list that contains the phrase “improve monitoring.” Not “add an alert when X exceeds threshold Y on service Z,” just “improve monitoring.” An item that vague is the organizational equivalent of a New Year’s resolution. It feels like accountability without creating any.

Good post-mortem action items have three properties: they’re specific enough that you could verify in two weeks whether they were done, they’re assigned to a named person rather than a team, and they address a root cause rather than a symptom. The last one is the hardest. Symptom-fixing is satisfying because it’s concrete and fast. Root cause work is slower, often requires coordination across teams, and frequently surfaces decisions that would require a difficult conversation to change.
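If your action items live in a tracker you can script against, the three properties can be roughly approximated as a lint check. What follows is a minimal sketch, not a real tool: the ActionItem fields, the vague-phrase list, and the lint_action_item function are all hypothetical names chosen for illustration. A script also can’t judge whether the stated root cause is the real one. It can only insist that someone wrote one down.

```python
from dataclasses import dataclass

# Hypothetical examples of phrasing too vague to verify; extend for your own team.
VAGUE_PHRASES = ("improve monitoring", "better communication")


@dataclass
class ActionItem:
    description: str
    owner: str         # should be a named person, not a team
    root_cause: str    # the root cause this item claims to address
    verification: str  # how someone could check in two weeks that it was done


def lint_action_item(item: ActionItem, team_names: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the item passes."""
    problems = []
    text = item.description.lower()
    if any(phrase in text for phrase in VAGUE_PHRASES):
        problems.append("description is too vague to verify")
    if not item.verification.strip():
        problems.append("no verification step stated")
    owner = item.owner.strip().lower()
    if not owner or owner in {name.lower() for name in team_names}:
        problems.append("owner must be a named person, not a team")
    if not item.root_cause.strip():
        problems.append("no root cause linked")
    return problems
```

Run against an item like “Improve monitoring” owned by “Platform team,” this flags all four problems. An item such as “Add an alert when X exceeds threshold Y on service Z,” with a named owner, a stated root cause, and a verification step, passes cleanly.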

The second action item failure mode is the action item that disappears. An item gets assigned, the post-mortem closes, and three months later the same class of incident happens again. When you dig back, the action item was technically completed — the monitoring was added, the runbook was updated, the team was trained — but the systemic condition wasn’t addressed. The symptom was patched and the root was left intact.

The fix for this is a post-mortem review practice that most teams skip: six weeks after the incident, someone opens the action items and asks whether the system would respond differently to the same failure mode today. Not whether the action items were done. Whether the organization is actually safer. That’s a different question.
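One way to keep that review from being skipped is to schedule it when the post-mortem closes and to record its answer separately from action item completion. Here is a small sketch under that assumption; review_due and review_outcome are hypothetical helpers, and the outcome labels are illustrative rather than any standard taxonomy.

```python
from datetime import date, timedelta

REVIEW_DELAY = timedelta(weeks=6)  # the six-week interval described above


def review_due(incident_date: date) -> date:
    """Date on which the follow-up review should happen."""
    return incident_date + REVIEW_DELAY


def review_outcome(all_items_done: bool, would_respond_differently: bool) -> str:
    """Keep 'were the items done' separate from 'are we actually safer'."""
    if would_respond_differently:
        return "safer"
    if all_items_done:
        # Every box was ticked, but the same failure mode would play out again.
        return "items done, root intact"
    return "items outstanding"


print(review_due(date(2024, 1, 1)))  # 2024-02-12
```

The point of the second function is the middle branch: a review where every item is complete and the system would still fail the same way should be recorded as a distinct outcome, not as a success.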

The cultural signal problem

Post-mortems are cultural artifacts as much as they are engineering documents. The way they’re written, who attends them, how the facilitator handles disagreement, whether executives attend and what they say when they do — all of this shapes what people believe will happen if they tell the truth about what went wrong.

The most damaging thing an executive can do in a post-mortem is attend and ask why this specific person made this specific decision in a tone that signals they already know the answer and don’t like it. That behavior, once observed, recalibrates every future post-mortem in the organization toward self-protection. Engineers start writing timelines that are technically accurate and informationally incomplete. Decisions get attributed to “the team” rather than to individuals. The document gets more polished and less honest with every cycle.

The most useful thing an executive can do in a post-mortem is model the accountability conversation they want to see. “I was aware of this risk and I didn’t push for it to be addressed because I thought the probability was low. That was a mistake.” If you’ve never said something like that in a post-mortem, you’re asking your engineers to do something you haven’t demonstrated willingness to do yourself.

The orgs with genuinely functional incident culture aren’t the ones with the most sophisticated post-mortem templates. They’re the ones where the most senior people model that finding the real cause is more important than protecting anyone’s reputation, including their own.
