At PagerDuty, postmortem reports are critical. A comprehensive template ensures you cover all your bases.
It’s important to start with the postmortem owner, defining when the meeting will occur and whether the incident call was recorded. Put together a thorough overview, explaining what happened and any contributing factors to be aware of when reviewing the incident. Define how problems were solved, unpacking both short- and long-term solutions.
One of the most significant elements of a comprehensive postmortem template is the Impact table. It gives stakeholders a quick view of the problem that occurred, the accounts or users affected, and any support requests raised as a result. Responders are listed and timelines documented so that the incident is mapped in its entirety.
Post-Mortem Template
Post-Mortem Owner: Your name goes here.
Meeting Scheduled For: Schedule the meeting on the "Incident Post-Mortem Meetings" shared calendar, for within 5 business days after the incident. Put the date/time here.
Call Recording: Link to the incident call recording.
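The "within 5 business days" window for the meeting can be worked out mechanically. The following is a minimal sketch, assuming business days are Monday through Friday and ignoring public holidays; the function name `meeting_deadline` is hypothetical and not part of the template.

```python
from datetime import date, timedelta


def meeting_deadline(incident_day: date, business_days: int = 5) -> date:
    """Return the latest date that is still within `business_days`
    business days after the incident.

    Assumption: business days are Monday-Friday; holidays are not considered.
    """
    current = incident_day
    remaining = business_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current


if __name__ == "__main__":
    # An incident on Friday 2015-09-11 leaves until Friday 2015-09-18.
    print(meeting_deadline(date(2015, 9, 11)))
```

If your organization observes holidays, the weekday check would need to consult a holiday calendar as well.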
Overview: Include a short sentence or two summarizing the contributing factors, timeline summary, and the impact. E.g. "On the morning of August 99th, we suffered a 1 minute SEV-1 due to a runaway process on our primary database machine. This slowness caused roughly 0.024% of alerts that had begun during this time to be delivered out of SLA."
What Happened: Include a short description of what happened.
Contributing Factors: Include a description of any conditions that contributed to the issue. If any actions taken exacerbated the issue, include them here as well, with the intention of learning from any mistakes made during the resolution process.
Resolution: Include a description of what solved the problem. If there was a temporary fix in place, describe that along with the long-term solution.
Impact: Be very specific here and include exact numbers.
Timeline: Some important times to include: (1) time the contributing factor began, (2) time of the page, (3) time that the status page was updated (i.e. when the incident became public), (4) time of any significant actions, (5) time the SEV-2/1 ended, (6) links to tools/logs that show how the timestamp was arrived at.
Action Items: Each action item should be in the form of a JIRA ticket, and each ticket should carry the same two tags: “sev1_YYYYMMDD” (such as sev1_20150911) and simply “sev1”. Include action items such as: (1) any fixes required to prevent the contributing factor in the future, (2) any preparedness tasks that could help mitigate the problem if it came up again, (3) remaining post-mortem steps, such as the internal email, as well as the status-page public post, (4) any improvements to our incident response process.
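The tag convention above is easy to get wrong by hand, so it can help to generate the pair from the incident date. This is a minimal sketch; the function name `action_item_tags` and the `severity` parameter are assumptions for illustration (the template itself only describes the sev1 tags), and no ticketing API is involved.

```python
from datetime import date
from typing import List


def action_item_tags(incident_day: date, severity: int = 1) -> List[str]:
    """Build the two tags every action-item ticket should share:
    a dated tag such as "sev1_20150911" and the plain "sev1" tag.

    Assumption: `severity` generalizes the pattern to other levels;
    the template only spells out the sev1 case.
    """
    base = f"sev{severity}"
    dated = f"{base}_{incident_day:%Y%m%d}"  # YYYYMMDD, zero-padded
    return [dated, base]


if __name__ == "__main__":
    print(action_item_tags(date(2015, 9, 11)))  # ['sev1_20150911', 'sev1']
```

Using the same generated pair on every ticket keeps all follow-up work for one incident searchable by the dated tag, and all SEV-1 follow-ups searchable by the plain one.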
Internal Email: This is a follow-up for employees. It should be sent out right after the post-mortem meeting is over. It only needs a short paragraph summarizing the incident and a link to this wiki page.
Briefly summarize what happened and where the post-mortem page (this page) can be found.
External Message: This is what will be included on the status.pagerduty.com website regarding this incident. What are we telling customers, including an apology? (The apology should be genuine, not rote.)
Summary
What Happened?
What Are We Doing About This?