Incident Communication Template
A template for notifying stakeholders during production outages, with pre-drafted messages for each incident severity level and audience type.
Overview
Poor incident communication turns a technical problem into a trust problem. When customers do not know what is happening, they assume the worst. When executives are surprised, they demand explanations instead of offering support. This template provides pre-drafted messages for every audience and severity level, so your team communicates clearly, consistently, and quickly during outages.
When to Use
Use this template when:
- A production outage impacts customers or internal users
- An incident crosses severity thresholds requiring stakeholder notification
- You need to provide status updates during a prolonged incident
- Post-incident, you need to draft the final communication to affected parties
Prerequisites
Before sending communications:
- Confirm the scope of impact (which services, regions, user segments)
- Verify the severity level with the incident commander
- Identify the correct communication channels for each audience
- Review any regulatory or contractual notification requirements
Solution
# Incident Communication: `<Incident Title>`
## Metadata
| Field | Value |
|-------|-------|
| Incident ID | ______ |
| Severity | P1 / P2 / P3 / P4 |
| Start Time (UTC) | ______ |
| Status | Investigating / Identified / Monitoring / Resolved |
| Incident Commander | ______ |
| Communication Lead | ______ |
---
## Message 1: Initial Notification
### For Customers (Status Page / Email)
**Severity: P1 (Critical)**
> We are investigating reports of [service] being unavailable. We will provide an update within 30 minutes or as soon as we have more information.
>
> **Impacted services:** [List services]
> **Started at:** [Time UTC]
> **Next update by:** [Time UTC + 30 min]
**Severity: P2 (High)**
> We are investigating degraded performance on [service]. Some users may experience [specific symptom]. We will provide an update within 60 minutes.
>
> **Impacted services:** [List services]
> **Started at:** [Time UTC]
> **Next update by:** [Time UTC + 60 min]
**Severity: P3/P4 (Medium/Low)**
> We are aware of an issue affecting [service description]. Impact is limited to [scope]. A fix is in progress and we expect resolution within [timeframe].
>
> **Next update by:** [Time UTC]
---
### For Internal Stakeholders (Slack / Email)
**Severity: P1/P2**
> **INCIDENT ALERT** — [Service] — [Severity]
>
> An incident has been declared for [service]. Impact: [brief description]. Incident commander: [name]. Channel: [link].
>
> No action required from your team at this time. Updates will be posted in [channel].
**Severity: P3/P4**
> **Incident Notification** — [Service] — [Severity]
>
> An incident has been opened for [service]. Impact is limited to [scope]. No customer-facing impact expected. Tracking in [channel].
---
### For Executives (Email / Slack DM)
> **Incident Summary** — [Service] — [Severity]
>
> **Impact:** [number] customers / [percentage]% of traffic / [region]
> **Revenue Risk:** [High / Medium / Low / None]
> **Root Cause (preliminary):** [one sentence if known]
> **ETA to Resolution:** [time if known]
> **Actions Taken:** [what has been done so far]
>
> I will send an update within [timeframe].
---
## Message 2: Status Update
### For Customers
> **Update** — [Service] — [Time UTC]
>
> We have [identified the cause / implemented a mitigation / deployed a fix] for the [service] issue. [Brief description of what happened and what was done].
>
> **Status:** [Identified / Mitigated / Monitoring]
> **Next update by:** [Time UTC]
---
### For Internal Stakeholders
> **Incident Update** — [INC-xxx] — [Time UTC]
>
> **Status:** [Investigating / Identified / Mitigated / Monitoring]
> **What we know:** [2-3 sentence summary]
> **What we are doing:** [current actions]
> **What we need:** [any help required from other teams]
> **Next update:** [Time UTC]
---
### For Executives
> **Incident Update** — [INC-xxx] — [Time UTC]
>
> **Current Status:** [Investigating / Mitigated / Monitoring]
> **Customer Impact:** [updated numbers if changed]
> **Root Cause:** [updated understanding]
> **ETA to Full Resolution:** [updated estimate]
> **Risk of Recurrence:** [High / Medium / Low]
> **Postmortem Scheduled:** [Date / TBD]
---
## Message 3: Resolution
### For Customers
> **Resolved** — [Service] — [Time UTC]
>
> The issue affecting [service] has been resolved. All systems are operating normally.
>
> **Duration:** [start time] to [end time] ([duration])
> **Impact:** [summary of what users experienced]
> **Root Cause:** [brief, non-technical description]
> **Preventive Actions:** [what we are doing to prevent recurrence]
>
> We apologize for any inconvenience. If you continue to experience issues, please contact [support channel].
---
### For Internal Stakeholders
> **INCIDENT RESOLVED** — [INC-xxx] — [Time UTC]
>
> The incident affecting [service] has been resolved.
>
> **Duration:** [duration]
> **Root Cause:** [technical description]
> **Resolution:** [what fixed it]
> **Postmortem:** [Date / TBD] — [Link when available]
> **Action Items:** [Link to tracking]
---
### For Executives
> **Incident Closed** — [INC-xxx] — [Time UTC]
>
> **Final Status:** Resolved
> **Total Duration:** [duration]
> **Customer Impact:** [final numbers]
> **Revenue Impact:** [if any]
> **Root Cause:** [one paragraph]
> **Preventive Actions:** [list]
> **Postmortem:** [Date] — [Link]
> **Follow-up Required:** [Yes / No — details if yes]
---
## Communication Rules
1. **Be honest about what you know** — do not guess at root causes
2. **Provide ETAs only if confident** — missed ETAs destroy trust faster than no ETA
3. **Update on schedule even if no progress** — silence breeds anxiety
4. **Use the same channel for updates** — do not make stakeholders hunt for information
5. **Match technical depth to audience** — executives need impact, engineers need details
## Communication Frequency by Severity
| Severity | Initial Notification | Updates | Resolution |
|----------|---------------------|---------|------------|
| P1 | Immediate | Every 15-30 min | Within 15 min of resolution |
| P2 | Within 15 min | Every 30-60 min | Within 30 min of resolution |
| P3 | Within 30 min | Every 2-4 hours | Within 1 hour of resolution |
| P4 | Within 1 hour | Daily or on change | Within 1 hour of resolution |
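The cadence in the table above can be turned into a simple deadline check, so the communication lead sees when the next update is due. The following is a minimal sketch, not a definitive implementation: the function names and the `MAX_UPDATE_INTERVAL_MIN` mapping are hypothetical, the intervals use the upper bound of each range in the table, and "daily" for P4 is assumed to mean 24 hours.

```python
from datetime import datetime, timedelta, timezone

# Maximum minutes between updates, taken from the upper bound of each
# range in the table. P4 ("daily or on change") is assumed to mean 24 hours.
MAX_UPDATE_INTERVAL_MIN = {
    "P1": 30,
    "P2": 60,
    "P3": 240,
    "P4": 1440,
}


def next_update_due(severity: str, last_update: datetime) -> datetime:
    """Return the latest UTC time by which the next update should go out."""
    if last_update.tzinfo is None:
        raise ValueError("last_update must be timezone-aware (use UTC)")
    interval = timedelta(minutes=MAX_UPDATE_INTERVAL_MIN[severity])
    return last_update.astimezone(timezone.utc) + interval


def is_overdue(severity: str, last_update: datetime, now: datetime) -> bool:
    """True when the update window for this severity has already passed."""
    return now > next_update_due(severity, last_update)
```

Calling `is_overdue("P1", last_update, datetime.now(timezone.utc))` from a periodic reminder is one way to enforce the "update on schedule even if no progress" rule.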
Explanation
The template separates communications by audience (customers need reassurance and timelines, executives need business impact, internal teams need technical coordination) and timing (initial, update, resolution). The key principle is that every message answers three questions: what happened, what we are doing about it, and when we will update next. Without those three elements, communication creates more anxiety than it resolves.
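If the messages are stored as data, the audience and timing split described here maps naturally onto a lookup keyed by both. The sketch below is illustrative only: the `TEMPLATES` dictionary, the shortened message texts, the lowercase placeholder names, and the `render` function are all assumptions, not part of the template. It also refuses to produce a message that still contains an unfilled bracketed placeholder.

```python
import re

# Hypothetical, shortened templates keyed by (audience, stage).
TEMPLATES = {
    ("customer", "initial"): (
        "We are investigating reports of [service] being unavailable. "
        "Next update by: [next_update_utc]"
    ),
    ("internal", "initial"): (
        "INCIDENT ALERT - [service] - [severity]. "
        "Incident commander: [commander]. Channel: [channel]."
    ),
}

# Matches placeholders such as [service] or [next_update_utc].
PLACEHOLDER = re.compile(r"\[([a-z_]+)\]")


def render(audience: str, stage: str, values: dict) -> str:
    """Fill bracketed placeholders; refuse to return a half-filled message."""
    template = TEMPLATES[(audience, stage)]
    missing = sorted(set(PLACEHOLDER.findall(template)) - set(values))
    if missing:
        raise ValueError(f"unfilled placeholders: {', '.join(missing)}")
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)
```

Raising on missing values rather than sending the literal text `[service]` is a deliberate choice here: a half-filled message reaching customers is worse than a short delay while the gap is fixed.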
Variants
| Context | Approach | Notes |
|---|---|---|
| Customer-facing SaaS | Status page + email | Automate via status page tool (Statuspage, Instatus) |
| Internal tools only | Slack + email | No external communication needed |
| Security incident | Legal + PR review first | Never communicate security incidents without legal clearance |
| Data breach | Regulatory notification | GDPR Article 33 requires notifying the supervisory authority within 72 hours of becoming aware of a personal data breach, where feasible |
| Mobile app outage | In-app banner + social media | Users may not check email during app outage |
Best Practices
- Draft templates during calm periods — create specific versions for your services before an incident happens
- Assign a communication lead separate from the incident commander during P1s
- Review messages for tone — avoid jargon, blame, or over-technical explanations
- Include a human signature — signed messages feel more authentic than generic status updates
- Track communication delays — if it takes 20 minutes to draft an update, your process is too slow
Common Mistakes
- Saying “we are investigating” for hours — provide meaningful updates or admit you are stuck
- Over-promising resolution times — give ranges (“1-2 hours”) instead of exact times
- Using different terminology across channels — “degraded” on status page and “outage” in Slack creates confusion
- Forgetting to notify internal teams — customer communication is visible, but internal teams need coordination too
- Sending resolution before verification — confirming resolution prematurely leads to reopening
Frequently Asked Questions
How do we handle incidents where we do not know the root cause yet?
State what you know, what you have ruled out, and what you are checking next. Example: “We have identified that the issue is isolated to the API layer. Database and cache layers are operating normally. We are investigating configuration changes deployed in the last 24 hours.”
Should we apologize in incident communications?
Yes, but proportionally. A brief “we apologize for the inconvenience” is appropriate for customer-facing outages. Avoid excessive apology language that sounds insincere. Focus on facts and remediation.
What if an incident spans multiple time zones?
Always use UTC for all timestamps. Include local time for the primary affected region if relevant. Ensure the handoff between shifts includes communication status so updates do not stop when teams go offline.
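The UTC-first convention can be sketched as a small formatting helper. This is a minimal example under stated assumptions: `format_timestamp` is a hypothetical name, and the demo uses a fixed UTC+1 offset to stay dependency-free. For a real region, passing a `zoneinfo.ZoneInfo` object as `local_tz` would handle daylight saving time.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional


def format_timestamp(moment: datetime, local_tz: Optional[tzinfo] = None) -> str:
    """Render a timestamp in UTC, optionally followed by one local time."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    text = moment.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    if local_tz is not None:
        # Include the local date as well, since it can differ from the UTC date.
        local = moment.astimezone(local_tz)
        text += local.strftime(" (%Y-%m-%d %H:%M %Z)")
    return text


# Demo with a fixed UTC+1 offset (an assumption; real regions need DST rules).
cet = timezone(timedelta(hours=1), "CET")
started = datetime(2024, 3, 5, 14, 30, tzinfo=timezone.utc)
print(format_timestamp(started))       # 2024-03-05 14:30 UTC
print(format_timestamp(started, cet))  # 2024-03-05 14:30 UTC (2024-03-05 15:30 CET)
```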