Most incident response plans fail when they are needed. Not because the plan does not exist — but because it was written to satisfy an audit requirement rather than to guide a team through a crisis. The plan reads well in a boardroom. It collapses in a war room.
The failure mode is consistent: the plan says “the Incident Response Team will contain the threat” without defining who the Incident Response Team is, what containment means for each attack type, who has the authority to disconnect a production system, or how to communicate with leadership while simultaneously investigating.
This guide covers how to build an IR plan that works under pressure — written from the perspective of someone who has used these plans during live incident response.
Why Plans Fail: The Three Common Causes
1. The Plan Describes Outcomes, Not Actions
“The IRT will eradicate the threat and restore normal operations” is an outcome statement. It does not tell anyone what to do. Under stress, people need specific actions: “The endpoint analyst isolates the affected device using EDR, collects a forensic image, and notifies the IR lead within 15 minutes of containment.”
Every section of your IR plan should answer: who does what, in what order, using which tools, with what authority, and who do they notify?
2. Roles Are Defined by Title, Not by Capability
“The CISO leads the response” fails when the CISO is on holiday, in a different timezone, or unavailable. Roles must be defined by capability with named primary and secondary assignees. The IR Lead is the person who coordinates the response — that role might be filled by the Security Manager, the IT Director, or a senior analyst depending on who is available.
Define roles by function (IR Lead, Technical Lead, Communications Lead, Legal Liaison, Executive Sponsor) and maintain a contact roster with primary and backup for each role. Test the roster quarterly by calling the backup at 7pm on a Tuesday — if they do not answer, your plan has a single point of failure.
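The function-over-title principle can be sketched as a simple roster lookup that always falls through to the backup. Role names follow the plan; the contact names, numbers, and the `available` flag are illustrative assumptions, not a real roster:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Contact:
    name: str
    mobile: str          # mobile, not desk phone or email
    available: bool = True   # toggled by an assumed on-call check-in process

# Illustrative roster -- names and numbers are placeholders
ROSTER = {
    "IR Lead": (Contact("A. Primary", "+44 7700 900001"),
                Contact("B. Backup",  "+44 7700 900002")),
    "Technical Lead": (Contact("C. Primary", "+44 7700 900003"),
                       Contact("D. Backup",  "+44 7700 900004")),
}

def reach(role: str) -> Optional[Contact]:
    """Return the first reachable contact for a role, or None.

    A None result is exactly the single point of failure the
    quarterly 7pm-on-a-Tuesday test is designed to surface.
    """
    primary, backup = ROSTER[role]
    for contact in (primary, backup):
        if contact.available:
            return contact
    return None
```

The quarterly roster test is then just a loop over `ROSTER` asserting that `reach(role)` never returns `None` when the primary is marked unavailable.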
3. Classification Is Binary (Major/Minor)
Binary severity creates two failure modes. If everything is “major,” the team experiences alert fatigue and stops treating incidents as urgent. If nothing reaches the “major” threshold until data is already exfiltrated, the team under-responds to incidents that escalate.
Four severity levels work in practice. The definitions must include concrete indicators, not subjective judgments.
Building the Plan: Section by Section
Section 1: Purpose and Scope
One paragraph. Define what constitutes a security incident for your organization (not just “a breach” — include attempted access, policy violations, data loss, availability events, and vendor compromises). Define what is out of scope (IT service requests, hardware failures without security impact).
State the regulatory notification requirements that apply to your organization. If GDPR applies, the 72-hour supervisory authority notification deadline starts when you become aware of a personal data breach. If you operate in financial services, your regulator may require notification within 24 hours. These deadlines are non-negotiable and must be visible on the first page of the plan.
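The deadline arithmetic is trivial, but it is worth computing in advance rather than under stress. A minimal sketch, assuming the clock starts at the moment of awareness; the 24-hour financial-services window is the example figure from the paragraph above, so verify the actual window with your regulator:

```python
from datetime import datetime, timedelta, timezone

# Notification windows in hours. GDPR Art. 33 is 72h from awareness;
# the 24h financial-services figure is illustrative -- confirm yours.
NOTIFICATION_WINDOWS_H = {"GDPR": 72, "financial_regulator": 24}

def notification_deadline(aware_at: datetime, regime: str) -> datetime:
    """Deadline by which the supervisory authority must be notified."""
    return aware_at + timedelta(hours=NOTIFICATION_WINDOWS_H[regime])

# Example: awareness at 14:30 UTC on 4 March
aware = datetime(2024, 3, 4, 14, 30, tzinfo=timezone.utc)
deadline = notification_deadline(aware, "GDPR")
```

Note the timezone-aware datetime: a deadline computed in local time across a distributed team is a recipe for missing it by hours.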
Section 2: Severity Classification
Four levels. Each level defines: the indicators that trigger classification at that level, the response time requirement, who is notified, and what authority the IR team has.
Severity 1 — Critical
Indicators: confirmed data exfiltration, ransomware deployment, active adversary with administrative access, compromise of authentication infrastructure, or any incident affecting regulatory notification thresholds.
Response: immediate. Full IRT activation. Executive notification within 30 minutes. External counsel engaged. Forensic preservation begins immediately.
Authority: IR Lead can disconnect production systems, revoke all sessions, and disable accounts without prior approval.

Severity 2 — High
Indicators: confirmed compromise of a user account with access to sensitive data, malware execution on multiple endpoints, successful phishing with credential harvesting, or compromise of a business-critical application.
Response: within 1 hour. Core IRT activated. Management notification within 2 hours.
Authority: IR Lead can isolate affected endpoints and disable compromised accounts. Production system changes require executive approval.

Severity 3 — Medium
Indicators: single endpoint compromise without lateral movement, phishing campaign with no confirmed credential compromise, policy violation with potential security impact, or anomalous activity requiring investigation.
Response: within 4 hours. Assigned analyst investigates. Management notified if escalation is warranted.
Authority: analyst can isolate the affected endpoint. No broader infrastructure changes without IR Lead approval.

Severity 4 — Low
Indicators: failed attack attempts (blocked phishing, failed login brute force), minor policy violations, vulnerability discoveries without active exploitation, security tool alerts requiring review.
Response: next business day. Logged and tracked. Reviewed in weekly security operations meeting.
Authority: standard analyst actions. No escalation unless the investigation reveals higher-severity indicators.
Critical rule: Severity can only be escalated, never downgraded during an active incident. If an incident starts as Severity 3 and the investigation reveals lateral movement, it escalates to Severity 2. It does not return to Severity 3 if the lateral movement is contained — the highest severity reached determines the response and reporting requirements.
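The escalate-only rule is easy to encode so that tooling enforces it rather than memory. A minimal sketch, assuming the four-level scheme above (the `Incident` class and its method names are illustrative, not from any particular ticketing system):

```python
from enum import IntEnum

class Severity(IntEnum):
    # Lower value = more severe, matching the Sev 1-4 convention above
    CRITICAL = 1
    HIGH = 2
    MEDIUM = 3
    LOW = 4

class Incident:
    def __init__(self, severity: Severity):
        self.severity = severity
        # Highest severity reached drives response and reporting requirements
        self.highest_reached = severity

    def reclassify(self, new: Severity) -> None:
        """Escalate-only: downgrade attempts during an active incident are ignored."""
        if new < self.severity:  # numerically lower = more severe
            self.severity = new
            self.highest_reached = min(self.highest_reached, new)

inc = Incident(Severity.MEDIUM)
inc.reclassify(Severity.HIGH)    # lateral movement found: escalates to Sev 2
inc.reclassify(Severity.MEDIUM)  # lateral movement contained: no downgrade
```

Tracking `highest_reached` separately makes the reporting rule explicit: even if a future process change allowed post-incident reclassification, the reporting obligations would still key off the peak severity.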
Section 3: Incident Response Team Roles
Define each role with three elements: responsibilities during an incident, named primary and backup assignees, and contact information (mobile phone, not desk phone or email — during a compromise, email may not be trustworthy).
IR Lead — Coordinates the overall response. Makes containment decisions. Manages the timeline. Ensures evidence is preserved. Communicates status to the Executive Sponsor. Does NOT perform technical investigation — their job is coordination, not analysis.
Technical Lead — Leads the technical investigation. Directs analysts. Makes tool-level decisions (which queries to run, which systems to image, which logs to preserve). Reports findings to the IR Lead.
Communications Lead — Manages all internal and external communications. Drafts employee notifications, customer communications, and regulatory notifications. All communications are reviewed by Legal before release. No one else communicates externally about the incident.
Legal Liaison — Advises on legal obligations (notification deadlines, privilege, regulatory requirements). Determines if external counsel is needed. Reviews all external communications. Engages law enforcement if warranted.
Executive Sponsor — The senior leader (typically CISO, CIO, or CEO) who makes business decisions during the response: whether to pay a ransom (recommendation: no), whether to notify customers early, whether to engage external forensics, whether to involve law enforcement.
Section 4: The Response Phases
Six phases. Each phase has defined entry criteria (when it begins), actions (what the team does), exit criteria (when it ends), and handoff (what triggers the next phase).
Detection and Triage — An alert fires, a user reports something, or a third party notifies you. The on-call analyst evaluates the alert, determines if it is a true positive, classifies severity, and activates the appropriate response level.
Key actions: validate the alert against multiple data sources. A single sign-in from an unusual country might be a VPN — check for follow-on activity (mailbox access, file downloads, admin portal access). If the alert is a true positive, log it, classify it, notify the IR Lead, and begin evidence preservation immediately.
Containment — Stop the attack from spreading without destroying evidence. There are two types: short-term containment (isolate the endpoint, disable the account, block the IP) and long-term containment (rebuild credentials, patch the vulnerability, deploy additional monitoring).
Key actions: isolate affected endpoints using EDR (network isolation, not reimaging — you need the forensic evidence). Disable compromised accounts. Reset all sessions for affected users. Block known adversary infrastructure (IPs, domains) at the firewall and proxy. Do NOT reboot affected systems — memory evidence is lost on reboot.
Eradication — Remove the adversary’s access completely. This is not the same as containment — containment stops the bleeding, eradication removes the knife.
Key actions: identify all persistence mechanisms (scheduled tasks, registry run keys, OAuth application grants, mail forwarding rules, new accounts created by the adversary). Remove them all. If the adversary had administrative access, assume all credentials in the environment are compromised — plan a full credential rotation.
Recovery — Restore normal operations. Rebuild compromised systems from known-good images. Restore data from verified backups. Re-enable accounts with new credentials. Monitor the environment at heightened alert levels for 30 days — adversaries frequently re-enter through persistence mechanisms that were missed during eradication.
Key actions: restore in phases. Start with critical systems, verify integrity, monitor for 24-48 hours, then restore the next tier. Do not restore everything simultaneously — if eradication was incomplete, you need to be able to identify which restored system reintroduced the adversary.
Post-Incident Review — Conduct within 5 business days of recovery completion while memory is fresh. This is not a blame exercise — it is a learning exercise. Document: what happened, how it was detected, how it was contained, what worked, what did not work, and what changes are required.
Key outputs: updated detection rules, revised response procedures, identified gaps in logging or tooling, and management recommendations. The post-incident review is the mechanism by which your IR capability improves over time.
Documentation and Closure — Produce the formal incident report. Archive all evidence, logs, communications, and decisions. Update the risk register if the incident revealed a previously unidentified risk. Close the incident ticket.
Section 5: Evidence Handling
Evidence handling is the most commonly neglected section — and the one that causes the most problems if you need to involve law enforcement or external forensics.
Chain of custody: Every piece of evidence (disk images, memory dumps, log exports, screenshots) must be logged with: what it is, when it was collected, who collected it, where it is stored, and the hash value (SHA-256) at time of collection. If the hash changes, the evidence is compromised.
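The five chain-of-custody fields map directly onto a small collection helper. A sketch using Python's standard `hashlib`; the field names mirror the list above, and the record format is an assumption, not a forensic standard:

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path

def custody_record(evidence_path: str, collector: str, description: str) -> dict:
    """Chain-of-custody entry: what, when, who, where, and SHA-256 at collection."""
    digest = hashlib.sha256(Path(evidence_path).read_bytes()).hexdigest()
    return {
        "what": description,
        "when": datetime.now(timezone.utc).isoformat(),
        "who": collector,
        "where": evidence_path,
        "sha256": digest,
    }

def verify(record: dict) -> bool:
    """Re-hash the file; a mismatch means the evidence has been altered."""
    current = hashlib.sha256(Path(record["where"]).read_bytes()).hexdigest()
    return current == record["sha256"]
```

In practice the record itself should be written to the immutable evidence store alongside the artifact, so the hash cannot be silently updated to match a tampered file.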
Preservation priorities (collect first, analyze later):
- Volatile memory (RAM) — lost on reboot
- Running processes, network connections, logged-in sessions — lost on reboot
- System logs — may be overwritten by log rotation
- Disk images — stable but should be collected before remediation changes the disk state
- Cloud audit logs — typically retained 90-180 days, export immediately
- Email and file metadata — retained by the platform but may be modified by the adversary
Storage: Evidence must be stored on write-once or access-controlled storage that the incident team does not have permission to modify after upload. A dedicated evidence bucket in cloud storage (S3, Azure Blob) with immutability policies enabled works. A shared network drive does not — anyone with access can modify the evidence.
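As one concrete option, an S3 bucket with Object Lock satisfies the write-once requirement. A configuration sketch; the bucket name and retention period are illustrative placeholders:

```shell
# Object Lock must be enabled when the bucket is created --
# it cannot be retrofitted onto an existing bucket.
aws s3api create-bucket \
  --bucket example-ir-evidence \
  --object-lock-enabled-for-bucket

# COMPLIANCE mode: no principal, including the account root,
# can delete or overwrite an object version until retention expires.
aws s3api put-object-lock-configuration \
  --bucket example-ir-evidence \
  --object-lock-configuration \
    '{"ObjectLockEnabled": "Enabled", "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": 365}}}'
```

COMPLIANCE mode is deliberately stricter than GOVERNANCE mode here: during an incident you must assume privileged credentials may be in the adversary's hands, so even admin-level deletion has to be impossible.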
Section 6: Communication Templates
Pre-draft the communications you will need during an incident. Under pressure, writing a customer notification from scratch takes hours and produces something that legal rejects. Pre-drafted templates with fill-in-the-blank fields (incident type, affected systems, timeline, remediation actions) are reviewed by legal in advance and can be deployed in minutes.
Minimum templates:
- Internal notification to employees (what happened, what to do, what not to do)
- Customer notification (what happened, what data was affected, what you are doing, what they should do)
- Regulatory notification (GDPR Article 33 format, ICO template, or your applicable regulator’s format)
- Executive briefing (one-page summary for board/C-suite: impact, status, next steps, decisions needed)
- Law enforcement referral (if warranted — prepared statement, evidence summary, point of contact)
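The fill-in-the-blank approach can be as simple as pre-approved strings with named placeholders. The template text below is an illustrative skeleton only, not legal-reviewed wording:

```python
# Illustrative skeleton -- real templates are drafted and pre-approved
# by Legal, then stored alongside the IR plan.
CUSTOMER_NOTIFICATION = (
    "On {incident_date} we identified unauthorized access affecting "
    "{affected_systems}. The following data may have been involved: "
    "{data_categories}. We have {remediation_actions}. "
    "We recommend that you {customer_actions}."
)

def render(template: str, **fields: str) -> str:
    """Fail loudly if any field is missing -- a half-filled
    notification must never go out the door."""
    return template.format(**fields)  # raises KeyError on a missing placeholder

msg = render(
    CUSTOMER_NOTIFICATION,
    incident_date="4 March 2024",
    affected_systems="our customer support portal",
    data_categories="names and email addresses",
    remediation_actions="contained the access and reset all credentials",
    customer_actions="reset your portal password",
)
```

Letting `str.format` raise on a missing field is the point of the design: a template that silently ships with a blank in it is worse than one that refuses to render.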
Testing the Plan
A plan that has not been tested is a document, not a capability. Test annually at minimum, quarterly if you can manage it.
Tabletop exercise: Walk the IRT through a scenario verbally. “It’s Tuesday at 2pm. Your SOC analyst detects 15 accounts authenticating from a Nigerian IP range that has never appeared in your logs. Three of those accounts have Global Admin roles. What do you do?” The team talks through their response using the plan. You identify gaps in roles, procedures, and decision authority.
Functional exercise: Execute the containment and communication procedures against a simulated incident in a test environment. Actually isolate a test endpoint. Actually draft and send the internal notification. Actually export the evidence to the evidence storage. This reveals practical failures that tabletop exercises miss — the EDR isolation command requires a permission the analyst does not have, the evidence storage bucket does not exist yet, the communications template references a phone tree that was last updated 18 months ago.
The Documentation You Need
A complete IR capability requires more than the plan itself:
- Incident Response Plan — the strategic document covering classification, roles, phases, and authority
- Attack-specific playbooks — step-by-step procedures for common attack types (ransomware, BEC, credential compromise, insider threat, DDoS)
- Communication templates — pre-approved drafts for each audience (employees, customers, regulators, executives)
- Evidence handling procedures — chain of custody forms, collection scripts, storage configuration
- Contact roster — primary and backup for each IRT role, updated quarterly
- Regulatory reference — notification deadlines, required content, submission methods for each applicable regulation
Building this documentation from scratch takes 80-120 hours of skilled labor. The Cyber Incident Response Toolkit provides the complete set — 36 deliverables including attack-specific playbooks, communication templates, a classification engine, and a readiness scorecard — structured to the same methodology described in this guide.