Incident Response Cheatsheet
Running the whole incident, not just the alert. The IR lifecycle end to end (prepare, detect, analyze, contain, eradicate, recover, learn), with the decisions and the discipline that hold up under pressure.
Incident response is a process run under pressure, and the pressure is what breaks it: steps skipped, evidence destroyed by hasty containment, decisions made without scope. This reference covers the lifecycle and the decisions at each phase, the structure that keeps one incident from becoming two. Analysis examples assume an M365 / Sentinel / Defender environment.
The IR lifecycle
The phases are sequential for a reason, but real incidents loop: analysis reveals more scope, which sends you back to contain again, and containment often surfaces evidence that reopens analysis. The discipline is to always know which phase you are in and to resist two standing temptations. The first is containing before you understand the scope, which tips off the attacker, destroys volatile evidence, and misses the footholds you have not found yet, so the attacker simply returns. The second is declaring victory before eradication is verified and recovering into an environment that is still compromised, which is how a single incident becomes a recurring one. The phase order is the guardrail, and pressure (the executive asking when it will be over) is exactly what makes people abandon it. Naming the current phase out loud in the incident channel is a simple discipline that keeps a team aligned on what they are, and are not yet, doing.
| Phase | The goal |
|---|---|
| Preparation | The plan, team, access, and evidence sources ready before anything happens. |
| Detection & analysis | Confirm the incident, scope it: who, what, when, how far. |
| Containment | Stop the spread without destroying evidence or tipping the attacker. |
| Eradication | Remove the foothold, all of it, not just the visible symptom. |
| Recovery | Restore to known-good, monitored for return. |
| Post-incident | Lessons, metrics, the detection and control gaps that let it in. |
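The phase order and its loops can be written down as a small state machine. The sketch below is illustrative only: the class and attribute names are hypothetical, and the allowed transitions are one reading of the table above, not a standard. It refuses recovery until eradication has been marked verified, and it returns the line you would post in the incident channel.

```python
from enum import Enum


class Phase(Enum):
    PREPARATION = "preparation"
    ANALYSIS = "detection & analysis"
    CONTAINMENT = "containment"
    ERADICATION = "eradication"
    RECOVERY = "recovery"
    POST_INCIDENT = "post-incident"


# Forward moves plus the loops back to analysis when new scope appears.
ALLOWED = {
    Phase.PREPARATION: {Phase.ANALYSIS},
    Phase.ANALYSIS: {Phase.CONTAINMENT},
    Phase.CONTAINMENT: {Phase.ANALYSIS, Phase.ERADICATION},
    Phase.ERADICATION: {Phase.ANALYSIS, Phase.RECOVERY},
    Phase.RECOVERY: {Phase.ANALYSIS, Phase.POST_INCIDENT},
    Phase.POST_INCIDENT: set(),
}


class IncidentTracker:
    """Tracks the current phase of one incident, starting at analysis."""

    def __init__(self):
        self.phase = Phase.ANALYSIS
        self.eradication_verified = False
        self.history = [Phase.ANALYSIS]

    def move_to(self, target):
        if target not in ALLOWED[self.phase]:
            raise ValueError(f"cannot go from {self.phase.value} to {target.value}")
        if target is Phase.RECOVERY and not self.eradication_verified:
            raise ValueError("eradication not verified; recovery is premature")
        if target is Phase.ANALYSIS:
            # New scope invalidates any earlier verification.
            self.eradication_verified = False
        self.phase = target
        self.history.append(target)
        return f"Current phase: {target.value}"
```

The point of the guard on `RECOVERY` is that the second temptation becomes an explicit error rather than a quiet shortcut.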
Preparation
Everything you wish you had during an incident has to exist before it. The single biggest determinant of how an incident goes is what was prepared: whether the plan exists and has been exercised, whether the responders have the access they will need at 2am, and whether the evidence sources are actually logging. The worst time to discover that audit logging was off, or that nobody has the break-glass credentials, is mid-incident, which is exactly when most teams discover it.
| Readiness | Confirmed before the incident |
|---|---|
| The plan | Written, role-assigned, and exercised in a tabletop, not a document nobody has read. |
| The team | Who responds, who decides, who communicates; on-call and reachable. |
| The access | Responder access and break-glass credentials that work when normal auth is compromised. |
| The evidence | Logging on and retained: sign-in, audit, mailbox, endpoint. No logs, no investigation. |
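Whether the evidence sources are actually logging is something you can check on a schedule. A minimal sketch, assuming you can obtain the timestamp of the newest event per source by some means of your own; the source names and the 24-hour silence threshold are placeholders, not recommendations.

```python
from datetime import datetime, timedelta, timezone


def stale_sources(last_event, now, max_silence=timedelta(hours=24)):
    """Return sources whose newest event is older than max_silence,
    or which have never logged at all (timestamp is None)."""
    stale = []
    for source, seen in sorted(last_event.items()):
        if seen is None or now - seen > max_silence:
            stale.append(source)
    return stale


now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
last_event = {
    "sign-in": now - timedelta(minutes=5),
    "audit": now - timedelta(days=3),
    "mailbox": None,  # never seen: logging was never enabled
    "endpoint": now - timedelta(hours=1),
}
print(stale_sources(last_event, now))  # ['audit', 'mailbox']
```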
Detection and analysis
Analysis turns an alert into an understood incident. The job is scope: identify the initial access, then follow the attacker forward (what they did next) and outward (what else they touched), until you can state with evidence who is affected and how far it spread. Scope wrong and you contain the wrong things, leaving footholds that bring the attacker straight back. In M365 the first moves are the sign-in story and the mailbox-rule check.
```kql
// TRIAGE: check whether the attacker's IP appears on other accounts
let suspiciousIP = "203.0.113.42"; // Replace with the attacker IP from the alert
SigninLogs
| where TimeGenerated > ago(7d)
| where IPAddress == suspiciousIP
| where ResultType == "0" // "0" means the sign-in succeeded
| summarize
    FirstSeen = min(TimeGenerated),
    LastSeen = max(TimeGenerated),
    SignIns = count(),
    Apps = make_set(AppDisplayName)
    by UserPrincipalName
| sort by FirstSeen asc
```
Pivot the attacker's indicators (the IP, the account, the device) across the environment to find every place they appear, not just the one that alerted.
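Once the query results are exported, the same pivot can be sketched outside the SIEM. This assumes records shaped as dictionaries with `account`, `ip`, and `device` keys, which is a hypothetical export format rather than the Sentinel schema. It is a single-hop pivot on purpose: every new indicator it surfaces should be reviewed before you pivot on it in turn, because a shared office IP would otherwise pull in the whole company.

```python
def pivot(records, indicators):
    """Return, per matched indicator, every record in which it appears.

    records: list of dicts with "account", "ip", and "device" keys.
    indicators: dict mapping a field name to a set of known-bad values.
    """
    hits = {}
    for record in records:
        for field, bad_values in indicators.items():
            value = record.get(field)
            if value in bad_values:
                hits.setdefault((field, value), []).append(record)
    return hits


records = [
    {"account": "a@example.com", "ip": "203.0.113.42", "device": "LT-01"},
    {"account": "b@example.com", "ip": "203.0.113.42", "device": "LT-07"},
    {"account": "c@example.com", "ip": "198.51.100.9", "device": "LT-09"},
]
hits = pivot(records, {"ip": {"203.0.113.42"}})
touched = sorted({r["account"] for found in hits.values() for r in found})
print(touched)  # ['a@example.com', 'b@example.com']
```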
```kql
// FIRST 30 MIN: inbox rules and forwarding for the compromised user
let targetUser = "compromised.user@yourdomain.com";
OfficeActivity
| where TimeGenerated > ago(14d)
| where UserId =~ targetUser
| where Operation in ("New-InboxRule", "Set-InboxRule", "Enable-InboxRule",
                      "Set-Mailbox", "Set-TransportRule", "New-TransportRule")
| extend RuleParams = tostring(Parameters)
| project TimeGenerated, Operation, ClientIP, RuleParams
| sort by TimeGenerated asc
```
Mailbox rules and forwarding are the first persistence to check on a compromised account: they survive a password reset and quietly continue the attacker's access.
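Reading the rule parameters by eye is slow when there are many events, so a small triage helper can flag the ones that forward or hide mail. The sketch assumes the parameters arrive as a JSON array of `{"Name": ..., "Value": ...}` objects, which is how they commonly appear in the audit data, but verify that against your own export. The parameter names are those of the Exchange `New-InboxRule` and `Set-Mailbox` cmdlets; the function name is hypothetical.

```python
import json

# Parameters that send mail elsewhere, and parameters that hide it.
FORWARDING = {"ForwardTo", "RedirectTo", "ForwardAsAttachmentTo",
              "ForwardingSmtpAddress", "ForwardingAddress"}
HIDING = {"DeleteMessage", "MoveToFolder", "MarkAsRead"}


def flag_rule(raw_parameters):
    """Return the sorted names of risky parameters set in one rule event."""
    try:
        params = json.loads(raw_parameters)
    except (TypeError, ValueError):
        return []  # unparseable: leave for manual review
    if not isinstance(params, list):
        return []
    flagged = set()
    for param in params:
        if not isinstance(param, dict):
            continue
        name = param.get("Name")
        value = str(param.get("Value", ""))
        # Ignore switches explicitly set to false and empty values.
        if name in FORWARDING | HIDING and value.lower() not in ("", "false"):
            flagged.add(name)
    return sorted(flagged)


raw = ('[{"Name": "ForwardTo", "Value": "x@example.net"},'
       ' {"Name": "DeleteMessage", "Value": "True"},'
       ' {"Name": "MarkAsRead", "Value": "False"}]')
print(flag_rule(raw))  # ['DeleteMessage', 'ForwardTo']
```

A rule that both forwards and deletes is the classic pattern worth escalating first.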
Evidence handling
Evidence collected badly is evidence you cannot use, in a report, in a remediation decision, or in court. Two disciplines govern it: collect in order of volatility (the most ephemeral first, before it is gone), and preserve a chain of custody so the evidence is defensible later. The order of volatility is not pedantry: contain the host before you capture memory and the live network state, and you have destroyed the very evidence that proves what happened.
| Order of volatility | Capture before it is gone |
|---|---|
| 1. Memory & CPU state | RAM, running processes, network connections, gone at shutdown or containment. |
| 2. Network state | Live connections, routing, sockets; volatile and time-sensitive. |
| 3. Disk & logs | Filesystem, event logs, artifacts; durable but can be tampered. |
| 4. Cloud / external logs | Sign-in, audit, mailbox; retained centrally but with their own retention limits. |
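Both disciplines are easy to encode. The sketch below sorts a collection plan by the volatility ranks in the table and records a custody entry with a SHA-256 hash, so that a later reader can confirm the bytes are unchanged. The field names and source labels are hypothetical, and a real chain of custody also records every handoff, not only the initial collection.

```python
import hashlib
from datetime import datetime, timezone

# Ranks follow the table above: lower number, more volatile.
VOLATILITY = {"memory": 1, "network": 2, "disk": 3, "cloud_logs": 4}


def collection_order(sources):
    """Sort evidence sources most-volatile first."""
    return sorted(sources, key=lambda source: VOLATILITY[source])


def custody_entry(item, data, collector, when=None):
    """Record who collected what, when, and a SHA-256 of the bytes."""
    when = when or datetime.now(timezone.utc)
    return {
        "item": item,
        "sha256": hashlib.sha256(data).hexdigest(),
        "collected_by": collector,
        "collected_at": when.isoformat(),
    }


def still_intact(entry, data):
    """True if the data still hashes to the value recorded at collection."""
    return hashlib.sha256(data).hexdigest() == entry["sha256"]


print(collection_order(["disk", "cloud_logs", "memory", "network"]))
# ['memory', 'network', 'disk', 'cloud_logs']
```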
Containment and eradication
Containment stops the spread; eradication removes the cause. The containment decision is about timing and reversibility: short-term containment is fast and crude (isolate the host, disable the account) to stop active damage; long-term containment is the considered fix. The trap in eradication is treating the symptom: deleting the malware file while leaving the scheduled task that re-downloads it, or removing the account while leaving the OAuth grant that re-establishes access. Eradicate every foothold the analysis found, then verify.
| Step | The discipline |
|---|---|
| Short-term containment | Fast, reversible action to stop active harm: isolate host, revoke sessions, disable account. |
| Tip-off risk | Visible containment can alert the attacker to burn footholds or accelerate. Weigh covert vs decisive. |
| Eradication | Remove every foothold from the scope: persistence, accounts, grants, not just the obvious payload. |
| Verification | Confirm the foothold is gone and stays gone; recovery before this is recovery into a still-compromised environment. |
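The eradicate-then-verify rule can be tracked as a simple list of footholds from the scope, each with two separate flags. This is a sketch with hypothetical names; what counts as "verified gone" is a judgment your team has to define per foothold type.

```python
from dataclasses import dataclass


@dataclass
class Foothold:
    kind: str    # e.g. "malware file", "scheduled task", "OAuth grant"
    detail: str
    removed: bool = False
    verified_gone: bool = False


def blocking_recovery(footholds):
    """Footholds that still block recovery: not removed, or removed but unverified."""
    return [f for f in footholds if not (f.removed and f.verified_gone)]


scope = [
    Foothold("malware file", "dropper on host01", removed=True, verified_gone=True),
    Foothold("scheduled task", "re-downloads the dropper"),
    Foothold("OAuth grant", "mail-read consent", removed=True),
]
for foothold in blocking_recovery(scope):
    print(foothold.kind)
```

Here the payload is gone, but the scheduled task and the unverified grant removal both still block recovery, which is exactly the symptom-only trap.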
Response playbooks
A playbook turns the lifecycle into specific steps for a specific incident type, so the response under pressure follows a tested path instead of being improvised at 2am by whoever is on call. The value is not the document but the thinking done in advance: the decisions, thresholds, and contacts worked out when no one is panicking. A playbook also encodes the type-specific traps: ransomware containment is a race where you isolate first and investigate second, while account-compromise containment fails if you reset the password and stop, leaving the token and the OAuth grant alive. The common M365 incident types each have a shape worth pre-writing and exercising, because the first time you run a playbook should not be during the real thing.
| Incident type | First move |
|---|---|
| Account compromise (AiTM) | Revoke sessions and reset the password; check for persistence that survives both (rules, apps, devices). |
| Business email compromise | Remove forwarding rules; check for fraudulent payments already in flight (time-critical). |
| Ransomware (pre-encryption) | Isolate immediately; the window before encryption is short, contain first. |
| Data exfiltration | Scope what left and to where; preserve evidence; assess notification obligations. |
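A playbook kept as data rather than prose makes "what is the next step" a lookup instead of a debate. The sketch below encodes two rows of the table; the key names and step wording are illustrative, and a real playbook would carry owners, thresholds, and contacts as well.

```python
# Ordered steps per incident type; order matters.
PLAYBOOKS = {
    "account_compromise": [
        "revoke sessions",
        "reset password",
        "check rules, apps, and devices for persistence",
    ],
    "ransomware_pre_encryption": [
        "isolate affected hosts",
        "investigate scope",
    ],
}


def next_step(incident_type, completed):
    """Return the first playbook step not yet completed, or None when done."""
    for step in PLAYBOOKS[incident_type]:
        if step not in completed:
            return step
    return None


done = {"revoke sessions", "reset password"}
print(next_step("account_compromise", done))
# check rules, apps, and devices for persistence
```

Stopping after the password reset leaves a step outstanding, so the account-compromise trap shows up as an unfinished playbook rather than a closed ticket.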
Communication and legal
The technical response is half the incident; the communication is the half that determines the organizational damage. Who is told, when, and in what words is a decision with legal weight, and notification obligations (regulators, customers, law enforcement) run on clocks that start at discovery. The failure modes are under-communicating (stakeholders blindsided) and over-communicating (speculation that becomes a liability). Decide the communication plan in preparation, not in the heat of the incident.
| Audience | The consideration |
|---|---|
| Internal leadership | Facts and impact, on a cadence; decisions they need to make, when. |
| Legal & compliance | Notification obligations and their clocks; privilege over investigation comms. |
| Affected parties | Regulator, customer, and partner notifications, on their required timelines. |
| Law enforcement | When and whether to involve; coordinate, do not freelance. |
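Because the notification clocks start at discovery, it helps to compute the deadlines the moment discovery is declared. A minimal sketch: the audiences and hour values below are placeholders for illustration, and the real figures come from counsel, contracts, and the regulations that apply to you.

```python
from datetime import datetime, timedelta, timezone


def notification_deadlines(discovered_at, hours_allowed):
    """Map each audience to its deadline, soonest first.

    hours_allowed: dict of audience -> hours permitted after discovery.
    """
    ordered = sorted(hours_allowed.items(), key=lambda item: item[1])
    return {audience: discovered_at + timedelta(hours=hours)
            for audience, hours in ordered}


# Placeholder obligations, not legal guidance.
discovered = datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc)
deadlines = notification_deadlines(discovered, {"regulator": 72, "insurer": 24})
for audience, due in deadlines.items():
    print(audience, due.isoformat())
```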
Post-incident and metrics
The incident is not over when the attacker is out; it is over when you have learned from it. The post-incident review asks what let the attack in, what slowed the response, and what to change, and it feeds the gaps back into detection and preparation. Metrics make this concrete and comparable over time. An honest, blameless review plus a few real measures is what turns one incident into a stronger response to the next.
| Metric | Measures |
|---|---|
| MTTD (mean time to detect) | From attacker action to your detection. The dwell time you are trying to shrink. |
| MTTR (mean time to respond) | From detection to containment. How fast you act once you know. |
| Containment rate | Incidents contained before material impact. |
| Evidence preservation rate | Incidents where evidence was captured intact, the discipline measure. |
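The four measures follow directly from a handful of timestamps and flags per incident. The sketch below assumes each incident is a dictionary with the hypothetical field names shown; it uses the table's definitions (MTTD from attacker action to detection, MTTR from detection to containment) and reports times in hours.

```python
from datetime import datetime
from statistics import mean


def ir_metrics(incidents):
    """Compute MTTD, MTTR, containment rate, and evidence preservation rate.

    Each incident needs datetime fields "attacker_action", "detected",
    "contained" and boolean fields "contained_before_impact", "evidence_intact".
    """
    if not incidents:
        raise ValueError("no incidents to measure")

    def hours(delta):
        return delta.total_seconds() / 3600

    count = len(incidents)
    return {
        "mttd_hours": mean([hours(i["detected"] - i["attacker_action"]) for i in incidents]),
        "mttr_hours": mean([hours(i["contained"] - i["detected"]) for i in incidents]),
        "containment_rate": sum(i["contained_before_impact"] for i in incidents) / count,
        "evidence_preservation_rate": sum(i["evidence_intact"] for i in incidents) / count,
    }


incidents = [
    {"attacker_action": datetime(2024, 1, 1, 0, 0), "detected": datetime(2024, 1, 1, 10, 0),
     "contained": datetime(2024, 1, 1, 12, 0),
     "contained_before_impact": True, "evidence_intact": True},
    {"attacker_action": datetime(2024, 1, 2, 0, 0), "detected": datetime(2024, 1, 2, 20, 0),
     "contained": datetime(2024, 1, 3, 0, 0),
     "contained_before_impact": False, "evidence_intact": True},
]
print(ir_metrics(incidents))
```

With these two incidents the result is an MTTD of 15 hours, an MTTR of 3 hours, a containment rate of 0.5, and an evidence preservation rate of 1.0. The attacker-action timestamp is usually only known after analysis, so MTTD is best computed at the post-incident review, not during the response.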
As a worked example: an alert flags a risky sign-in. Analyze: the sign-in query shows MFA satisfied from an unfamiliar IP; pivoting on the IP and the account finds a new inbox rule and an OAuth consent, so the scope is one account plus two persistence mechanisms. Contain: revoke sessions and reset the password (a reset alone is not enough, because the session token survives it). Eradicate: remove the inbox rule and the OAuth grant, both of which would have kept the attacker's access alive after the reset. Recover and learn: confirm there is no further access, then ask why the AiTM sign-in was not blocked, and close that gap.
The discipline: scope before contain (or you miss the OAuth grant), capture evidence before you tear down, and feed the control gap back into preparation.
Quick lookup
| Phase | Key action |
|---|---|
| Detect & analyze | Confirm, then scope: initial access, forward (next actions), outward (what else touched). |
| Collect evidence | Order of volatility: memory and network state before disk and cloud logs. |
| Contain | Short-term to stop harm; capture volatile evidence first if it is needed. |
| Eradicate | Every foothold in scope, persistence and grants, not just the payload; then verify. |
| Recover | Restore to known-good, monitored for the attacker's return. |
| Post-incident | Blameless review, MTTD/MTTR/containment metrics, feed gaps back to detection. |
From the lifecycle to running the incident
This cheatsheet is the process in outline. Practical IR teaches the whole discipline: building the capability, the analysis and forensics, containment and eradication, the response playbooks, the legal and communication, and the simulations that prove your team can do it.