Free Cheatsheet

Incident Response

Running the whole incident, not just the alert. The IR lifecycle end to end (prepare, detect, analyze, contain, eradicate, recover, learn), with the decisions and the discipline that hold up under pressure.


Incident response is a process run under pressure, and the pressure is what breaks it: steps skipped, evidence destroyed by hasty containment, decisions made without scope. This reference covers the lifecycle and the decisions at each phase, the structure that keeps one incident from becoming two. Analysis examples assume an M365 / Sentinel / Defender environment.

The IR lifecycle

The phases are sequential for a reason, but real incidents loop: analysis reveals more scope, which sends you back to contain again, and containment often surfaces evidence that reopens analysis. The discipline is to always know which phase you are in and to resist the two standing temptations. The first is containing before you understand the scope, which tips off the attacker, destroys volatile evidence, and misses the footholds you have not found yet, so the attacker simply returns. The second is declaring victory before eradication is verified and recovering into an environment that is still compromised, which is how a single incident becomes a recurring one. The phase order is the guardrail, and pressure (the executive asking when it will be over) is exactly what makes people abandon it. Naming the current phase out loud in the incident channel is a simple discipline that keeps the team aligned on what it is, and is not yet, doing.

Phase	The goal
Preparation	The plan, team, access, and evidence sources ready before anything happens.
Detection & analysis	Confirm the incident, scope it: who, what, when, how far.
Containment	Stop the spread without destroying evidence or tipping the attacker.
Eradication	Remove the foothold, all of it, not just the visible symptom.
Recovery	Restore to known-good, monitored for return.
Post-incident	Lessons, metrics, the detection and control gaps that let it in.
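
One way to make the phase discipline mechanical is to track the current phase explicitly and refuse the two shortcuts. The sketch below is a hypothetical Python helper (the `IncidentTracker` class, the `ALLOWED` transition map, `blocker`, and the `override` argument are illustrative assumptions, not part of any tool): it permits the loop back to analysis, blocks containment before scope is confirmed and recovery before eradication is verified, and logs every phase change so it can be announced in the incident channel.

```python
from enum import Enum
from typing import List, Optional


class Phase(Enum):
    PREPARATION = "preparation"
    DETECTION_ANALYSIS = "detection & analysis"
    CONTAINMENT = "containment"
    ERADICATION = "eradication"
    RECOVERY = "recovery"
    POST_INCIDENT = "post-incident"


# Forward moves go one phase at a time; the backward moves model the loop
# where containment, eradication, or recovery surfaces new scope.
ALLOWED = {
    Phase.PREPARATION: {Phase.DETECTION_ANALYSIS},
    Phase.DETECTION_ANALYSIS: {Phase.CONTAINMENT},
    Phase.CONTAINMENT: {Phase.ERADICATION, Phase.DETECTION_ANALYSIS},
    Phase.ERADICATION: {Phase.RECOVERY, Phase.DETECTION_ANALYSIS},
    Phase.RECOVERY: {Phase.POST_INCIDENT, Phase.DETECTION_ANALYSIS},
    Phase.POST_INCIDENT: set(),
}


class IncidentTracker:
    def __init__(self) -> None:
        self.phase = Phase.DETECTION_ANALYSIS
        self.scope_confirmed = False
        self.eradication_verified = False
        self.log: List[str] = [self.phase.value]

    def blocker(self, target: Phase) -> Optional[str]:
        """Return the reason a move is not allowed, or None if it is."""
        if target not in ALLOWED[self.phase]:
            return f"{self.phase.value} -> {target.value} is not a lifecycle step"
        if target is Phase.CONTAINMENT and not self.scope_confirmed:
            return "scope not confirmed"  # temptation 1: containing blind
        if target is Phase.RECOVERY and not self.eradication_verified:
            return "eradication not verified"  # temptation 2: declaring victory early
        return None

    def move_to(self, target: Phase, override: Optional[str] = None) -> None:
        """Change phase; an override skips the scope/verification guards, never the order."""
        reason = self.blocker(target)
        if reason and (override is None or target not in ALLOWED[self.phase]):
            raise ValueError(reason)
        if target is Phase.DETECTION_ANALYSIS:
            # New scope reopens the earlier conclusions.
            self.scope_confirmed = False
            self.eradication_verified = False
        self.phase = target
        entry = target.value if override is None else f"{target.value} (override: {override})"
        self.log.append(entry)  # the line to post in the incident channel
```

The override exists because some incident types (ransomware, in the playbook table) deliberately contain before full scope; the point is that the exception is stated and logged rather than silent.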

Preparation

Everything you wish you had during an incident has to exist before it. The single biggest determinant of how an incident goes is what was prepared: whether the plan exists and has been exercised, whether the responders have the access they will need at 2am, and whether the evidence sources are actually logging. The worst time to discover that audit logging was off, or that nobody has the break-glass credentials, is mid-incident, which is exactly when most teams discover it.

Readiness	Confirmed before the incident
The plan	Written, role-assigned, and exercised in a tabletop, not a document nobody has read.
The team	Who responds, who decides, who communicates; on-call and reachable.
The access	Responder access and break-glass credentials that work when normal auth is compromised.
The evidence	Logging on and retained: sign-in, audit, mailbox, endpoint. No logs, no investigation.
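
Whether the evidence sources are actually logging can be checked routinely rather than discovered mid-incident. A minimal sketch, assuming you can obtain the latest and earliest event timestamp per source (in Sentinel, for example, from `max(TimeGenerated)` and `min(TimeGenerated)` per table); the source names, silence thresholds, and the 90-day retention target are hypothetical values to replace with your own.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Hypothetical limits: how long each source may go quiet before it counts as "not logging".
MAX_SILENCE = {
    "sign-in": timedelta(hours=1),
    "audit": timedelta(hours=6),
    "mailbox": timedelta(hours=6),
    "endpoint": timedelta(minutes=30),
}
REQUIRED_RETENTION = timedelta(days=90)  # hypothetical policy value


def silent_sources(last_event: Dict[str, Optional[datetime]], now: datetime) -> List[str]:
    """Sources with no events at all, or none within their allowed silence window."""
    problems = []
    for source, limit in MAX_SILENCE.items():
        seen = last_event.get(source)
        if seen is None:
            problems.append(f"{source}: no events at all")
        elif now - seen > limit:
            problems.append(f"{source}: silent for {now - seen}")
    return problems


def short_retention(first_event: Dict[str, datetime], now: datetime) -> List[str]:
    """Sources whose oldest retained event is newer than the retention target."""
    return [source for source, oldest in first_event.items() if now - oldest < REQUIRED_RETENTION]
```

A source that only started logging recently will also show up as short retention, which is the right answer: it cannot yet support an investigation that reaches back that far.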

Detection and analysis

Analysis turns an alert into an understood incident. The job is scope: identify the initial access, then follow the attacker forward (what they did next) and outward (what else they touched), until you can state with evidence who is affected and how far it spread. Get the scope wrong and you contain the wrong things, leaving footholds that bring the attacker straight back. In M365 the first moves are reconstructing the account's sign-in history and checking its mailbox rules.


// TRIAGE: Check if the attacker's IP appears in other accounts
let suspiciousIP = "203.0.113.42";  // Replace with attacker IP from alert
SigninLogs
| where TimeGenerated > ago(7d)
| where IPAddress == suspiciousIP
| where ResultType == "0"  // "0" = successful sign-in
| summarize
    FirstSeen = min(TimeGenerated),
    LastSeen = max(TimeGenerated),
    SignIns = count(),
    Apps = make_set(AppDisplayName)
    by UserPrincipalName
| sort by FirstSeen asc

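Pivot on the attacker's indicators (the IP, the account, the device) across the environment to find every place they appear, not just the one that alerted. The query above keeps only successful interactive sign-ins from SigninLogs; sign-ins that reuse an existing token, such as refresh-token activity, are recorded in AADNonInteractiveUserSignInLogs, so include that table when the alert suggests a stolen session.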
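
The pivot is an iterative expansion: each indicator you find becomes the next thing to search on. The sketch below shows that idea as a breadth-first search over already-exported events, assuming each event has been flattened into a dict with hypothetical `ip`, `user`, and `device` keys; two indicators are linked when they appear in the same event. The depth limit is a design choice, not a rule: shared infrastructure such as a VPN egress IP can link unrelated users, so anything past the first hop or two needs analyst review.

```python
from collections import deque
from typing import Dict, Iterable, Tuple

Indicator = Tuple[str, str]  # (kind, value), e.g. ("ip", "203.0.113.42")
PIVOT_KINDS = ("ip", "user", "device")  # hypothetical field names in the exported events


def pivot(events: Iterable[Dict[str, str]], seed: Indicator, max_depth: int = 2) -> Dict[Indicator, int]:
    """Breadth-first expansion from one indicator to every indicator linked to it.

    Returns each reachable indicator with its hop distance from the seed.
    """
    events = list(events)  # scanned once per indicator, so materialize it
    seen: Dict[Indicator, int] = {seed: 0}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        depth = seen[current]
        if depth >= max_depth:
            continue  # do not expand past the hop limit
        kind, value = current
        for event in events:
            if event.get(kind) != value:
                continue
            for other_kind in PIVOT_KINDS:
                other_value = event.get(other_kind)
                if other_value is not None and (other_kind, other_value) not in seen:
                    seen[(other_kind, other_value)] = depth + 1
                    queue.append((other_kind, other_value))
    return seen
```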


// FIRST 30 MIN: Inbox rules and forwarding for compromised user
let targetUser = "compromised.user@yourdomain.com";
OfficeActivity
| where TimeGenerated > ago(14d)
| where UserId =~ targetUser
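| where Operation in ("New-InboxRule", "Set-InboxRule", "Enable-InboxRule",
    "UpdateInboxRules",  // rule changes made from the Outlook desktop client
    "Set-Mailbox",       // includes ForwardingSmtpAddress changes
    "Set-TransportRule", "New-TransportRule")  // tenant-wide; only logged under this user if the account has admin rights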
| extend RuleParams = tostring(Parameters)
| project TimeGenerated, Operation, ClientIP, RuleParams
| sort by TimeGenerated asc

Mailbox rules and forwarding are the first persistence to check on a compromised account: they survive a password reset and quietly continue the attacker's access.
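
The rule query returns raw parameters, which still need reading. A minimal triage sketch, assuming the `Parameters` value is a JSON array of `{"Name": ..., "Value": ...}` objects (check this against your own records). The parameter names it inspects (`ForwardTo`, `ForwardAsAttachmentTo`, `RedirectTo`, `ForwardingSmtpAddress`, `DeleteMessage`, `MoveToFolder`) are Exchange cmdlet parameters; the list of hiding folders is a heuristic assumption, not an exhaustive list.

```python
import json
import re
from typing import List

FORWARD_PARAMS = ("ForwardAsAttachmentTo", "ForwardTo", "ForwardingSmtpAddress", "RedirectTo")
# Heuristic: folders attackers commonly use to hide replies (not exhaustive).
HIDING_FOLDERS = {"rss subscriptions", "rss feeds", "conversation history", "archive"}


def rule_red_flags(parameters_json: str, own_domain: str) -> List[str]:
    """Flag forwarding, deletion, and hiding behaviour in one rule's parameters."""
    params = {p["Name"]: str(p.get("Value", "")) for p in json.loads(parameters_json)}
    flags = []
    for name in FORWARD_PARAMS:
        if name not in params:
            continue
        target = params[name]
        domains = re.findall(r"@([A-Za-z0-9.-]+)", target)
        if not domains:
            # Display names or internal identifiers: cannot judge, so surface for review.
            flags.append(f"forwards via {name} to unresolved target: {target}")
        elif any(d.lower() != own_domain.lower() for d in domains):
            flags.append(f"forwards via {name} to {target}")
    if params.get("DeleteMessage", "").lower() == "true":
        flags.append("deletes matching messages")
    folder = params.get("MoveToFolder", "")
    if folder and folder.split("\\")[-1].strip().lower() in HIDING_FOLDERS:
        flags.append(f"moves mail to {folder}")
    return flags
```

Comparing extracted domains exactly, rather than testing whether your domain appears anywhere in the string, avoids being fooled by a lookalike such as `yourdomain.com.attacker.example`.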

Evidence handling

Evidence collected badly is evidence you cannot use, whether in a report, in a remediation decision, or in court. Two disciplines govern it: collect in order of volatility (the most ephemeral first, before it is gone), and preserve a chain of custody so the evidence is defensible later. The order of volatility is not pedantry: contain the host before you capture memory and the live network state, and you have destroyed the very evidence that proves what happened.

Order of volatility	Capture before it is gone
1. Memory & CPU state	RAM, running processes, network connections; lost at shutdown, disturbed by containment.
2. Network state	Live connections, routing, sockets; volatile and time-sensitive.
3. Disk & logs	Filesystem, event logs, artifacts; durable but can be tampered with.
4. Cloud / external logs	Sign-in, audit, mailbox; retained centrally but with their own retention limits.
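
Both disciplines are simple to encode. The sketch below sorts a hypothetical list of collection tasks by the volatility rank in the table above and opens a chain-of-custody record by hashing each artifact with SHA-256 at collection time, so any later copy can be verified against the original hash. The category names, record fields, and helper names are illustrative assumptions; real collection uses dedicated forensic tooling, and large images would be hashed in chunks rather than held in memory.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Tuple

# Rank from the order-of-volatility table: lower rank = more volatile = collect first.
VOLATILITY = {"memory": 1, "network": 2, "disk": 3, "cloud": 4}


def collection_order(tasks: List[Tuple[str, str]]) -> List[str]:
    """tasks are (artifact, category) pairs; returns artifacts most volatile first.

    sorted() is stable, so tasks in the same category keep their given order.
    """
    return [artifact for artifact, _ in sorted(tasks, key=lambda t: VOLATILITY[t[1]])]


@dataclass
class CustodyRecord:
    artifact: str
    sha256: str
    collected_by: str
    collected_at: str
    transfers: List[str] = field(default_factory=list)

    def transfer(self, from_who: str, to_who: str, when: datetime) -> None:
        """Append a hand-off entry; the record only ever grows."""
        self.transfers.append(f"{when.isoformat()} {from_who} -> {to_who}")

    def verify(self, data: bytes) -> bool:
        """True if a copy still matches the hash taken at collection."""
        return hashlib.sha256(data).hexdigest() == self.sha256


def collect(artifact: str, data: bytes, collector: str, when: datetime) -> CustodyRecord:
    """Hash at the moment of collection and open the custody record."""
    return CustodyRecord(artifact, hashlib.sha256(data).hexdigest(), collector, when.isoformat())
```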

Containment can destroy evidence. Sequence it. Isolating a host severs its live connections, and rebuilding it or powering it off wipes its memory: the very state that shows what the malware was doing. Where the incident may need defensible evidence, capture the volatile state first, then contain. Speed and evidence preservation are in tension; decide deliberately which the incident demands.
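
The sequencing decision can be written down before an incident so it is made deliberately rather than by reflex. A hypothetical sketch: the three inputs are assumptions about what the responder knows at the moment of containment, and the returned steps are illustrative labels, not commands for any particular EDR.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContainmentDecision:
    needs_defensible_evidence: bool  # litigation, regulator, or law enforcement likely
    active_harm: bool                # e.g. encryption or exfiltration in progress
    volatile_captured: bool          # memory and live network state already collected


def containment_steps(d: ContainmentDecision) -> List[str]:
    """Order the capture and containment steps for one host."""
    if d.volatile_captured or not d.needs_defensible_evidence:
        return ["contain"]
    if d.active_harm:
        # Speed wins, but the trade-off is recorded rather than made silently.
        return ["record decision: volatile evidence traded for speed", "contain"]
    return ["capture memory", "capture network state", "contain"]
```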

Containment and eradication

Containment stops the spread; eradication removes the cause. The containment decision is about timing and reversibility: short-term containment is fast and crude (isolate the host, disable the account) to stop active damage; long-term containment is the considered fix. The trap in eradication is treating the symptom: deleting the malware file while leaving the scheduled task that re-downloads it, or resetting the account's password while leaving the OAuth grant that re-establishes access. Eradicate every foothold the analysis found, then verify.

Step	The discipline
Short-term containment	Fast, reversible action to stop active harm: isolate host, revoke sessions, disable account.
Tip-off risk	Visible containment can alert the attacker, who may abandon the footholds you know about for ones you do not, or accelerate. Weigh covert against decisive action.
Eradication	Remove every foothold from the scope: persistence, accounts, grants, not just the obvious payload.
Verification	Confirm the foothold is gone and stays gone; recovery before this is recovery into a still-compromised environment.
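
Verification is a comparison between three sets: what the analysis scoped, what was removed, and what a later re-check finds. A minimal sketch with hypothetical foothold names: anything scoped but not removed, removed but back again, or present but never scoped means eradication is not finished, and the last case also means the analysis missed something.

```python
from typing import Dict, Set


def eradication_status(in_scope: Set[str], removed: Set[str], seen_again: Set[str]) -> Dict[str, Set[str]]:
    """Compare scoped footholds with what was removed and what a re-check found.

    in_scope   - every foothold the analysis identified
    removed    - footholds the team has taken out
    seen_again - footholds present at a later re-check (e.g. a re-created rule)
    """
    return {
        "not_removed": in_scope - removed,
        "returned": seen_again & removed,
        "unknown": seen_again - in_scope,  # never scoped: the analysis missed it
    }


def verified(status: Dict[str, Set[str]]) -> bool:
    """Eradication is verified only when every category is empty."""
    return not any(status.values())
```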

Response playbooks

A playbook turns the lifecycle into specific steps for a specific incident type, so the response under pressure follows a tested path instead of being improvised at 2am by whoever is on call. The value is not the document, it is the thinking done in advance: the decisions, thresholds, and contacts worked out when no one is panicking. A playbook also encodes the type-specific traps: ransomware containment is a race where you isolate first and investigate second, while account-compromise containment fails if you reset the password and stop, leaving the token and the OAuth grant alive. The common M365 incident types each have a shape worth pre-writing and exercising, because the first time you run a playbook should not be during the real thing.

Incident type	First move
Account compromise (AiTM)	Revoke sessions and reset; check token-surviving persistence (rules, apps, devices).
Business email compromise	Remove forwarding rules; check for fraudulent payments already in flight (time-critical).
Ransomware (pre-encryption)	Isolate immediately; the window before encryption is short, contain first.
Data exfiltration	Scope what left and to where; preserve evidence; assess notification obligations.
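
A playbook can be kept as data so the response can be checked against it during the incident. The sketch below paraphrases the first moves in the table as hypothetical step identifiers (labels, not API calls) and reports what is still outstanding or was run out of order, which makes the "reset the password and stop" trap visible as a list of unfinished steps.

```python
from typing import Dict, List

# Hypothetical step labels paraphrasing the first moves in the table above.
PLAYBOOKS: Dict[str, List[str]] = {
    "account_compromise": ["revoke_sessions", "reset_password", "check_inbox_rules",
                           "check_oauth_grants", "check_registered_devices"],
    "bec": ["remove_forwarding_rules", "check_payments_in_flight"],
    "ransomware": ["isolate_hosts", "investigate"],
    "exfiltration": ["scope_data_and_destination", "preserve_evidence", "assess_notification"],
}


def missing_steps(incident_type: str, done: List[str]) -> List[str]:
    """Playbook steps not yet completed, in playbook order."""
    return [step for step in PLAYBOOKS[incident_type] if step not in done]


def out_of_order(incident_type: str, done: List[str]) -> bool:
    """True if the completed playbook steps ran in a different order than written."""
    steps = PLAYBOOKS[incident_type]
    expected = [s for s in steps if s in done]
    actual = [s for s in done if s in steps]
    return actual != expected
```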

Communication and legal

The technical response is half the incident; the communication is the half that determines the organizational damage. Who is told, when, and in what words is a decision with legal weight, and notification obligations (regulators, customers, law enforcement) run on clocks that start at discovery. The failure modes are under-communicating (stakeholders blindsided) and over-communicating (speculation that becomes a liability). Decide the communication plan in preparation, not in the heat of the incident.

Audience	The consideration
Internal leadership	Facts and impact, on a cadence; decisions they need to make, when.
Legal & compliance	Notification obligations and their clocks; privilege over investigation comms.
Affected parties	Regulator, customer, and partner notifications, on their required timelines.
Law enforcement	When and whether to involve; coordinate, do not freelance.
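
Because the clocks start at discovery, the deadlines can be computed the moment the incident is declared. A minimal sketch: apart from GDPR Article 33 (notify the supervisory authority of a personal data breach, where feasible, within 72 hours of becoming aware of it), the obligations and durations below are hypothetical placeholders that legal would supply.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Only the GDPR entry reflects a real rule, and only for personal data breaches.
# The other entries are hypothetical placeholders.
OBLIGATIONS: Dict[str, timedelta] = {
    "supervisory authority (GDPR Art. 33)": timedelta(hours=72),
    "key customer contract (hypothetical)": timedelta(hours=48),
    "cyber insurer (hypothetical)": timedelta(hours=24),
}


def deadlines(discovered: datetime, now: datetime) -> List[Tuple[str, datetime, timedelta]]:
    """Each obligation with its deadline and time remaining (negative = overdue), soonest first."""
    rows = [(name, discovered + clock, discovered + clock - now) for name, clock in OBLIGATIONS.items()]
    return sorted(rows, key=lambda row: row[1])
```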

Post-incident and metrics

The incident is not over when the attacker is out; it is over when you have learned from it. The post-incident review asks what let the attack in, what slowed the response, and what to change, and it feeds the gaps back into detection and preparation. Metrics make this concrete and comparable over time; an honest, blameless review plus a few real measures is what turns one incident into a stronger response to the next.

Metric	Measures
MTTD (mean time to detect)	From attacker action to your detection. The dwell time you are trying to shrink.
MTTR (mean time to respond)	From detection to containment. How fast you act once you know.
Containment rate	Share of incidents contained before material impact.
Evidence preservation rate	Share of incidents where evidence was captured intact; the discipline measure.
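
The four measures are straightforward arithmetic over incident records. A sketch, assuming each incident records the hypothetical fields shown (first attacker action, detection, containment, material impact if any, evidence intact); "contained before material impact" is read here as containment strictly before the impact timestamp, or no impact at all.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class Incident:
    first_attacker_action: datetime
    detected: datetime
    contained: datetime
    material_impact: Optional[datetime]  # None if no material impact occurred
    evidence_intact: bool


def _hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def ir_metrics(incidents: List[Incident]) -> Dict[str, float]:
    """MTTD and MTTR in hours, plus the two rates as fractions of all incidents."""
    n = len(incidents)
    return {
        "mttd_hours": mean(_hours(i.detected - i.first_attacker_action) for i in incidents),
        "mttr_hours": mean(_hours(i.contained - i.detected) for i in incidents),
        "containment_rate": sum(
            i.material_impact is None or i.contained < i.material_impact for i in incidents
        ) / n,
        "evidence_preservation_rate": sum(i.evidence_intact for i in incidents) / n,
    }
```

Means are sensitive to a single long-dwell incident, so tracking the median alongside them is a reasonable design choice.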

The review is blameless or it is useless. A post-incident review that hunts for someone to blame teaches everyone to hide what happened, which guarantees the next incident is handled worse. Focus on the system (the gap in detection, the unclear ownership, the missing log), not the person. The output is changed controls and changed process, not a name.

Worked example: account compromise through the lifecycle

An alert flags a risky sign-in. Analyze: the sign-in query shows MFA satisfied from an unfamiliar IP; pivoting on the IP and the account finds a new inbox rule and an OAuth consent, so the scope is one account plus two persistence mechanisms. Contain: revoke sessions and reset the password (not just reset; the token survives that). Eradicate: remove the inbox rule and the OAuth grant, both of which would have re-established access after the reset. Recover and learn: confirm no further access, then ask why the AiTM sign-in was not blocked, and close that gap.

The discipline: scope before you contain (or you miss the OAuth grant), capture evidence before you tear anything down, and feed the control gap back into preparation.
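
The analysis step in the example is easiest to see as one merged timeline across log sources. A minimal sketch with hypothetical events (the times, app name, and rule description are invented for illustration; "New-InboxRule" and "Consent to application" are the operation names Exchange and Entra ID audit logs use for those actions): sorting the sources together shows both persistence mechanisms were in place before the reset, which is why a reset alone would not have ended the incident.

```python
from datetime import datetime
from typing import Dict, List, Tuple

Event = Tuple[datetime, str, str]  # (time, source, description)

# Operation names treated as persistence in this example.
PERSISTENCE_OPS = ("New-InboxRule", "Consent to application")


def build_timeline(sources: Dict[str, List[Tuple[datetime, str]]]) -> List[Event]:
    """Merge per-source event lists into one list ordered by time."""
    merged = [(when, source, what) for source, events in sources.items() for when, what in events]
    return sorted(merged, key=lambda event: event[0])


def persistence_before(timeline: List[Event], cutoff: datetime) -> List[Event]:
    """Persistence created before a containment action, which that action alone will not remove."""
    return [e for e in timeline if e[0] < cutoff and any(op in e[2] for op in PERSISTENCE_OPS)]


# Hypothetical data for the worked example.
timeline = build_timeline({
    "audit": [(datetime(2024, 5, 6, 9, 20), "Consent to application: MailSync")],
    "signin": [(datetime(2024, 5, 6, 9, 2), "MFA satisfied from 203.0.113.42")],
    "mailbox": [(datetime(2024, 5, 6, 9, 11), "New-InboxRule: move 'invoice' to RSS Subscriptions")],
})
password_reset = datetime(2024, 5, 6, 10, 0)

for when, source, what in persistence_before(timeline, password_reset):
    print(when.isoformat(), source, what)
```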

Quick lookup

Phase	Key action
Detect & analyze	Confirm, then scope: initial access, forward (next actions), outward (what else touched).
Collect evidence	Order of volatility: memory and network state before disk and cloud logs.
Contain	Short-term to stop harm; capture volatile evidence first if it is needed.
Eradicate	Every foothold in scope, persistence and grants, not just the payload; then verify.
Recover	Restore to known-good, monitored for the attacker's return.
Post-incident	Blameless review, MTTD/MTTR/containment metrics, feed gaps back to detection.

From the lifecycle to running the incident

This cheatsheet is the process in outline. Practical IR teaches the whole discipline: building the capability, analysis and forensics, containment and eradication, response playbooks, legal and communication, and the simulations that prove your team can do it.
