Severity Classification and Escalation: Severity Is a Shape

5-6 hours · Module 8

The alert you could not grade

Go back to the alert that opened the course. It is 02:14. A service account that should run unattended has signed in interactively and is reading a file server it has never touched. You have to put a severity on it, now, because severity is what decides whether it waits in a queue until morning or wakes someone up. Reach for a single number off instinct and you will be wrong in one of two expensive directions. Call it low and an active compromise sits untouched while the attacker works the file server. Call it high a few times on alerts that turn out to be benign and the next real page gets ignored, because a team that is paged for everything learns to trust nothing. The skill the course promised to teach you is not nerve and it is not caution; it is reaching the defensible grade fast, from evidence. This module is where you learn the instrument that does it.

Under-rate and the compromise runs; over-rate and the team stops trusting the queue. The grade from evidence is the narrow path between them.

The instrument is the scorecard: the centrepiece of the whole course, the one every prior module pointed forward to when it said "graded in TR8." But before the eight questions and the scoring, there is a mental model you have to hold, because it is what makes the scorecard more than a number generator. A severity is not really a number. It is a shape. The total maps to a tier and tells you roughly how bad the incident is. The distribution of the scores (which questions drove the total) tells you what kind of incident this is and what it needs first, and that second thing is often more actionable than the tier. Two incidents can score the same total and need opposite first moves. Grade by the number alone and you lose that: you know how loud the incident is but not what it is. This sub teaches the shape.

Same total, different incident

Take two incidents that both grade out to High. The first is the hybrid backdoor you triaged in TR7: a service principal with an organisation-wide mail-read grant, signing in quietly across days. Its score is driven by data exposure and standing access. The questions about what is reachable and how durable the foothold is score high, while the questions about active spread and immediate destruction score low: nothing is being destroyed and nothing is propagating; the incident is sitting on broad, quiet access. The second is an active-spread incident: a compromised credential being used right now to move laterally across hosts, with authentications firing against new machines minute by minute. Its score is driven by the live, spreading activity. The questions about whether it is happening now and how fast it is moving score high, while the data-exposure questions may score lower because you have not yet seen what was taken.

Both total High. Both deserve to wake someone. But they need opposite first moves, and the number does not tell you that; the shape does. The data-exposure incident needs you to scope and revoke the standing access before more mail is read. Its urgency is about reach and persistence: an hour's delay means more data exposed through a foothold that is not going anywhere on its own. The active-spread incident needs you to contain the propagation now, this minute, because every minute it spreads to another host; scoping the full data picture can follow once the bleeding stops. Same tier, opposite first action, and the only thing that tells you which is the distribution of the scores. That is why severity is a shape: the tier sets the urgency, but the shape sets the response.

Same total, opposite profiles. The number tells you both are serious; only the shape tells you one needs revocation first and the other needs containment first.
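The contrast above can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the dimension names, the 0-3 scores, the High threshold, and the mapping from dominant dimensions to a first move are hypothetical, not the course's actual scorecard. It only shows that two cards can agree on total and tier while their top-scoring dimensions point at different actions.

```python
# Hypothetical dimensions, 0-3 scores, and threshold, chosen for illustration.
# They are not the course's actual questions or tier boundaries.
HIGH_THRESHOLD = 7


def total(scores):
    """Sum every dimension's score into the single number."""
    return sum(scores.values())


def tier(scores):
    """Map the total to a tier (only the High boundary is modelled here)."""
    return "High" if total(scores) >= HIGH_THRESHOLD else "below High"


def shape(scores, top=2):
    """Return the dimensions that contributed most to the total."""
    return sorted(scores, key=scores.get, reverse=True)[:top]


def first_move(scores):
    """Pick a first action from the dominant dimensions (assumed mapping)."""
    drivers = shape(scores)
    if "active_spread" in drivers:
        return "contain"
    if "data_exposure" in drivers or "standing_access" in drivers:
        return "scope and revoke"
    return "review"


# Made-up scores for the two incidents described in the text.
backdoor = {"data_exposure": 3, "standing_access": 3,
            "active_spread": 0, "live_activity": 1, "destruction": 0}
lateral = {"data_exposure": 1, "standing_access": 0,
           "active_spread": 3, "live_activity": 3, "destruction": 0}

for name, card in (("backdoor", backdoor), ("lateral", lateral)):
    print(name, total(card), tier(card), shape(card), first_move(card))
```

With these made-up scores, both cards land in the same tier, and only `shape` separates them.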

Why a grade beats a gut call

The reason to run an instrument at all, rather than trusting an experienced analyst's instinct, is not that instinct is usually wrong; a good analyst's gut is often right. It is that a gut call cannot be checked, cannot be handed off, and cannot be defended. When the CISO asks at 03:00 why you woke them, "it felt bad" is not an answer; "it scored High, driven by confirmed data exposure and standing tenant access, and here is the evidence for each" is. When you hand the incident to the next shift, your instinct does not transfer, but a scored card does: the next analyst sees exactly what you saw and why you graded it as you did. And when an auditor asks three months later why an incident was or was not escalated, a scored card is a record and a feeling is not. The scorecard turns the same judgment a good analyst would make into something repeatable, defensible, and portable, which is what makes a fast decision a trustworthy one.

This is also what protects you from the two failures the opening described. The instrument makes you score evidence rather than react to the loudest fact, so you are less likely to over-rate an incident because one alarming detail caught your eye, or under-rate one because it looked quiet. The quiet standing backdoor scores High on the card even though nothing about it feels urgent in the moment. And because the card has a fixed set of questions, it stops you from grading on whichever fact happens to be salient and forces you to consider all the dimensions, including the ones that are easy to miss at 02:14. The scorecard is not bureaucracy slowing you down; it is the thing that lets you go fast without going wrong, because it replaces the unreliable question "how bad does this feel" with the answerable question "what does the evidence score."

A good analyst's instinct is often right; the trouble is it cannot be checked, handed off, or defended. The card carries the same judgment in a form that can.

The common mistake

The common mistake is grading an incident by its single loudest fact, or by the total alone, and losing the shape that tells you what to do. In the loud-fact version, one alarming detail (a foreign sign-in, a known-bad IP) dominates the analyst's attention and drives the grade, while the dimensions that are quiet but decisive (standing access, confirmed data exposure) go unscored. A serious incident gets under-rated because its worst quality was not the noisy one, or a minor one gets over-rated because its one alarming detail was.

The total-alone version is subtler: the analyst runs the card correctly, gets a tier, and acts on the tier without reading the distribution, treating two High incidents as if they called for the same response when one needs revocation first and the other containment first.

Both come from treating severity as a number rather than a shape.

The fix is to grade every dimension against evidence rather than react to the salient one, and then to read the distribution before deciding the first move: which questions drove this total, and what does that profile need first? A high score from data exposure and a high score from active spread are the same tier and different incidents. The analyst who acts on the number without the shape gets the urgency right and the action wrong: rushing to contain a quiet backdoor that needed its grant scoped and revoked, or scoping data on a propagating incident while it takes more hosts.
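One way to picture "grade every dimension against evidence" is a grading function that refuses to produce a total until every dimension has been considered. The sketch below is only that, a sketch: the dimension names, the score values, and the rule that a non-zero score needs an evidence note are assumptions for illustration, not the course's scorecard.

```python
# Hypothetical dimension names; the course's real scorecard questions are
# introduced later and may differ.
DIMENSIONS = ("data_exposure", "standing_access", "active_spread",
              "live_activity", "destruction")


def grade(card):
    """Return (total, drivers) for a card mapping dimension -> (score, evidence).

    Raises ValueError if any dimension is unscored, or scored above zero
    without a supporting evidence note.
    """
    missing = [d for d in DIMENSIONS if d not in card]
    if missing:
        raise ValueError(f"unscored dimensions: {missing}")
    unsupported = [d for d in DIMENSIONS
                   if card[d][0] > 0 and not card[d][1].strip()]
    if unsupported:
        raise ValueError(f"scores without evidence: {unsupported}")
    total = sum(card[d][0] for d in DIMENSIONS)
    top = max(card[d][0] for d in DIMENSIONS)
    drivers = [d for d in DIMENSIONS if card[d][0] == top]
    return total, drivers


# Loud-fact grading: only the alarming detail was scored.
loud_fact_only = {"live_activity": (2, "sign-in from a known-bad IP")}

# Full grading of the same incident, every dimension considered.
full_card = {
    "data_exposure": (3, "organisation-wide mail-read grant"),
    "standing_access": (3, "sign-ins on several separate days"),
    "active_spread": (0, ""),
    "live_activity": (2, "sign-in from a known-bad IP"),
    "destruction": (0, ""),
}

try:
    grade(loud_fact_only)
except ValueError as err:
    print("rejected:", err)

print(grade(full_card))
```

The loud-fact card is rejected rather than graded, and the full card returns both the total and the dimensions that drove it, so the shape travels with the number.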

Seeing it in the evidence

The two shapes, in the evidence: the standing backdoor that scores on reach and persistence against the active spread that scores on live propagation.

Where to find it

The shape of an incident is read from the same evidence the scorecard scores. For the standing backdoor, the service-principal sign-in history (AADServicePrincipalSignInLogs) shows the persistence (sign-ins across days) and the grant shows the reach. Both are high, and there is no spreading authentication to show. For an active-spread incident, the endpoint and identity logon evidence (DeviceLogonEvents, SigninLogs) shows authentications hitting new hosts in quick succession, so live propagation scores high while the data-exposure picture is not yet established. You are not scoring yet (that is 8.3); here you are reading which dimensions each incident loads onto, because that distribution is the shape.

SIEM Console

Read which dimensions an incident loads onto before you reduce it to a tier.
For the backdoor, query the sign-in history across days and the permission grant: the weight is on persistence and reach. For a spreading compromise, query the authentications across hosts over time: the weight is on live propagation. Two incidents can total the same and load onto opposite dimensions, and that distribution, not the total, is what tells you whether the first move is revoke or contain.

azure-aad-signin-sp sp_name="NE-Mail-Archiver"
| timechart span=1d count

AADServicePrincipalSignInLogs
| where ServicePrincipalName == "NE-Mail-Archiver"
| summarize signins = count(), days = dcount(bin(TimeGenerated, 1d)) by ServicePrincipalName

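# The shape of the backdoor: persistence (days) and reach (the grant), not spread
# Assumption to verify in your tenant: service-principal sign-ins may only be
# returned by the beta cmdlet (Get-MgBetaAuditLogSignIn) with a signInEventTypes filter.
Get-MgAuditLogSignIn -Filter "servicePrincipalName eq 'NE-Mail-Archiver'" |
  Group-Object { $_.createdDateTime.Date } | Select-Object Name, Count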

Read the output

Predict before running. The panel returns the backdoor's sign-ins binned by day, and they fall on several different days, with the grant behind them reaching mail organisation-wide. Read this as a shape, not yet a score.

The incident loads heavily onto two dimensions: standing access (it persists across days under its own power) and data exposure (the grant reaches the whole mail estate). It loads almost nothing onto active spread, since there is no burst of authentications against new hosts, and almost nothing onto destruction, since nothing is being deleted. That profile, heavy on reach and persistence and light on propagation, is the shape of a data-exposure incident, and its first move is to scope and revoke the standing access before more is read.

Now picture the contrast. An active lateral-movement incident would show the opposite panel: a cluster of authentications hitting new hosts over minutes, loading onto active spread and live activity while data exposure is still unknown. Its first move would be to contain the propagation immediately. Same potential tier, opposite shapes, opposite first moves. Run the panel and read the distribution: this is reach-and-persistence, which tells you what the incident needs before any number does.
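The two readings contrasted above (distinct days for persistence, hosts reached in a short window for spread) can be sketched over plain event lists. This is a minimal illustration under stated assumptions: the events are made-up sample data, the ten-minute window is arbitrary, and the function names are hypothetical rather than part of any SIEM.

```python
from datetime import datetime, timedelta


def days_active(timestamps):
    """Persistence: count distinct calendar days with at least one sign-in."""
    return len({ts.date() for ts in timestamps})


def max_hosts_in_window(events, window=timedelta(minutes=10)):
    """Spread: most distinct hosts reached inside any single time window.

    events is a list of (timestamp, host) pairs. This counts distinct hosts,
    which only approximates "new" hosts; a real check would compare against
    the account's history.
    """
    ordered = sorted(events)
    best = 0
    for i, (start, _) in enumerate(ordered):
        hosts = {host for ts, host in ordered[i:] if ts - start <= window}
        best = max(best, len(hosts))
    return best


# Made-up sample data: one sign-in per day for five days against one resource.
backdoor = [(datetime(2024, 1, day, 2, 14), "mail-api") for day in range(1, 6)]

# Made-up sample data: five different hosts reached two minutes apart.
start = datetime(2024, 1, 8, 2, 14)
lateral = [(start + timedelta(minutes=2 * i), f"host-{i}") for i in range(5)]

for name, events in (("backdoor", backdoor), ("lateral", lateral)):
    stamps = [ts for ts, _ in events]
    print(name, days_active(stamps), max_hosts_in_window(events))
```

On this sample data the backdoor is high on days and flat on hosts, and the lateral-movement incident is the reverse, which is the same opposition the two panels would show.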

Your turn

Two incidents land in your queue and both grade out to High. Incident A: a dormant-looking service account with a broad cloud data grant, signing in occasionally over the past week from an unusual location. Incident B: a user credential authenticating against six new internal hosts in the last ten minutes. With the same tier, why is reading the shape the deciding step, and what different first move does each need?

Reveal

Reading the shape is the deciding step because the tier tells you both incidents are serious enough to act on now, but it does not tell you what acting means, and the two need opposite first moves that the number alone would hide. The skill is to look past the total to the distribution of the scores (which questions drove each incident to High), because that distribution is the incident's character.

Incident A loads onto data exposure and standing access: a broad data grant is reach, and an account signing in occasionally over a week is persistence, a foothold that is not going anywhere on its own but can read more every time it connects. It loads almost nothing onto active spread, because it is not propagating. The unusual location and dormant-account behaviour are the kind of quiet that an instinct grade would under-rate, which is exactly why the card, by scoring data exposure and standing access, pulls it up to High even though it feels calm. Its first move follows from that shape: scope what the grant can reach and revoke the standing access, because the harm is continued and growing data exposure through a durable foothold, and an hour's delay means more data read, not lateral spread.

Incident B loads onto active spread, live activity, and containment urgency: a credential hitting six new hosts in ten minutes is propagation happening right now, and every additional minute is another host. It may load lightly on data exposure because you have not yet seen what, if anything, was taken, but that low score does not lower the urgency; the urgency is the spread. Its first move is the opposite of A's: contain the propagation immediately, disabling or isolating to stop the lateral movement this minute, and let the full data-scoping follow once the spread is halted.

So with identical tiers, A needs revoke-and-scope and B needs contain-now, and the only thing that distinguishes them is the shape. An analyst who acted on the number alone would treat them the same and get one of them wrong: scoping data on the spreading incident while it takes three more hosts, or rushing to contain the quiet backdoor without first scoping the grant it has been using all week. The lesson is the sub's core: the tier sets the urgency, the shape sets the response, and severity is a shape, not a number.

Where this leads: you now have the mental model. Severity is a shape, and the distribution tells you what the incident needs. The next sub builds the instrument that produces it: the eight questions of the scorecard, the fixed set you ask of every incident so the shape and the tier both come from evidence rather than instinct.
