Signal vs Noise: False Positives, Alert Fatigue, and Tuning Detections

Module 0

Here is the problem that makes detection hard, stated as plainly as it can be. On any given day Northgate's accounts fail to sign in thousands of times, for reasons that have nothing to do with an attacker: a mistyped password, an expired session, a phone left at home, a VPN that dropped mid-authentication. The password spray from the previous sub also produces a flood of failed sign-ins.

The failed sign-ins from the attack and the failed sign-ins from ordinary life look almost identical at the level of a single event. If you write an alert that fires on failed sign-ins, it will fire all day, almost always on nothing.

That is the shape of every detection challenge you will face. The footprints an attacker leaves are, individually, the same footprints that thousands of harmless users and systems leave every hour. Detection is the work of separating the rare malicious activity from the overwhelming benign activity that resembles it, and doing so reliably enough that a human being is willing to act on the result.

This sub is about what that separation actually involves, the two distinct ways it can fail, and why getting it wrong quietly destroys a security team's effectiveness.

The estate is mostly noise

Start with a sense of scale, because it governs everything that follows. A firm the size of Northgate generates millions of events a day across identity, endpoints, network, web, and DNS. The number of those events that are part of a genuine attack, on a day when an attack is even happening, might be a few dozen. The ratio between the malicious and the benign is not close.

It is something like a handful of needles in a field of millions of pieces of straw, and on most days the field contains no needles at all.

This ratio is called the base rate, and it is the single most important number to keep in mind when you reason about detection. Because real attacks are so rare relative to normal activity, even a very accurate-sounding rule produces mostly false alarms. Suppose a detection correctly flags malicious sign-ins and is wrong only one time in a thousand on benign ones.

That sounds excellent until you remember there are millions of benign sign-ins and only a few malicious ones. One error in a thousand, applied to millions, is thousands of false alarms standing next to a few real ones. The rarity of the thing you want is working against you, and no amount of cleverness makes that arithmetic disappear. It can only be managed.
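To make that arithmetic concrete, here is a minimal sketch in Python. The daily volumes are illustrative assumptions, not Northgate measurements; only the one-in-a-thousand error rate comes from the text above. The sketch also assumes the rule flags every malicious sign-in, which is generous.

```python
# Illustrative numbers only: assumed volumes, not measurements from Northgate.
benign_sign_ins = 2_000_000      # assumed benign sign-ins per day
malicious_sign_ins = 30          # assumed malicious sign-ins on an attack day
false_positive_rate = 1 / 1000   # wrong one time in a thousand on benign events
true_positive_rate = 1.0         # assumption: every malicious sign-in is flagged

false_alarms = benign_sign_ins * false_positive_rate
true_alerts = malicious_sign_ins * true_positive_rate

# Precision: the share of alerts that point at something real.
precision = true_alerts / (true_alerts + false_alarms)

print(f"false alarms per day: {false_alarms:.0f}")
print(f"real alerts per day:  {true_alerts:.0f}")
print(f"share of alerts that are real: {precision:.1%}")
```

Under these assumptions roughly 2,000 false alarms stand next to 30 real alerts, so only about 1.5% of what the rule raises is worth chasing.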

So the goal of a detection is never to look at one event and judge it. A single failed sign-in carries almost no information, because it is consistent with both an attack and a Tuesday. The goal is to find the pattern or the combination that is rare among the benign and common among the malicious.

The spray's signature, many accounts hit and almost all attempts failing, is exactly such a pattern: ordinary users do not generate it, attackers reliably do. Good detection is the search for those distinguishing shapes, and much of the work ahead is training your eye to recognize where they hide.
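The spray's shape can be sketched as a check on two properties together: how many distinct accounts a source touched, and what fraction of its attempts failed. The following is a minimal Python sketch under stated assumptions: the event tuple layout, the grouping by source address, the function name, and both thresholds are hypothetical choices for illustration, and a real spray may be spread across many addresses.

```python
from collections import defaultdict

def find_spray_sources(events, min_accounts=20, min_failure_ratio=0.9):
    """Return sources that hit many accounts with almost all attempts failing.

    events: iterable of (source_ip, user, outcome), outcome "success" or "failure".
    The thresholds are hypothetical starting points, not tuned values.
    """
    users = defaultdict(set)
    attempts = defaultdict(int)
    failures = defaultdict(int)
    for source_ip, user, outcome in events:
        users[source_ip].add(user)
        attempts[source_ip] += 1
        if outcome == "failure":
            failures[source_ip] += 1

    flagged = []
    for source_ip in attempts:
        ratio = failures[source_ip] / attempts[source_ip]
        if len(users[source_ip]) >= min_accounts and ratio >= min_failure_ratio:
            flagged.append(source_ip)
    return sorted(flagged)

# Synthetic data: one source tries 30 accounts and succeeds once;
# one ordinary user mistypes a password three times and then gets in.
spray = [("203.0.113.7", f"user{i}", "failure") for i in range(29)]
spray.append(("203.0.113.7", "user29", "success"))
ordinary = [("198.51.100.4", "j.doe", "failure")] * 3
ordinary.append(("198.51.100.4", "j.doe", "success"))

print(find_spray_sources(spray + ordinary))
```

The ordinary user produces more failures than successes too, but touches only one account, so the pattern does not match. Neither property alone separates the two cases; the combination does.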

There is a second move that beats the base rate, and it ties straight back to the previous sub. A single weak signal, one failed sign-in, sits in a sea of identical benign ones. The conjunction of weak signals is far rarer.

Failed sign-ins followed by a success, followed within minutes by an unfamiliar process under that account, followed by a connection to an address with no history, is a combination that ordinary activity almost never produces, even though every piece of it is common on its own.

Correlating across sources reconstructs the attack story and, at the same time, shifts the base rate in your favor, because the joint pattern is much rarer in the benign world than any of its parts. This is why the strongest detections often require several things to be true at once rather than leaning on one suspicious event.
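One way to picture the conjunction is as an ordered chain that a single account's events must complete inside a time window. The sketch below is a simplified Python illustration, not a production correlation engine: the event type names, the 30-minute window, and the function name are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# The chain described above, using hypothetical event type names.
CHAIN = ["signin_failure", "signin_success", "unfamiliar_process", "new_destination"]

def matches_chain(events, window=timedelta(minutes=30)):
    """True if one account's events contain the whole chain, in order, within the window.

    events: list of (timestamp, event_type) for a single account.
    The 30-minute window is an assumed value.
    """
    events = sorted(events)
    for i, (start, event_type) in enumerate(events):
        if event_type != CHAIN[0]:
            continue
        step = 1  # next link of the chain we are waiting for
        for timestamp, later_type in events[i + 1:]:
            if timestamp - start > window:
                break
            if later_type == CHAIN[step]:
                step += 1
                if step == len(CHAIN):
                    return True
    return False

t0 = datetime(2024, 1, 9, 9, 0)
attack = [
    (t0, "signin_failure"),
    (t0 + timedelta(minutes=1), "signin_failure"),
    (t0 + timedelta(minutes=2), "signin_success"),
    (t0 + timedelta(minutes=6), "unfamiliar_process"),
    (t0 + timedelta(minutes=9), "new_destination"),
]
benign = [(t0, "signin_failure"), (t0 + timedelta(minutes=1), "signin_success")]

print(matches_chain(attack), matches_chain(benign))
```

The benign account shares the first two links with the attack and still does not fire, which is the point: each link is common, the full chain is not.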

A detection draws a line

Every detection, however it is built, comes down to drawing a line through that field and saying everything on one side is worth a human's attention and everything on the other is not. The moment you draw a line, you accept that you can be wrong in two completely different ways, and understanding the difference between them is foundational.

The first way is a false positive: the detection fires on something benign. The traveling salesperson signing in from an unusual country, the engineer running an unfamiliar admin tool for a legitimate reason, the batch job that authenticates a thousand times a minute by design. Each of these can trip a detection that was reaching for an attacker, and each wastes an analyst's time when it does.

The second way is a false negative: the detection stays silent on something malicious. The attack slipped under the line, either because it did not match the pattern or because the line was moved to avoid false positives and ended up too strict to catch it. A false positive is a cost you can see, because someone has to chase it.

A false negative is a cost you usually cannot see, because by definition nothing told you it happened.

This asymmetry of visibility quietly distorts how teams behave. False positives are loud and immediate, so they generate constant pressure to make detections quieter. False negatives are silent, so nothing pushes back when tuning goes too far. A team that optimizes only against the errors it can see will drift, over months, toward detections that are pleasant to operate and blind to real attacks, with no one ever deciding that on purpose.

Part of the discipline is holding the silent error in mind and weighing it deliberately, precisely because it will never raise its hand the way a false positive does.

The line is a choice, and the two errors trade against each other. A benign event that fires is a false positive an analyst must chase; a malicious event that stays silent is a false negative no one is told about. Both are real costs, but only one of them announces itself.

The reason this matters so much is that the two errors trade against each other. You cannot simply reduce both at once by trying harder. Move the line to catch more attacks and you sweep in more benign events with them, so false positives rise. Move the line to cut the false positives and you set a higher bar that some real attacks will not clear, so false negatives rise.

Every detection lives somewhere on that trade, and deciding where is a judgment about which error is more affordable for this particular threat, on this particular source, for this particular team.

Example search (illustration only, not run here)

index=azuread sourcetype=azure:monitor:aad action=failure
| stats count AS failures BY user
| where failures > 50
| sort - failures

A detection is a line with a number in it. Set the threshold too low and ordinary accounts flood the queue; too high and a real attacker slips under. Finding where that line belongs is most of the work.

Example output (illustration only, 2 results)

user          failures
svc-monitor   144
p.sharma      68
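The trade between the two errors can be sketched by sweeping the threshold over a small labeled sample. This Python example uses synthetic accounts and made-up labels purely for illustration; in practice ground truth this clean is rarely available, and the account names and counts here are hypothetical.

```python
# Synthetic per-account failure counts with assumed ground-truth labels.
accounts = [
    ("svc-batch", 140, "benign"),
    ("user-a", 70, "malicious"),
    ("user-b", 55, "benign"),
    ("user-c", 35, "malicious"),
    ("user-d", 12, "benign"),
    ("user-e", 4, "benign"),
]

def count_errors(threshold):
    """Return (false_positives, false_negatives) for the rule 'failures > threshold'."""
    false_positives = sum(
        1 for _, n, label in accounts if n > threshold and label == "benign"
    )
    false_negatives = sum(
        1 for _, n, label in accounts if n <= threshold and label == "malicious"
    )
    return false_positives, false_negatives

for threshold in (10, 30, 50, 100):
    fp, fn = count_errors(threshold)
    print(f"threshold {threshold:>3}: false positives={fp}, false negatives={fn}")
```

As the threshold rises in this sample, false positives fall from 3 to 1 while false negatives rise from 0 to 2. No setting removes both.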

A noisy detection is worse than silence

It is tempting to resolve the trade by always catching more, on the reasoning that a missed attack is the worst outcome, so you should err toward firing. In practice this instinct, followed without discipline, produces the most common failure in real security operations, and it is worth understanding why before you build anything.

Anti-Pattern

Firing on everything to be safe.

Under pressure the instinct is to make a detection trip on anything that might be the attack, on the logic that a miss is the worst outcome. Followed without discipline, this is how a SOC accumulates rules that fire dozens of times a day on nothing. The team learns to ignore them, and a detection that looks like coverage on a slide catches nothing in the room. What protects the estate is the alert a human still acts on, not the one a rule could in theory raise.

A detection that fires constantly on benign activity trains the people who receive it to ignore it. This is alert fatigue, and it is not a moral failing of lazy analysts. It is the predictable response of a finite human team to a stream of alarms that are almost always nothing.

When Tom and Priya on the Northgate SOC desk see the same noisy detection fire forty times a day and find nothing forty times, the forty-first fire gets a slower look, and the hundredth gets waved through. The detection is technically still working.

The humans have correctly learned that it is not worth their attention, and the one time in a thousand it is right, it is buried in the noise it generated on the other nine hundred and ninety-nine.

Coverage on paper is not coverage in practice. A detection only works if a human still acts on it, and a detection that cries wolf spends down the one resource it depends on, which is the analyst's willingness to look.

This is the hard lesson under the whole discipline. The value of a detection is not measured by how many attacks it could theoretically catch. It is measured by how many it catches that a human then acts on, and a detection nobody trusts catches nothing, however good its logic looks. A smaller number of detections that fire rarely and mean something will protect Northgate better than a wall of noisy rules that the SOC has quietly stopped reading. Restraint is a security control.

Tuning is the craft

Tuning is the work of moving a detection toward firing when it should and staying quiet when it should not, and it is where most of a detection engineer's real time goes. It is rarely about a cleverer idea. It is about understanding the benign activity well enough to carve it away from the malicious without carving away the attack at the same time.

In practice that means learning what normal looks like for the thing you are watching. The batch job that authenticates a thousand times a minute is not an attacker; once you know it exists, you can exclude it by name and your detection gets quieter without getting blinder. The engineer who legitimately runs admin tools can be accounted for.

The traveling executive who signs in from abroad can be distinguished from a foreign sign-in to an account that has never left Birmingham. Each of these is a piece of knowledge about the environment, turned into a refinement of the line.

Tuning is the steady accumulation of that knowledge, which is why a detection written by someone who knows the estate beats a generic rule pulled off a shelf, and why the modules ahead spend so much effort on knowing Northgate's normal before hunting its abnormal.
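Knowledge about the environment can be recorded as data that the detection consults. The Python sketch below illustrates two of the refinements described above; the account names, the country sets, the threshold, and the function names are all hypothetical, and a real estate would hold this knowledge in lookups maintained alongside the detection.

```python
# Hypothetical knowledge about the estate, written down as data.
KNOWN_BATCH_ACCOUNTS = {"svc-batch"}   # authenticates at high volume by design
USUAL_COUNTRIES = {
    "exec-1": {"GB", "US", "DE"},      # travels regularly
    "clerk-1": {"GB"},                 # has never signed in from abroad
}

def should_alert_on_volume(user, failures, threshold=50):
    """Volume rule with a named exclusion for accounts that are noisy by design."""
    if user in KNOWN_BATCH_ACCOUNTS:
        return False
    return failures > threshold

def is_unusual_country(user, country):
    """Flag a sign-in only when this account has no history in that country."""
    return country not in USUAL_COUNTRIES.get(user, set())

print(should_alert_on_volume("svc-batch", 1000))   # excluded by name
print(should_alert_on_volume("user-a", 68))        # ordinary account over threshold
print(is_unusual_country("exec-1", "US"))          # familiar for this account
print(is_unusual_country("clerk-1", "US"))         # new for this account
```

The same foreign sign-in is treated differently depending on the account's history, which is the refinement the paragraph above describes. Lists like these need review as the environment changes.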

Tuning is also rarely finished. The environment keeps changing: a new application starts authenticating in a way that resembles an attack, a team adopts a tool that trips an old rule, the business opens an office in a country that used to be a red flag.

A detection that was well tuned a year ago drifts out of true as the normal around it moves, which is why detections are maintained rather than written once and filed away. Treating a detection as a living thing that takes feedback from every alert it raises is much of what separates a real detection practice from a folder of static rules.

There is judgment in how far to tune, and it comes back to the trade. For a low-severity, high-volume signal you will tune hard toward quiet, accepting that you miss some real instances because chasing every one would bury the team.

For a high-severity signal, such as ransomware staging or the deletion of backups, you will tolerate more false positives, because the cost of missing it once is catastrophic and a few wasted investigations are a price worth paying. The right setting is not a universal number. It is a decision tied to what the detection is for and what it would cost to be wrong in each direction.
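That decision can be framed as a rough cost comparison. The Python sketch below is only an illustration: the error counts per threshold and the relative costs are invented, the units are arbitrary, and real costs are seldom this easy to put a number on.

```python
# Candidate thresholds with assumed (false_positives, false_negatives) per week.
candidates = {10: (40, 0), 50: (8, 1), 100: (1, 3)}

def cheapest_threshold(cost_per_false_positive, cost_per_false_negative):
    """Pick the threshold with the lowest total cost under the given weights."""
    def total_cost(threshold):
        fp, fn = candidates[threshold]
        return fp * cost_per_false_positive + fn * cost_per_false_negative
    return min(candidates, key=total_cost)

# Low severity: a miss costs little more than a wasted investigation.
low_severity = cheapest_threshold(1, 2)
# High severity: a single miss is treated as catastrophic.
high_severity = cheapest_threshold(1, 500)

print(f"low-severity signal:  threshold {low_severity}")
print(f"high-severity signal: threshold {high_severity}")
```

With these invented weights the low-severity signal settles on the quiet threshold of 100 and the high-severity signal on the noisy threshold of 10, from the same candidates and the same error counts.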

What a detection you can trust looks like

Pull the threads together into a working definition you can carry forward. A good detection keys on a pattern that is rare in normal activity and characteristic of the threat, so it starts with a favorable base rate rather than fighting one. It is tuned with real knowledge of the environment, so the benign exceptions are accounted for instead of flooding the queue.

It fires at a rate the team can actually absorb, so that when it does fire, a human takes it seriously. And it carries enough context that the analyst can begin the investigation from the alert rather than starting from scratch, which is where the cross-source trail of the previous sub comes back in.

None of this is about writing a more complex query. It is about a way of thinking that treats the analyst's attention as the scarce resource it is, and treats every detection as a claim that something is worth spending that attention on. Hold that standard as you move into the technical modules. When you build your first real detections later in the course, the question will not be whether they can fire. It will be whether they earn the look they ask for.

That way of thinking only works if you genuinely know the environment you are defending, what is normal in it, where its evidence lives, and what each source can and cannot tell you. SPL0.6 turns to exactly that: the Northgate estate as a set of sources, and the kind of evidence each one holds before you ever run a query against it.
