
Detection-as-Code: Why Console-Managed Detection Fails

Module 0

Scenario

A credential-access detection stops firing. Nobody notices for three weeks. When someone finally investigates, they discover the rule's threshold was changed from 5 to 50 during a tuning session. The analyst who made the change left the company two weeks ago. There is no record of why the threshold was raised, no way to see what the rule looked like before, and no test that would have caught the broken detection.

That is not a hypothetical. It is the normal operating state of console-managed detection. The failure is not dramatic; it is quiet, gradual, and invisible until something real is missed.

The five failure modes

Console-managed detection breaks in predictable ways. Each failure mode is a gap in the system around the rule, and each gap is closed by a specific pipeline stage. The whole argument of this course fits in one mapping: no history is answered by version control, no test by fixture-based testing, no review by pull requests, no rollback by git revert, and coverage drift by a coverage map generated from rule metadata.

The pipeline is not added complexity. Each stage exists because it closes a specific gap that makes console management fragile. The rest of this section walks through the failure modes one at a time.

No history

A detection rule in a SIEM console has a "last modified" timestamp. Some platforms record the user who last saved it. None of them track the full change history: what the rule looked like three versions ago, what changed between version 2 and version 3, or the reasoning behind each change.

When a rule breaks, the investigation starts with "does anyone remember what this rule used to look like?" That question has no reliable answer in a console-managed environment. The prior state is gone. The analyst who made the change may have left the team. The best outcome is rebuilding the rule from memory; the worst is deploying a new rule that introduces a different set of false positives because the institutional knowledge of the original tuning is lost.

The cost compounds over time. A rule that has been tuned four times over two years has a history that lives entirely in the heads of the people who tuned it. If the team turns over, the history is gone. A new analyst inherits 180 rules with no context for why each one looks the way it does, and every question about the rule's rationale hits a dead end. This is not a failure of documentation culture; it is a failure of tooling. You cannot document what you do not have a mechanism to record.

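In a Git repository, every version of the rule is a commit, with an author, a timestamp, and a message explaining the change. The same broken-threshold incident from the scenario becomes a two-minute investigation instead of a three-week mystery: git log on the rule file shows the commit that raised the threshold from 5 to 50, who made it, and the reason recorded in its message, and git diff shows exactly what the rule looked like before.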

When the new analyst inherits 180 rules, they inherit 180 complete histories. The "what did this look like before" question is a command, not a mystery.
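Git produces this comparison natively with git diff. As a minimal, self-contained illustration of what that investigation surfaces, the sketch below uses Python's standard difflib to diff two versions of a rule file. The file name, the rule contents, and the simplified threshold field are hypothetical stand-ins (real Sigma rules express thresholds differently); in a repository the two strings would come from the history rather than being inlined.

```python
import difflib

# Hypothetical contents of one rule file at two commits. In practice each
# string would come from the repository history (for example, the output of
# `git show <commit>:<path>`); they are inlined so the sketch runs on its own.
# The "threshold" field is illustrative, not valid Sigma syntax.
BEFORE = """title: Repeated failed logons from one source
threshold: 5
level: high
"""

AFTER = """title: Repeated failed logons from one source
threshold: 50
level: high
"""


def rule_diff(before: str, after: str, path: str = "credential_access.yml") -> list[str]:
    """Return a unified diff between two versions of a rule file."""
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
            lineterm="",
        )
    )


if __name__ == "__main__":
    for line in rule_diff(BEFORE, AFTER):
        print(line)
```

Running it prints a unified diff in which the only changed line is the threshold, which is exactly the fact the scenario's three-week investigation could not recover.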

No test

Console-managed detection is "tested" by one of two methods: the author runs the rule once in the console and confirms it produces results, or the author deploys the rule and waits to see if it fires on real traffic. Neither is a test in any engineering sense.

The first method proves the rule matches something in the current data. It does not prove the rule matches the right thing, or that it will not match the wrong thing. A rule for encoded PowerShell execution that fires whenever CommandLine contains "-enc" will return results when you run it in the console, because some process in the last 24 hours probably used an encoded command. But it will also fire on a legitimate configuration-management script that runs every morning with an encoded block, and the author will not know that until the SOC triages the alert.
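The console check described above can be reduced to a few lines. This sketch assumes a deliberately naive matcher (a substring test on a lowercased CommandLine field, standing in for the console query, not for any real Sigma backend) and two hypothetical process-creation events: an attacker's encoded command and the morning configuration-management script. Both match, which is why "it returned results" says nothing about false positives.

```python
# Hypothetical process-creation events. Field names follow the Sysmon
# convention (Image, CommandLine, ParentImage); all values are invented.
ATTACKER_EVENT = {
    "Image": "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe",
    "CommandLine": "powershell.exe -NoProfile -enc SQBFAFgA",
    "ParentImage": "C:\\Windows\\System32\\cmd.exe",
}

CONFIG_MGMT_EVENT = {
    "Image": "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe",
    "CommandLine": "powershell.exe -ExecutionPolicy Bypass -enc JABjAGYAZwA=",
    "ParentImage": "C:\\Program Files\\ConfigAgent\\configagent.exe",
}


def naive_encoded_powershell(event: dict) -> bool:
    """Fire when the command line contains "-enc" (case-insensitive)."""
    return "-enc" in event.get("CommandLine", "").lower()


if __name__ == "__main__":
    for name, event in [("attacker", ATTACKER_EVENT), ("config-mgmt", CONFIG_MGMT_EVENT)]:
        print(name, naive_encoded_powershell(event))
```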

The second method uses production as the test environment, which means the SOC is the test harness: if the rule fires on a false positive, an analyst wastes time triaging it; if the rule fails to fire on a true positive, nobody knows until an incident is missed. This is not testing. This is hoping, with the SOC absorbing the cost of every mistake.

The deeper problem is that console-based "testing" is not repeatable. Even if the author runs the rule once and confirms it matches the right event, that confirmation exists only in their memory. Nobody else can verify it. Nobody can rerun the test after a tuning change. The "test" is a one-time check that evaporates the moment it completes.

Anti-Pattern

"It fired in the console, so it works"

Confirming that a rule produces output is not testing. A rule that matches every PowerShell execution will "fire" when you run it, but it will also produce hundreds of false positives per day and drown the signal it was meant to detect. A real test asserts two things: the rule fires on a specific true-positive event, and it stays silent on a specific benign event. Without both assertions, you have not tested the rule; you have confirmed it runs.

Fixture-based testing replaces both methods. A committed true-positive event (the attack the rule should catch) and a committed benign event (the legitimate activity the rule should ignore) travel with the rule in the repository. The test runs automatically and asserts both outcomes. A rule that fires on benign activity fails the test before it reaches production. A rule that fails to fire on the true-positive event is caught the same way. And because the fixtures are files in the repo, the test can be rerun after every tuning change, by any team member, at any time.
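A minimal sketch of a fixture test, under stated assumptions: the rule is represented by a hypothetical Python predicate rather than a converted Sigma query, the fixtures are inlined dictionaries rather than committed files, and the exclusion for a configuration-management agent (configagent.exe) is invented for illustration. The harness asserts both outcomes the anti-pattern box calls for: fire on the true positive, stay silent on the benign event.

```python
# Fixtures: in a repository these would be committed files that travel with
# the rule; here they are inlined. All values are hypothetical.
TRUE_POSITIVE = {
    "CommandLine": "powershell.exe -NoProfile -enc SQBFAFgA",
    "ParentImage": "C:\\Windows\\System32\\cmd.exe",
}
BENIGN = {
    "CommandLine": "powershell.exe -ExecutionPolicy Bypass -enc JABjAGYAZwA=",
    "ParentImage": "C:\\Program Files\\ConfigAgent\\configagent.exe",
}


def untuned_rule(event: dict) -> bool:
    """Selection only: any command line containing "-enc"."""
    return "-enc" in event.get("CommandLine", "").lower()


def tuned_rule(event: dict) -> bool:
    """Selection minus a filter for the (hypothetical) config-management agent."""
    selection = untuned_rule(event)
    filter_config_agent = event.get("ParentImage", "").lower().endswith("\\configagent.exe")
    return selection and not filter_config_agent


def run_fixture_test(rule, true_positive: dict, benign: dict) -> list[str]:
    """Return the failed assertions; an empty list means the test passed."""
    failures = []
    if not rule(true_positive):
        failures.append("did not fire on the true-positive fixture")
    if rule(benign):
        failures.append("fired on the benign fixture")
    return failures


if __name__ == "__main__":
    for rule in (untuned_rule, tuned_rule):
        result = run_fixture_test(rule, TRUE_POSITIVE, BENIGN)
        print(rule.__name__, "PASS" if not result else result)
```

The untuned rule fails on the benign fixture and the tuned rule passes both assertions. A production harness would typically run under a test runner such as pytest; the sketch only shows the shape of the two-sided assertion.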

No review

In most SIEM consoles, deploying a detection rule is a single action: Save, Enable, Create. The rule goes live the moment the author clicks the button. There is no review step. No second pair of eyes reads the detection logic, checks the scoping condition, evaluates the false-positive risk, or confirms the ATT&CK mapping before the rule enters production.

Console-managed detection has one step between writing and production. Pipeline-managed detection has three gates, each catching a different class of error.

This matters because detection rules have the same error categories as application code: logic errors (the scoping condition is too broad), configuration errors (the entity mapping is wrong), and judgment errors (the threshold is set to a value that produces noise). Code review catches these. A solo save-and-enable workflow does not.

Consider a concrete example. An analyst writes a rule to detect lateral movement via PsExec. The scoping condition checks for a new service installation with a service name matching the PsExec pattern. The rule works in the console. But the scoping condition does not exclude the legitimate remote-administration tool the IT team uses, which creates services with a similar naming pattern. A reviewer who knows the environment catches this in the PR and adds a suppression for the IT tool's service name. Without review, the rule goes live, fires 30 times a day on legitimate IT activity, and the SOC begins ignoring it. When real PsExec lateral movement happens, the alert is one of dozens the analyst has learned to dismiss.
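The reviewer's change can be sketched as a selection plus a suppression. This assumes Windows System-log service-install events (Event ID 7045), where PsExec's default service name is PSEXESVC; the IT tool's service name (PSEXEC-ITADMIN) and the prefix-based "PsExec pattern" are hypothetical, chosen only so that the two collide as in the example.

```python
# Hypothetical service-install events (Windows System log, Event ID 7045).
PSEXEC_EVENT = {"EventID": 7045, "ServiceName": "PSEXESVC"}
IT_TOOL_EVENT = {"EventID": 7045, "ServiceName": "PSEXEC-ITADMIN"}  # invented name

# Suppression added during review; the entry is a hypothetical IT tool.
APPROVED_IT_SERVICES = {"PSEXEC-ITADMIN"}


def as_written(event: dict) -> bool:
    """Original scoping: any new service whose name starts with "PSEXE"."""
    name = event.get("ServiceName", "").upper()
    return event.get("EventID") == 7045 and name.startswith("PSEXE")


def after_review(event: dict) -> bool:
    """Same scoping, minus the service names the reviewer knows are legitimate."""
    name = event.get("ServiceName", "").upper()
    return as_written(event) and name not in APPROVED_IT_SERVICES


if __name__ == "__main__":
    for label, event in [("psexec", PSEXEC_EVENT), ("it-tool", IT_TOOL_EVENT)]:
        print(label, as_written(event), after_review(event))
```

An exact-name suppression is deliberately narrow: it removes only the known-good service rather than widening the filter to every name that resembles it.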

A pull request replaces the save button with a review. The reviewer reads the Sigma rule, the test fixtures, the metadata, and the conversion output. They check whether the scoping condition is tight enough, whether the false-positive notes are realistic, and whether the ATT&CK mapping is correct. The rule reaches production after a deliberate merge, not after a click. The PR conversation is a permanent record: anyone who revisits the rule six months later can read why each scoping decision was made, what false positives were considered, and what the reviewer flagged.
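Part of what the reviewer checks can be automated so that the PR conversation focuses on judgment. The sketch below assumes the rule has already been parsed from YAML into a dictionary (for example with a YAML library such as PyYAML, not shown) and enforces a hypothetical team policy: a chosen set of fields must be present, level must be one of Sigma's five levels, false-positive notes must not be empty, and at least one ATT&CK technique tag must be present. The required-field list is a policy choice for illustration, not Sigma's own schema, and the rule's id is a placeholder.

```python
import re

# Hypothetical team policy; Sigma itself requires fewer fields.
REQUIRED_FIELDS = ("title", "id", "logsource", "detection", "falsepositives", "level", "tags")
SIGMA_LEVELS = {"informational", "low", "medium", "high", "critical"}
TECHNIQUE_TAG = re.compile(r"^attack\.t\d{4}(\.\d{3})?$")


def review_checklist(rule: dict) -> list[str]:
    """Return policy violations for one parsed rule; empty means ready for human review."""
    problems = [f"missing field: {field}" for field in REQUIRED_FIELDS if field not in rule]
    if "level" in rule and rule["level"] not in SIGMA_LEVELS:
        problems.append(f"unknown level: {rule['level']}")
    if "falsepositives" in rule and not rule["falsepositives"]:
        problems.append("falsepositives is empty")
    if not any(TECHNIQUE_TAG.match(str(tag)) for tag in rule.get("tags") or []):
        problems.append("no ATT&CK technique tag such as attack.t1569.002")
    return problems


PSEXEC_RULE = {
    "title": "PsExec service installation",
    "id": "00000000-0000-0000-0000-000000000000",  # placeholder UUID
    "logsource": {"product": "windows", "service": "system"},
    "detection": {
        "selection": {"EventID": 7045, "ServiceName|startswith": "PSEXE"},
        "filter_it_tool": {"ServiceName": "PSEXEC-ITADMIN"},  # hypothetical IT tool
        "condition": "selection and not filter_it_tool",
    },
    "falsepositives": ["Administrative use of PsExec"],
    "level": "high",
    "tags": ["attack.execution", "attack.t1569.002"],
}

if __name__ == "__main__":
    print(review_checklist(PSEXEC_RULE))
    print(review_checklist({"title": "Draft rule"}))
```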

No rollback

When a console-managed rule breaks, the fix is forward: edit the rule again and hope the new version is better. There is no "undo" button. There is no way to say "go back to what this rule looked like on Tuesday." The prior state is gone because there is no version history.

This creates a cascading problem. A tuning change that introduces a new false positive leads to another tuning change that narrows the rule too far, which leads to a missed detection, which leads to a rewrite from scratch. Each fix is a guess at what the rule should look like, because nobody can see what it looked like at any prior stable point. The rule drifts further from its original intent with each manual edit, and the team loses confidence in it.

In a Git repository, rollback is git revert. The prior version of the rule is a commit, and restoring it is a single command. The revert is itself a commit with its own history and message, so the rollback is documented the same way the original change was. This is not a theoretical capability; it is the standard operating procedure for any pipeline-managed change that causes a regression. When the on-call analyst reports that "the credential-access rule started producing false positives after the tuning change at 16:30," the response is to revert that commit and push; the deployment pipeline then puts the known-good rule back in production. The bad change is documented, the revert is documented, and the team can investigate the tuning mistake at leisure instead of under pressure.
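The property that matters here is that a revert adds history rather than erasing it. The toy model below illustrates this under simplifying assumptions (one rule file, linear history, reverting the most recent commit), where the result of a revert is simply the previous content. Real git revert applies the inverse of a commit's patch and can target any commit; the Commit class and revert_latest function are hypothetical illustrations, not Git internals. The revert message copies the subject format git uses by default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    message: str
    content: str  # full text of the rule file after this commit


def revert_latest(history: list[Commit]) -> list[Commit]:
    """Return a new history with a commit that undoes the latest change.

    Toy model: single file, linear history. The bad commit stays in the
    history; the revert is appended as a commit of its own.
    """
    if len(history) < 2:
        raise ValueError("need at least two commits to revert to a prior version")
    bad, known_good = history[-1], history[-2]
    revert = Commit(message=f'Revert "{bad.message}"', content=known_good.content)
    return history + [revert]


if __name__ == "__main__":
    history = [
        Commit("Add credential-access rule", "threshold: 5"),
        Commit("Tune credential-access threshold", "threshold: 50"),
    ]
    for commit in revert_latest(history):
        print(commit.message, "->", commit.content)
```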

Coverage drift

Console-managed detection has no systematic way to answer "what do we cover?" The rules exist in the SIEM, each with its own configuration, and the only way to map them to ATT&CK techniques is a manual audit: open each rule, read its logic, decide which technique it detects, and record the mapping in a spreadsheet. That spreadsheet is stale the moment someone creates, modifies, or deletes a rule without updating it.

The manual audit is also unreliable. Different analysts may classify the same rule under different ATT&CK techniques. A rule that detects a behaviour spanning two techniques may be mapped to one or both depending on who does the audit. The spreadsheet becomes a record of opinions, not a record of coverage.

When each rule carries its ATT&CK mapping in its metadata (the tags field in a Sigma rule), the coverage map is a generated artifact. A script reads every rule in the repository, extracts the technique tags, and produces a coverage layer. The layer is as current as the last merge to the main branch. No spreadsheet. No manual audit. No drift. And because the mapping is reviewed as part of the PR, disagreements about technique classification are resolved before the rule merges, not discovered during a quarterly audit.
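A minimal sketch of such a script, assuming the rules have already been parsed into dictionaries and that technique tags follow the Sigma convention attack.tNNNN or attack.tNNNN.NNN (tactic tags such as attack.execution are ignored). The output is loosely modeled on an ATT&CK Navigator layer (name, domain, and a techniques list with techniqueID, score, and comment); real layers carry further fields such as version information, so check the Navigator layer format before importing one. Scoring each technique by rule count is an illustrative choice.

```python
import json
import re
from collections import defaultdict

# Captures the technique ID from tags like "attack.t1059.001".
TECHNIQUE_TAG = re.compile(r"^attack\.(t\d{4}(?:\.\d{3})?)$")


def coverage_layer(rules: list[dict], name: str = "Detection coverage (generated)") -> dict:
    """Build a Navigator-style layer from the ATT&CK tags on every rule."""
    covered: dict[str, list[str]] = defaultdict(list)
    for rule in rules:
        for tag in rule.get("tags") or []:
            match = TECHNIQUE_TAG.match(str(tag).lower())
            if match:
                covered[match.group(1).upper()].append(rule.get("title", "untitled"))
    return {
        "name": name,
        "domain": "enterprise-attack",
        "techniques": [
            {"techniqueID": tid, "score": len(titles), "comment": "; ".join(sorted(titles))}
            for tid, titles in sorted(covered.items())
        ],
    }


if __name__ == "__main__":
    rules = [
        {"title": "Encoded PowerShell", "tags": ["attack.execution", "attack.t1059.001"]},
        {"title": "PsExec service installation", "tags": ["attack.t1569.002"]},
    ]
    print(json.dumps(coverage_layer(rules), indent=2))
```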

The cost of the gap

These five failure modes compound. A rule with no history is hard to debug. A rule with no test breaks silently. A rule with no review ships errors into production. A rule with no rollback stays broken until someone rewrites it. A rule set with no coverage map cannot tell you what it protects.

The result is a detection programme that looks active (rules exist, alerts fire) but operates blind (nobody knows whether the rules are correct, current, or complete). The gap between "we have detections" and "our detections are engineered, tested, and measured" is the gap this course closes. The failure is rarely a single catastrophic event; it is the slow accumulation of unverified rules, undocumented changes, and unmeasured coverage until the detection programme is a liability the team cannot reason about.

Section 0.3 introduces the source format that makes the pipeline possible: Sigma, the vendor-neutral detection language you write once and convert to any backend.
