
TH0.4 Why Detection Engineering Is Not Enough

3-4 hours · Module 0 · Free
What you already know

Section 0.3 introduced the detection pyramid: three layers of threat visibility requiring three operational disciplines. Detection engineering covers the base (known-known threats). Hunting covers the middle (known-unknown threats). Anomaly detection covers the apex (unknown-unknown threats). You know that most SOCs fund only the base layer and that 79% of ATT&CK techniques have no corresponding SIEM rule in the average enterprise. This section explains why funding the base alone is not enough: five structural limitations make detection engineering insufficient regardless of investment level.

Scenario

Northgate's detection engineering lead proposes a plan to close the coverage gap identified in Section 0.1. The plan: hire two additional detection engineers, roughly double the analytics rule count from 142 to 280 over the next year, and target 40% ATT&CK technique coverage. Rachel Okafor approves the plan; it is a genuine improvement. But she asks the question the plan does not address: "What happens to the 60% that still has no rule? And what about the attacker who reads our rules on GitHub and engineers the evasion before they attack?" The detection engineering plan will make the base of the pyramid wider. It will not make the middle and top any less empty.

The limitations are structural, not staffing

If detection engineering's shortcomings were a resource problem — not enough engineers, not enough time, not enough budget — the answer would be straightforward: hire more, spend more, build more. But the limitations are architectural. They are properties of how detection rules work, and they persist regardless of investment. This is not an argument against detection engineering. It is an argument for understanding what detection engineering can do, what it structurally cannot do, and what fills the gap.

Limitation 1 — Rules encode anticipation

A detection rule is a statement of anticipation. "If an attacker does X, then the data will contain Y, and this query will find it." The rule works when the attacker does exactly what the rule author anticipated. The rule fails silently when the attacker does something the author did not anticipate — even if that something is a minor variation of the anticipated technique.

Consider a Sentinel analytics rule designed to detect inbox rule creation for email forwarding. The rule monitors for New-InboxRule operations with ForwardTo or RedirectTo actions. A BEC operator who knows this detection exists, because it is published in Microsoft's Sentinel community rules on GitHub, uses MoveToFolder instead, moving financial emails to the RSS Feeds folder where the legitimate user never looks. The objective and the attack category are the same, but ATT&CK files the variant under a different sub-technique: T1564.008 (Email Hiding Rules) rather than T1114.003 (Email Forwarding Rule). The rule does not fire because the action type does not match the author's specification.
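The gap between the narrow rule and a broader hunt can be sketched with synthetic data. This is a minimal illustration, not the published Sentinel rule: the `datatable` rows, the column names (`RuleAction`, `Target`) and the list of hiding folders are hypothetical, and a real workspace would read these events from its Exchange audit source with different column names.

```kql
// Hypothetical sample events. Column names are illustrative only.
let InboxRuleEvents = datatable(UserId:string, Operation:string, RuleAction:string, Target:string)
[
    "alice@northgate.example", "New-InboxRule", "ForwardTo",    "attacker@evil.example",
    "bob@northgate.example",   "New-InboxRule", "MoveToFolder", "RSS Feeds",
    "carol@northgate.example", "New-InboxRule", "MoveToFolder", "Project Archive"
];
// Illustrative list of folders users rarely open (an assumption, not a standard)
let HidingFolders = dynamic(["RSS Feeds", "Conversation History", "Archive"]);
InboxRuleEvents
| where Operation == "New-InboxRule"
// What the anticipated-action rule would have matched
| extend CaughtByRule = RuleAction in ("ForwardTo", "RedirectTo")
// What a broader hunt adds: rules that move mail out of sight
| extend MovesToHidingFolder = RuleAction == "MoveToFolder" and Target in~ (HidingFolders)
| project UserId, RuleAction, Target, CaughtByRule, MovesToHidingFolder
```

In this sample only the first row satisfies the narrow rule. The second row is the evasion described above, and it surfaces only because the hunt looks at every rule creation rather than at the two anticipated actions.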

Researchers at the Fraunhofer Institute, in work presented at USENIX Security, analysed 292 publicly available and widely deployed SIEM rules and found that at least 44% could be evaded using straightforward modification techniques. Five categories of evasion (character case changes, field value substitution, process path manipulation, argument reordering, and equivalent command alternatives) were sufficient to bypass nearly half the rules studied. These were not exotic zero-day evasions. They were variations that a moderately skilled attacker could apply by reading the rule and changing one parameter.

Attackers do read the rules. Offensive tool developers specifically study published detection logic and engineer their tools to avoid the exact patterns those rules match. EvilGinx developers modified their AiTM proxy to avoid triggering Microsoft's anomalousToken risk detection. PowerShell obfuscation frameworks evolve specifically to bypass AMSI detection patterns. Cobalt Strike operators configure malleable C2 profiles to mimic the network signatures of legitimate cloud services. The detection rule is published; the evasion is engineered against it.

Limitation 2 — Rules require ingested telemetry

A detection rule can only query data that exists in your SIEM. If the telemetry is not ingested, the rule cannot fire regardless of how well the query is written.

Most M365 environments ingest a subset of available telemetry into Sentinel. The decisions about what to ingest are driven by cost (every GB/day carries a cost), compliance requirements (certain logs must be retained), and familiarity (the team prioritises the logs they already know how to use). The result is blind spots that detection rules cannot overcome.

Common examples in M365 environments: AADServicePrincipalSignInLogs (the ServicePrincipalSignInLogs diagnostic category in Entra ID) records application authentication; without it, a compromised application credential is invisible. MicrosoftGraphActivityLogs, introduced in 2024, records every Graph API call; without it, attackers who access mailbox data, enumerate the directory, or exfiltrate files through the API leave no queryable trace. CloudAppEvents with Defender for Cloud Apps captures OAuth consent, file sharing, and SaaS application activity; without it, the consent phishing hypothesis from Section 0.3 has no data to query. The telemetry exists. Microsoft records it. The organisation chose not to ingest it. Every detection rule targeting those data sources is blind by architecture.
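A quick way to see whether these sources are present is to compare a list of expected tables against the workspace's `Usage` records. The sketch below assumes the three Log Analytics table names shown (verify them against your own workspace) and a 30-day lookback chosen for illustration.

```kql
// Tables this section names as common blind spots.
// Table names are assumptions to verify in your workspace.
let ExpectedTables = datatable(DataType:string)
[
    "AADServicePrincipalSignInLogs",
    "MicrosoftGraphActivityLogs",
    "CloudAppEvents"
];
// Volume ingested per table over the lookback window (Usage reports Quantity in MB)
let Ingested = Usage
    | where TimeGenerated > ago(30d)
    | summarize IngestedMB = sum(Quantity) by DataType;
ExpectedTables
| join kind=leftouter Ingested on DataType
| extend Status = iff(isnull(IngestedMB) or IngestedMB == 0, "Not ingested", "Ingested")
| project DataType, IngestedMB, Status
```

The left outer join keeps every expected table in the output, so a table with no `Usage` record appears with an empty volume and a "Not ingested" status instead of silently dropping out.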

KQL
// Identify detection rules querying tables that are not ingested
// Each result is a blind rule: deployed but unable to fire
// Note: the regex is a heuristic. It captures any identifier that directly
// precedes a pipe, so review the output for tokens that are not table names.
let IngestedTables = Usage
    | where TimeGenerated > ago(30d)
    | where Quantity > 0
    | distinct DataType;
SentinelAudit
| where SentinelResourceType == "Analytic Rule"
| extend RuleData = parse_json(ExtendedProperties)
| extend RuleName = tostring(RuleData.RuleName)
| extend RuleQuery = tostring(RuleData.Query)
| where isnotempty(RuleQuery)
// extract_all with a single capture group returns a flat array of strings,
// so each expanded element is already the candidate table name
| mv-expand TableRef = extract_all(@"(\w+)\s*\|", RuleQuery)
| extend ReferencedTable = tostring(TableRef)
| where ReferencedTable !in (IngestedTables)
| summarize BlindRules = make_set(RuleName) by ReferencedTable
| extend BlindRuleCount = array_length(BlindRules)

Every rule in the output is deployed, consuming Sentinel evaluation cycles, yet structurally unable to fire because the table it queries contains no data. CardinalOps found that 13% of enterprise SIEM rules are non-functional for exactly this reason — misconfigured data sources or missing log fields that render the rule inert. The detection engineering dashboard shows it as "active." The attacker experiences it as absent.

Limitation 3 — False positive economics

Every detection rule operates at a point on the sensitivity-specificity tradeoff. Increase sensitivity (catch more true positives) and you also increase false positives. Increase specificity (reduce false positives) and you miss more true positives. Whenever legitimate and malicious activity overlap in the data the rule can see, no rule configuration is simultaneously maximally sensitive and maximally specific. This is a mathematical constraint, not an engineering shortcoming.

In practice, detection engineers tune toward specificity because false positives carry an immediate operational cost. A rule that fires 50 times per day with a 2% true positive rate generates 49 wasted investigations for every real detection. The SOC lead asks for the rule to be tuned. The detection engineer raises the threshold, adds exclusions, or narrows the scope until the false positive rate drops to an operationally acceptable level. That tuning is correct. But every exclusion is a potential hiding place. Every raised threshold is a volume below which the attacker can operate undetected. The attacker who exfiltrates 95 files from SharePoint when the rule threshold is 100 passes beneath the detection floor — not because the rule is broken, but because the threshold was set to accommodate the legitimate users who download 80 files before a business trip.
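The detection floor can be made concrete with a few synthetic rows. This is a sketch under stated assumptions: the download counts and user names are invented, the rule is assumed to fire at or above the threshold, and the 80% "near miss" band is an arbitrary illustration rather than a recommended setting.

```kql
// Threshold taken from the example in the text
let Threshold = 100;
// Hypothetical per-user download counts for one day
let Downloads = datatable(User:string, FilesDownloaded:int, Note:string)
[
    "traveller@northgate.example", 80,  "legitimate pre-trip download",
    "compromised@northgate.example", 95, "exfiltration kept under the threshold",
    "migration@northgate.example", 140, "legitimate bulk job"
];
Downloads
// Assumption: the rule fires at or above the threshold
| extend RuleFires = FilesDownloaded >= Threshold
// Activity just under the floor: invisible to the rule, visible to a hunt
| extend NearMiss = FilesDownloaded < Threshold and FilesDownloaded >= Threshold * 0.8
| sort by FilesDownloaded desc
```

In this sample the only alert is the legitimate bulk job. The two near-miss rows look alike by volume, and telling the business traveller from the exfiltration takes the contextual review described next.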

Hunting does not operate on this tradeoff because hunting does not fire automated alerts. A hunting query that returns 500 results does not generate 500 incidents. It generates a dataset that a human analyst reviews, enriches with context, and makes judgments about. The analyst can tolerate a noisy result set because they are investigating, not triaging an alert queue. They examine the 500 results, identify the 3 that are suspicious based on contextual factors — user role, time of day, recent sign-in anomaly — and investigate those in depth. The 497 legitimate results are not wasted alerts; they are context that helps the analyst understand what normal looks like. Hunting can afford to operate at sensitivity levels that would be operationally destructive as automated detections.

Limitation 4 — Rules are static

A detection rule is authored at a specific point in time against a specific understanding of the technique. It does not adapt as the technique evolves, as the environment changes, or as the attacker adjusts their approach. No mechanism in Sentinel or Defender XDR allows a detection rule to evaluate its own effectiveness. A rule that has not fired in 12 months cannot tell you whether it is a well-targeted detection for a rare event or a broken rule that would not fire even if the attack occurred.

KQL
// Detection rules that have not fired in 90 days — decay candidates
let FiredRules = SecurityAlert
    | where TimeGenerated > ago(90d)
    | where ProviderName == "ASI Scheduled Alerts"
    | distinct AlertName;
SentinelAudit
| where SentinelResourceType == "Analytic Rule"
| where Status == "Success"
| extend RuleName = tostring(
    parse_json(ExtendedProperties).RuleName)
| where RuleName !in (FiredRules)
| distinct RuleName
| sort by RuleName asc

Each rule in this output is a candidate for validation through hunting. For each one, a hunter asks: does this rule detect what it claims to detect? Is the technique it targets still relevant? Has the technique evolved past what the rule specifies? Running the underlying query manually, with broader parameters than the rule's threshold allows, is a hunt. It tests the rule by looking for the technique the rule was supposed to catch — and it may find that the technique has moved on while the rule stayed still.

Limitation 5 — Alerts lack investigation context

When a detection rule fires, it produces an alert containing the fields that matched the rule's condition. It does not explain why the match matters, what the surrounding activity looks like, or whether the matched event is part of a larger attack sequence. The analyst receives a signal and must build the context from scratch — querying adjacent tables, correlating timestamps, examining the user's recent activity, and establishing whether the alert is an isolated event or a stage in a multi-step intrusion.

This is not a design flaw. It is a fundamental property of automated pattern matching. A rule that matches "inbox rule created with ForwardTo parameter" cannot also answer "was this the same user who registered a new MFA method an hour ago?" or "did this user's sign-in originate from an AiTM proxy?" Those questions require querying different tables, joining on different fields, and applying reasoning that the rule was never designed to perform.

Hunting operates in the opposite direction. A hunt begins with a hypothesis about a multi-stage attack pattern and queries for the entire sequence. A hunt for BEC persistence does not look for inbox rule creation in isolation. It looks for the chain: anomalous sign-in, MFA registration within 24 hours, inbox rule creation within 48 hours, and mailbox access from the same session. The hunt produces findings with built-in context because the hypothesis defined the context from the start.
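The chain described above can be sketched as a time-windowed join. This is a minimal illustration with synthetic data: the three `datatable` inputs, their column names, and the timestamps are hypothetical. Both windows are measured from the anomalous sign-in, which is one reading of the text, and the final stage (mailbox access from the same session) is left out to keep the example short.

```kql
// Hypothetical inputs. In a workspace each would be a query over its own table.
let RiskySignIns = datatable(User:string, SignInTime:datetime)
[
    "alice@northgate.example", datetime(2025-03-01 08:00),
    "bob@northgate.example",   datetime(2025-03-01 09:00)
];
let MfaRegistrations = datatable(User:string, MfaTime:datetime)
[
    "alice@northgate.example", datetime(2025-03-01 10:30),
    "bob@northgate.example",   datetime(2025-03-04 09:00)
];
let InboxRules = datatable(User:string, RuleTime:datetime, RuleAction:string)
[
    "alice@northgate.example", datetime(2025-03-02 07:15), "MoveToFolder",
    "bob@northgate.example",   datetime(2025-03-04 10:00), "ForwardTo"
];
RiskySignIns
// Stage 2: MFA method registered within 24 hours of the anomalous sign-in
| join kind=inner MfaRegistrations on User
| where MfaTime between (SignInTime .. SignInTime + 24h)
// Stage 3: inbox rule created within 48 hours of the anomalous sign-in
| join kind=inner InboxRules on User
| where RuleTime between (SignInTime .. SignInTime + 48h)
| project User, SignInTime, MfaTime, RuleTime, RuleAction
```

Only the first user completes the sequence in this sample. The second user has the same three event types, but the MFA registration falls three days after the sign-in, so the chain breaks at stage 2. Each surviving row already carries the timeline an analyst would otherwise assemble by hand after an alert.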

Analyst Decision

Assessment: Northgate's detection engineering plan to reach 280 rules and 40% ATT&CK coverage will meaningfully improve the known-known layer. It does not address the structural limitations identified in this section.

Limitation exposure: Of the 142 existing rules, an estimated 13% may be non-functional (telemetry gaps). At least 44% of the remainder are vulnerable to straightforward evasion techniques if the attacker reads the published rule logic. Rules tuned for specificity create a detection floor below which attacker activity passes unnoticed. No rule self-evaluates — without active testing, detection decay is invisible.

Recommendation: Approve the detection engineering plan. Supplement it with a parallel hunting capability that performs three functions: (1) investigates the known-unknown layer the rules do not cover, (2) validates existing rules by hunting for the techniques they claim to detect, and (3) produces new detection rules from hunt findings, converting known-unknowns to known-knowns. The two capabilities are complementary — detection engineering builds the base; hunting tests it, extends it, and fills the middle layer.
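The limitation exposure estimate in this assessment can be worked through as simple arithmetic. The sketch below applies the two industry percentages cited in this section to Northgate's 142 rules. Treat the output as an order-of-magnitude estimate, since neither percentage was measured on Northgate's own rule set.

```kql
// Industry percentages applied to Northgate's current rule count
print TotalRules = 142
// 13%: share of rules estimated to be non-functional due to telemetry gaps
| extend NonFunctional = round(TotalRules * 0.13)
| extend Remaining = TotalRules - NonFunctional
// 44%: lower bound on the share of working rules open to simple evasion
| extend Evadable = round(Remaining * 0.44)
| extend LikelyRobust = Remaining - Evadable
```

On these assumptions roughly 18 rules are inert and about 55 of the remaining 124 are evadable, which leaves around 69 rules that both work and resist the straightforward evasions described under Limitation 1.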

The complementary relationship

Detection engineering and hunting serve different layers of the pyramid and face different constraints. Detection engineering scales through automation but is limited by anticipation, telemetry, and specificity tradeoffs. Hunting scales through human reasoning but is limited by analyst time and expertise. Neither replaces the other.

The mature operation funds both because they produce different outputs. Detection engineering produces rules that run 24/7 without human intervention, catching known-known threats at scale. Hunting produces findings that require human judgment, catching known-unknown threats that rules structurally miss. When hunting finds something, it feeds the detection engineering pipeline — the finding becomes a new rule. When detection engineering deploys a new rule, it eventually decays, and hunting validates whether the rule still works. The cycle is continuous: hunt → find → detect → decay → hunt again.

An operation that defunds hunting to hire more detection engineers is making a rational-seeming decision with a structural flaw. It is widening the base of the pyramid while leaving the middle and top empty. The additional rules will catch more of what the previous rules already targeted. They will not catch what rules structurally cannot catch.

Treating the detection gap as a staffing problem

"If we had more detection engineers, we would not need threat hunting." More detection engineers produce more rules, which improves coverage of the known-known layer. That is valuable. But more rules still encode anticipation, still require ingested telemetry, still trade sensitivity for specificity, still decay over time, and still produce alerts without investigation context. A detection engineering team of 50 with 1,000 rules still has a known-unknown layer that no rule covers. The limitations are properties of the method, not the team size. Hunting is the operational discipline that addresses what detection engineering structurally cannot.

Threat Hunting Principle

Detection engineering has five structural limitations: rules encode anticipation, require ingested telemetry, trade sensitivity for specificity, decay without self-evaluation, and produce alerts without investigation context. These are properties of the method, not the team. Hunting is the complementary discipline — it investigates what rules cannot anticipate, validates what rules claim to detect, and produces new rules from its findings.

Next

Section 0.5 — The Threat Landscape Driving Demand. You understand the structural limitations of detection engineering. The next section maps the current threat landscape to those limitations — the specific attacker tradecraft in M365 environments that exploits the detection gap, and the industry data showing why hunting demand is accelerating.