Hunt Quality Assurance and Metrics

4-5 hours · Module 1 · Free
What you already know
Section 1.9 established the cadence model that keeps hunts happening on schedule. Protected time and backup coverage solve the scheduling problem. This section addresses the quality problem: how to prevent a single query error from becoming a documented false assurance that a technique was searched for and not found.

Scenario

Tom Ashworth completes a hunt for inbox rule manipulation by querying CloudAppEvents for New-InboxRule operations. Zero results. He documents the hunt as "refuted" and deploys a detection rule based on his query. Three weeks later, IR discovers an attacker who created inbox rules through the Graph API. Graph API inbox rule creation appears in MicrosoftGraphActivityLogs, not CloudAppEvents. Tom's hunt missed the attack entirely because his query covered one creation method and was blind to another. A five-minute peer review before the hunt started would have caught this: "Did you check all inbox rule creation paths, including Outlook, OWA, PowerShell, EWS, and Graph API?"

Why quality assurance matters for hunting

A hunt is only as good as its weakest query. A single misconfigured filter, a missed join condition, or a baseline contaminated by attacker activity can produce a false negative that masks a real compromise. False negatives from hunting are worse than no hunt at all because they create documented assurance that a technique was searched for and not found. Future analysts deprioritize that technique based on a false conclusion, and the compromise persists.

Quality assurance for hunting does not carry the overhead of QA in software development. It costs about twenty minutes per hunt, and it prevents the specific class of errors that undermines the entire program's credibility.

THREE QA REVIEW POINTS FOR EVERY HUNT

1. BEFORE HUNTING: Hypothesis + scope review. All technique variants covered? All data sources included? ~5 minutes with a peer.

2. BEFORE CLOSING: Hunt record review. All funnel steps completed? Conclusion supported by evidence? ~10 minutes reviewing the record.

3. BEFORE DEPLOYING: Detection rule review. Exclusions justified? Threshold and mapping correct? ~5 minutes reviewing the rule.

Total QA overhead: ~20 minutes per hunt. Cost of missing an error: an undetected compromise.

Figure TH1.10 — Three review points for hunt quality assurance. Twenty minutes of QA prevents false negatives that undermine the entire program.

Review point 1: Before hunting

A second analyst reviews the hypothesis and scope definition for five minutes before the first query runs. Are all relevant data tables included? Does the scope cover technique variants (the inbox rule example in the scenario above)? Is the time window appropriate for the technique's expected dwell time? Is the population correctly defined? This five-minute investment catches scope gaps that would otherwise produce false negatives.

At NE, Priya catches a scope gap in Tom's hypothesis before a hunt for suspicious PowerShell execution. Tom scoped the hunt to DeviceProcessEvents, which captures process creation on Defender-enrolled endpoints. Priya asks whether the hypothesis should also include CloudAppEvents for Exchange Online PowerShell (remote sessions that do not touch managed endpoints) and AuditLogs for Azure AD PowerShell module usage. Tom adds both tables. Without the review, any attacker who ran PowerShell through Exchange Online Remote PowerShell or Azure AD would have been invisible to the hunt, and Tom would have documented "no suspicious PowerShell execution found" based on an incomplete scope.
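The scope check Priya performs can be sketched as a lookup from each technique variant to the table that records it. This is a minimal Python sketch, not part of the course tooling: the table names come from the example above, while `REQUIRED_TABLES` and `scope_gaps` are hypothetical names introduced for illustration.

```python
# Variant-to-table map for the PowerShell hunt in the NE example.
# Table names come from the text; the dictionary is a hypothetical structure
# that a reviewer would fill in for each new hypothesis.
REQUIRED_TABLES = {
    "endpoint process creation": "DeviceProcessEvents",
    "Exchange Online remote PowerShell": "CloudAppEvents",
    "Azure AD PowerShell module": "AuditLogs",
}


def scope_gaps(scoped_tables, required=REQUIRED_TABLES):
    """Return the technique variants whose data source is missing from the scope."""
    scoped = set(scoped_tables)
    return sorted(variant for variant, table in required.items()
                  if table not in scoped)


# Tom's first draft covered only endpoint telemetry.
print(scope_gaps(["DeviceProcessEvents"]))
# ['Azure AD PowerShell module', 'Exchange Online remote PowerShell']
```

An empty list means every listed variant has a data source in scope. It does not prove the variant list itself is complete, which is why the review still needs a second analyst.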

Review point 2: Before closing

A second analyst reviews the hunt record before it is marked complete. Were all four funnel steps executed (orientation, indicator, enrichment, pivot)? Were false positives analyzed with specific justifications for each exclusion? Is the conclusion supported by the analysis? If the conclusion is "refuted," does the scope actually cover the technique, or could the technique have been present but invisible due to a data source gap? This review takes about ten minutes and catches analytical errors.

Pay particular attention to refuted conclusions. A confirmed finding has built-in validation because the compromise itself provides evidence that the hunt methodology worked. A refuted finding has no such validation. The only way to confirm that "not found" means "not present" rather than "not visible" is to verify that the scope, data sources, and query logic covered all relevant technique variants. Review point 2 exists primarily to validate negative findings, which carry the highest risk of false assurance.
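One way to make the closing review repeatable is to express it as a check against the hunt record. The sketch below is illustrative only: `HuntRecord`, its fields, and `closing_review` are hypothetical names, and the only rules encoded are the two described above (all four funnel steps executed, and no refuted conclusion while a technique variant sits outside the scope).

```python
from dataclasses import dataclass, field

FUNNEL_STEPS = ("orientation", "indicator", "enrichment", "pivot")


@dataclass
class HuntRecord:
    conclusion: str  # "confirmed", "refuted", or "inconclusive"
    steps_completed: set = field(default_factory=set)
    uncovered_variants: list = field(default_factory=list)  # variants outside scope


def closing_review(record):
    """Return review blockers; an empty list means the record can be closed."""
    blockers = [f"funnel step not executed: {step}"
                for step in FUNNEL_STEPS if step not in record.steps_completed]
    # "Not found" only means "not present" if every variant was visible.
    if record.conclusion == "refuted" and record.uncovered_variants:
        blockers.append("refuted conclusion with uncovered variants: "
                        + ", ".join(record.uncovered_variants))
    return blockers


# The inbox rule hunt from the scenario: all steps run, one creation path unseen.
inbox_rule_hunt = HuntRecord(
    conclusion="refuted",
    steps_completed=set(FUNNEL_STEPS),
    uncovered_variants=["Graph API"],
)
print(closing_review(inbox_rule_hunt))
# ['refuted conclusion with uncovered variants: Graph API']
```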

Review point 3: Before deploying

A second analyst reviews the detection rule before it enters production. Do the rule's time window and run frequency leave any detection gaps? Are exclusions documented with evidence from the false positive analysis? Could any exclusion be exploited by an attacker routing through an excluded IP range or masquerading as an excluded service account? Is the entity mapping correct? Is the severity appropriate? This review takes about five minutes and catches configuration errors that would undermine the rule's effectiveness.
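The time window question reduces to simple arithmetic: if a scheduled rule looks back over a shorter period than the interval between runs, events in the difference are never queried. The sketch below assumes a rule that runs every `frequency` and queries the preceding `lookback`; both parameter names and `coverage_gap` are hypothetical, and ingestion delay is deliberately not modeled.

```python
from datetime import timedelta


def coverage_gap(lookback: timedelta, frequency: timedelta) -> timedelta:
    """Return the unqueried time between consecutive runs (zero if none)."""
    return max(frequency - lookback, timedelta(0))


# A rule that runs every 4 hours but only looks back 1 hour misses 3 hours per cycle.
print(coverage_gap(lookback=timedelta(hours=1), frequency=timedelta(hours=4)))  # 3:00:00

# A lookback at least as long as the frequency leaves no gap.
print(coverage_gap(lookback=timedelta(hours=4), frequency=timedelta(hours=4)))  # 0:00:00
```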

Exclusion review is the critical element at this checkpoint. Every exclusion added to a detection rule creates a blind spot that an attacker can potentially exploit. If the rule excludes traffic from a specific IP range (the corporate VPN), an attacker who routes through that VPN range bypasses the rule. If the rule excludes a service account, an attacker who compromises that service account operates undetected. Review point 3 forces the analyst to justify each exclusion and consider whether it creates an exploitable gap. Some exclusions are necessary (the service account that legitimately generates 200 sign-ins per day will overwhelm the SOC if not excluded), but each one should be documented with the specific false positive it prevents and the residual risk it introduces.
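The documentation requirement for exclusions can be checked mechanically before the rule review begins. This is a hedged sketch rather than a feature of any detection platform: `Exclusion` and `unjustified` are hypothetical names, and the VPN false positive text is an invented placeholder.

```python
from dataclasses import dataclass


@dataclass
class Exclusion:
    target: str
    false_positive: str = ""  # the specific false positive the exclusion prevents
    residual_risk: str = ""   # the blind spot the exclusion introduces


def unjustified(exclusions):
    """Return targets of exclusions missing either justification field."""
    return [e.target for e in exclusions
            if not e.false_positive.strip() or not e.residual_risk.strip()]


rule_exclusions = [
    Exclusion(
        target="service account",
        false_positive="legitimately generates 200 sign-ins per day",
        residual_risk="a compromised service account operates undetected",
    ),
    # Placeholder justification; the residual risk has not been written down yet.
    Exclusion(target="corporate VPN range",
              false_positive="staff sign-ins through the VPN"),
]
print(unjustified(rule_exclusions))  # ['corporate VPN range']
```

A check like this only confirms that the fields are filled in. Judging whether the residual risk is acceptable remains the reviewer's job.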

Solo hunter adaptations

Not every team has a second analyst available for peer review. Three adaptations serve solo practitioners.

Self-review with a structured checklist. Before closing a hunt, walk through a written checklist item by item. Check each honestly. A checklist compensates for the absence of a second perspective by forcing you to evaluate your own work against explicit criteria rather than relying on the feeling that "this looks right." At minimum, the checklist should cover: hypothesis specificity (does it name a technique, data source, and indicator?), scope completeness (are all relevant technique variants covered?), funnel progression (did you execute all four query steps?), conclusion validity (does the evidence support the stated outcome?), and conversion readiness (is the detection rule query ready for scheduled deployment?).

Time-delayed review. Complete the hunt. Wait 24 hours. Re-read the hunt record with fresh eyes. Overnight cognitive distance catches errors you missed during the hunting session because you were too close to the data. Fresh reads reveal assumptions you made unconsciously: a scope limitation you did not document, an enrichment dimension you skipped, a conclusion you reached before the evidence fully supported it.

Quarterly batch review. Every three months, review the last quarter's hunt records as a batch. Look for systematic patterns in your own work. Are you consistently missing certain data sources? Are your exclusions becoming more permissive over time? Are your conclusions clustering toward one outcome (always refuted, never confirmed)? Systematic patterns reveal biases that individual self-review misses. A quarter where every hunt concludes "refuted" may mean either excellent detection coverage or consistently narrow scoping.

Analyst Decision

Quarterly batch review findings (NE Q2 2026):

Hunts completed: 6 of 6 scheduled (biweekly cadence sustained)

Outcomes: 1 confirmed, 4 refuted, 1 inconclusive

Detection rules deployed: 4 (2 from confirmed, 2 from refuted)

Pattern observed: All 6 hunts scoped to identity tables (SigninLogs, AuditLogs). No hunts targeted endpoint telemetry (DeviceProcessEvents, DeviceNetworkEvents). ATT&CK coverage improvement concentrated in Initial Access and Persistence. Execution and Lateral Movement tactics remain unexamined.

Action: Q3 backlog rebalanced to include 2 endpoint-focused hypotheses targeting T1059 (Command and Scripting Interpreter) and T1021 (Remote Services).
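The pattern in the NE batch review can be surfaced with a few lines of aggregation over the quarter's hunt records. The sketch below is illustrative: the outcome totals and table names match the review above, but the per-hunt assignment of tables is an assumption made for the example, and `unhunted_tables` and `outcome_counts` are hypothetical helper names.

```python
from collections import Counter

# Six hunt records. Totals match the batch review; which hunt used which
# identity table is assumed for illustration.
hunts = [
    {"tables": {"SigninLogs"}, "outcome": "confirmed"},
    {"tables": {"SigninLogs", "AuditLogs"}, "outcome": "refuted"},
    {"tables": {"AuditLogs"}, "outcome": "refuted"},
    {"tables": {"SigninLogs"}, "outcome": "refuted"},
    {"tables": {"AuditLogs"}, "outcome": "refuted"},
    {"tables": {"SigninLogs", "AuditLogs"}, "outcome": "inconclusive"},
]
AVAILABLE_TABLES = {"SigninLogs", "AuditLogs",
                    "DeviceProcessEvents", "DeviceNetworkEvents"}


def unhunted_tables(hunts, available):
    """Return available tables that no hunt in the batch queried."""
    hunted = set().union(*(h["tables"] for h in hunts))
    return sorted(available - hunted)


def outcome_counts(hunts):
    """Count hunts per outcome to expose clustering toward one conclusion."""
    return Counter(h["outcome"] for h in hunts)


print(unhunted_tables(hunts, AVAILABLE_TABLES))
# ['DeviceNetworkEvents', 'DeviceProcessEvents']
print(outcome_counts(hunts)["refuted"])  # 4
```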

Measuring hunting value

Leadership asks two questions about every security investment: "Is it working?" and "Is it worth the cost?" Hunting metrics must answer both in terms leadership understands, not in technical detail about query patterns and ATT&CK technique IDs.

Detection coverage delta. How many new detection rules has the hunting program produced? Each hunt that completes the full cycle from Section 1.6 produces at least one detection rule. Track the cumulative count and the ATT&CK techniques those rules cover. After 12 months, you can report: "Hunting added N detection rules covering N techniques that were previously unmonitored." This metric demonstrates coverage improvement in language leadership already understands: before hunting, the organization could not detect these techniques; after hunting, detection is automated.
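Coverage delta is a set difference: the techniques covered by hunt-derived rules minus the techniques that already had detection. The following is a minimal sketch under stated assumptions. The rule names, the baseline, and the rule-to-technique mapping are invented for illustration, and `coverage_delta` is a hypothetical helper.

```python
def coverage_delta(baseline, hunt_rules):
    """Return techniques covered by hunt-derived rules that had no detection before."""
    covered_by_hunting = {technique for _, technique in hunt_rules}
    return sorted(covered_by_hunting - set(baseline))


# Illustrative data only: (rule name, ATT&CK technique ID) pairs.
baseline = {"T1078"}  # techniques with detection before the hunting program
hunt_rules = [
    ("rule-a", "T1114"),
    ("rule-b", "T1098"),
    ("rule-c", "T1078"),  # adds depth, but the technique was already monitored
    ("rule-d", "T1098"),  # second rule for the same technique
]

new_techniques = coverage_delta(baseline, hunt_rules)
print(f"Hunting added {len(hunt_rules)} detection rules covering "
      f"{len(new_techniques)} techniques that were previously unmonitored.")
# Hunting added 4 detection rules covering 2 techniques that were previously unmonitored.
```

Counting rules and techniques separately keeps the report honest when several rules target one technique or when a rule reinforces existing coverage.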

Dwell time compression. When a hunt confirms a compromise, record the dwell time. In Section 1.8, the OAuth consent phishing hunt found a compromise with a 43-day dwell time. Without the hunt, that compromise would have continued until something else exposed it. Report dwell time compression as cost avoidance: each additional day of undetected mailbox access increases the data exposure risk. IBM's Cost of a Data Breach Report consistently associates proactive threat detection with shorter breach lifecycles than reliance on automated alerting alone.

Hunt completion rate. Track scheduled hunts versus completed hunts. NE's Q1 failure (3 of 12 completed) demonstrated a scheduling problem, not a hunting problem. After the scheduling fix, Q2 achieved 6 of 6. Completion rate measures program discipline. A rate below 75% indicates that hunting time is not adequately protected, and the cadence model may need adjustment. Above 90% indicates healthy and sustainable operational scheduling.
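The thresholds above translate directly into a small classifier. This sketch applies them to the two NE quarters from the paragraph; `completion_status` is a hypothetical name, and the "watch" label for rates between 75% and 90% is an assumption, since the text does not name that band.

```python
def completion_status(completed, scheduled):
    """Return (rate, label) using the 75% and 90% thresholds."""
    if scheduled <= 0:
        raise ValueError("scheduled must be positive")
    rate = completed / scheduled
    if rate < 0.75:
        label = "hunting time not adequately protected"
    elif rate > 0.90:
        label = "healthy"
    else:
        label = "watch"  # assumed label for the unnamed middle band
    return rate, label


print(completion_status(3, 12))  # (0.25, 'hunting time not adequately protected')
print(completion_status(6, 6))   # (1.0, 'healthy')
```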

Findings per hunt. Not every hunt should find a compromise. A program where every hunt confirms a finding suggests either excellent hypothesis targeting or insufficient detection coverage in every area examined. A healthy ratio is roughly 1 confirmed finding per 4-6 hunts, with the majority of hunts concluding "refuted" and producing detection rules that confirm coverage. Report both outcomes: "Of 18 hunts, 3 confirmed active compromises and 15 validated that specific techniques were not present, producing 14 detection rules."
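The ratio is easy to compute and compare against the 4-6 band. The sketch below uses the 18-hunt example from the paragraph above; `hunts_per_confirmed` and `in_healthy_band` are hypothetical names, and a period with no confirmed findings returns `None` rather than a failing score.

```python
def hunts_per_confirmed(total_hunts, confirmed):
    """Return hunts per confirmed finding, or None when nothing was confirmed."""
    return None if confirmed == 0 else total_hunts / confirmed


def in_healthy_band(ratio, low=4, high=6):
    """True when the ratio falls inside the rough 4-6 band from the text."""
    return ratio is not None and low <= ratio <= high


ratio = hunts_per_confirmed(18, 3)
print(ratio, in_healthy_band(ratio))  # 6.0 True

# Every hunt confirming a finding falls well outside the band.
print(in_healthy_band(hunts_per_confirmed(6, 6)))  # False
```

The band is a rough guide, so treat a result outside it as a prompt to look at hypothesis selection and scoping, not as a verdict.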

Presenting metrics to leadership

Raw numbers need context. Rachel presents NE's hunting metrics quarterly using a format that connects each metric to a business outcome.

Hunting Hypothesis

NE Threat Hunting Program — Q2 2026 Quarterly Report

Completion rate: 6 of 6 scheduled hunts completed (100%). Biweekly cadence fully sustained with protected time model.

Detection coverage delta: +4 detection rules deployed (2 from confirmed findings, 2 from refuted findings). Cumulative program total: 15 rules covering 12 ATT&CK techniques across 4 tactics.

Dwell time compression: 1 confirmed compromise with 43-day dwell time identified and contained. Estimated cost avoidance: 43 days of undetected mailbox access terminated.

Findings ratio: 1 confirmed, 4 refuted, 1 inconclusive. Healthy distribution indicating targeted hypothesis selection with comprehensive negative documentation.

Program investment: 24 analyst-hours (6 hunts x 4 hours). Return: 4 permanent detection rules, 1 compromise contained, 6 documented technique assessments.

Notice the format. Each metric is stated as a fact, then immediately connected to what it means for the organization. Coverage delta becomes "12 techniques that were previously unmonitored." Dwell time becomes "43 days of undetected access terminated." Completion rate becomes "the scheduling model is working." Leadership does not need to understand KQL or ATT&CK to read this report and conclude that 24 hours of investment produced measurable, lasting security improvements.

Present findings per hunt as a ratio, not as a success/failure count. A quarter where no hunt confirmed a compromise is not a failure if those hunts produced detection rules and documented that specific techniques are not present in your environment. Negative findings have value: they prove coverage and provide documented evidence for compliance frameworks that require proactive threat assessment. Frame refuted hunts as "validated that these techniques are not active in our environment and deployed detection rules to ensure automated alerting if they appear in the future."

Measuring hunting by volume alone

Some programs report "number of hunts conducted" as their primary metric. Volume without outcomes tells leadership nothing about whether hunting is effective. Twelve hunts that each conclude "no finding" with no detection rules deployed and no documentation represent 48 hours of analyst time with zero lasting value. Six hunts that each produce a documented finding or a deployed detection rule represent 24 hours of time with permanent value. Report outcomes, not activity.
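The comparison above is plain arithmetic, shown here as analyst-hours per lasting outcome. This is an illustrative sketch: `hours_per_outcome` is a hypothetical helper, and the four hours per hunt is the figure implied by the examples in this section.

```python
HOURS_PER_HUNT = 4  # figure implied by the examples (12 hunts = 48 hours)


def hours_per_outcome(hunts, lasting_outcomes, hours_per_hunt=HOURS_PER_HUNT):
    """Return (total hours, hours per lasting outcome or None if there were none)."""
    hours = hunts * hours_per_hunt
    cost = None if lasting_outcomes == 0 else hours / lasting_outcomes
    return hours, cost


# Twelve hunts, no rules, no documentation: hours spent, nothing to show.
print(hours_per_outcome(12, 0))  # (48, None)

# Six hunts that each leave a documented finding or a deployed rule.
print(hours_per_outcome(6, 6))   # (24, 4.0)
```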

Threat Hunting Principle

Quality assurance is not overhead. It is the mechanism that prevents a single query error from becoming a documented false assurance that a technique was searched for and not found. Twenty minutes of peer review per hunt costs far less than a false negative that masks a compromise for months.

Next
Section 1.11 summarizes Module 1, connecting the ten sections of the Hunt Cycle methodology into the operational framework you carry into the campaign modules that follow.