Converting Hunts to Detection Rules
Scenario
NE's identity compromise hunt discovered j.martinez was compromised via AiTM session hijacking. IR contained the account, removed the inbox rules, and revoked OAuth consents. Three months later, the same technique compromises a different user. No detection rule was created from the first hunt. Tom Ashworth's query chain sits in his Sentinel bookmarks, producing no value until someone manually runs it again. That hunt found the technique once. Without conversion, the organization must find it manually every time.
The hunt already did the hard work
Detection engineers typically write rules from threat reports: read about a technique, write a query based on theoretical understanding, deploy, and discover over the following weeks that the false positive rate is unacceptable because the query matches legitimate patterns the engineer did not anticipate. Tuning cycles begin. Rules are disabled, modified, re-enabled, modified again. Weeks pass before the rule stabilizes.
Hunt-derived detection rules skip this cycle. Your hunt already ran the query against real data. Twenty-five legitimate results from the indicator query are documented with specific reasons why each was legitimate. Exclusions needed to suppress those false positives are known before the rule is deployed. Detection rules ship with informed thresholds and exclusions from day one.
This is why conversion is the step that makes hunting self-funding. What you hunted today, you detect automatically tomorrow. Each technique moves from "known-unknown" (we know we should look for it) to "known-known" (we have automated coverage) permanently.
What changes between a hunt query and a detection rule
The core KQL logic is preserved, but several parameters must be adapted for scheduled, unattended execution.
Time window. The hunt query examined a 30-day window interactively. Detection rules run on a schedule: typically every hour, as often as every 5 minutes for scheduled rules, or roughly every minute for near-real-time (NRT) rules. The rule's lookback window must match its frequency: a rule running every hour should look back 1 to 2 hours (with overlap to avoid timing gaps). The maximum lookback Sentinel supports for scheduled rules is 14 days (queryPeriod: 14d). Behavioral baseline rules that need longer lookback periods should pre-compute the baseline in a watchlist or saved function rather than recalculating on every run.
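A minimal sketch of the overlap pattern for an hourly rule, assuming the rule is configured with a 1-hour frequency and a 2-hour lookback. The variable names are illustrative. Filtering on ingestion_time() is one common way to keep the overlap from producing duplicate alerts; it is an addition here, not something the hunt itself prescribes.

```kql
// Sketch: hourly rule with a 2-hour lookback and ingestion-time deduplication.
// Assumed rule settings (not from the hunt): frequency = 1h, lookback = 2h.
let ruleFrequency = 1h;   // how often the rule runs
let overlap = 1h;         // extra lookback to cover late-arriving events
SigninLogs
| where TimeGenerated > ago(ruleFrequency + overlap)   // event time: wide window
| where ingestion_time() > ago(ruleFrequency)          // only rows ingested since the previous run
| where ResultType == "0"                              // successful sign-ins only
| project TimeGenerated, UserPrincipalName, IPAddress
```

The wide event-time window catches events that arrived late, while the ingestion-time filter limits each run to rows the previous run could not have seen.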
Thresholds. The hunt query returned all users with new IPs. The detection rule needs a threshold that separates "worth alerting on" from "expected noise." The hunt analysis provides exactly this information. If the hunt found 28 users with new IPs, of which 25 were VPN changes and 3 had correlated suspicious signals, the rule should require a second indicator (MFA registration, inbox rule creation, or elevated risk score) alongside the new IP. That correlation reduces the alert volume from 28 to 3 without losing the true positives.
At NE, Priya Sharma's identity compromise hunt found that requiring both a new IP and at least one secondary signal (MFA registration, inbox rule, or OAuth consent within 24 hours) reduced the candidate set from 28 to 3 while preserving all true positives. She encoded this correlation as the rule's threshold: the rule fires only when SigninLogs shows a new IP AND AuditLogs shows a directory or mailbox modification within the same 24-hour window. A single-indicator version (new IP only) would have generated approximately 4 alerts per day. The correlated version generates 0 to 1 alerts per week, and nearly every alert is a genuine finding or at minimum worth investigating.
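The correlation described above could look roughly like the following. This is a sketch under stated assumptions, not NE's actual rule: the baseline and detection windows, the list of AuditLogs operation names, and the choice to count only secondary signals that follow the sign-in are all illustrative. Inbox-rule signals are left out because the sketch reads only SigninLogs and AuditLogs.

```kql
// Assumed windows; adjust to the rule's queryPeriod and queryFrequency.
let baselineWindow = 14d;      // how far back "known" IPs are collected
let detectionWindow = 1d;      // sign-ins evaluated on this run
let correlationWindow = 24h;   // secondary signal must follow within this span
// Hypothetical list of secondary-signal operations; tune to your tenant.
let secondaryOps = dynamic(["User registered security info", "Consent to application"]);
let knownIPs = SigninLogs
    | where TimeGenerated between (ago(baselineWindow) .. ago(detectionWindow))
    | where ResultType == "0"
    | extend User = tolower(UserPrincipalName)
    | distinct User, IPAddress;
let newIPSignins = SigninLogs
    | where TimeGenerated > ago(detectionWindow)
    | where ResultType == "0"
    | extend User = tolower(UserPrincipalName)
    | join kind=leftanti knownIPs on User, IPAddress   // keep only IPs not seen in the baseline
    | project SigninTime = TimeGenerated, User, IPAddress;
let secondarySignals = AuditLogs
    | where TimeGenerated > ago(detectionWindow)
    | where OperationName in (secondaryOps)
    | extend User = tolower(tostring(InitiatedBy.user.userPrincipalName))
    | project SignalTime = TimeGenerated, User, OperationName;
newIPSignins
| join kind=inner secondarySignals on User
| where SignalTime >= SigninTime and SignalTime <= SigninTime + correlationWindow
| summarize FirstSignin = min(SigninTime), Signals = make_set(OperationName) by User, IPAddress
```

The inner join is what implements the threshold: a new IP with no secondary signal produces no row, so it never becomes an alert.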
Entity mapping. Sentinel analytics rules map entities (account, IP, host, URL) so that incidents automatically link to entity pages in Defender XDR. Hunt queries often lack explicit mapping because the hunter worked interactively. Add extend statements that extract the identifiers Sentinel maps to entities, such as the account name, UPN suffix, and IP address.
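As an illustration, the extend statements for a sign-in based rule might look like this. The output column names (AccountName, AccountUPNSuffix) are arbitrary choices; in the rule's entity mapping settings they would be mapped to the Account entity's Name and UPNSuffix identifiers, and IPAddress to the IP entity's Address identifier.

```kql
SigninLogs
| where TimeGenerated > ago(2h)
| where ResultType == "0"
// Split the UPN into the two parts the Account entity expects.
| extend AccountName = tostring(split(UserPrincipalName, "@")[0]),
         AccountUPNSuffix = tostring(split(UserPrincipalName, "@")[1])
| project TimeGenerated, UserPrincipalName, AccountName, AccountUPNSuffix, IPAddress
```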
Alert grouping. Multiple detections for the same user within a time window should group into a single incident. Configure entity-based alert grouping on the account entity with a window that matches the technique's expected activity pattern (4 to 24 hours for authentication-based detections). Without grouping, three anomalous sign-ins from the same compromised account create three separate incidents that must be triaged individually.
Severity. Map the rule's severity to the confidence model from Section 1.4. A rule that requires 3+ correlated indicators warrants High severity. A single-indicator rule requiring analyst enrichment warrants Medium or Informational. Match severity to the analytical confidence of the detection, not to the theoretical impact of the technique.
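One way to express this mapping inside the query is to count the correlated indicators and derive a severity column, which a rule can then use for a per-alert severity override. The sketch below uses a small inline table with placeholder rows. The text does not say how to treat exactly two indicators, so the Medium tier for that case is an assumption, as is mapping a single indicator to Informational rather than Medium.

```kql
// Placeholder rows for illustration only; a real rule would produce these flags from its joins.
let Candidates = datatable(User:string, NewIP:bool, MfaRegistration:bool, InboxRule:bool, OAuthConsent:bool)
[
    "user1@example.com", true, true, true, false,
    "user2@example.com", true, false, false, false
];
Candidates
| extend IndicatorCount = iff(NewIP, 1, 0) + iff(MfaRegistration, 1, 0)
                        + iff(InboxRule, 1, 0) + iff(OAuthConsent, 1, 0)
| extend AlertSeverity = case(
    IndicatorCount >= 3, "High",
    IndicatorCount == 2, "Medium",      // assumption: not specified in the text
    "Informational")                    // single indicator, needs analyst enrichment
```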
False positives: the most useful part of your hunt
When the step 2 query returned 28 users and only 3 warranted investigation, the 25 legitimate results are not waste. They are the most detailed false positive analysis you will ever get for this technique. You, the analyst, manually examined each one and determined why it was legitimate. Those 25 determinations become the exclusion list for your detection rule.
For every result you classified as legitimate during analysis, document three things: what the activity was, why it is legitimate, and what exclusion it implies for the rule.
Four categories of false positives appear consistently across hunts. Each has a different exclusion pattern.
Infrastructure FPs. Corporate VPN, proxy, and cloud gateway IPs used by many users. Exclusion: IP range allowlist maintained as a Sentinel watchlist. Review quarterly as infrastructure changes.
Role-based FPs. Users whose job requires the flagged activity. IT administrators legitimately modify directory settings. Finance users legitimately download high volumes during quarter-end. Exclusion: role-based allowlist or threshold adjustment per user group.
Temporal FPs. Activity that is legitimate at certain times: maintenance windows, quarterly reporting spikes, monthly patching cycles. Exclusion: time-based suppression or dynamic thresholds that account for business cycles.
Onboarding FPs. New users, devices, and applications all produce "first seen" signals during their initial period. Everything about a new user is anomalous against a baseline that does not include them. Exclusion: suppress or flag as low-confidence for entities with less than the baseline window of history.
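The first two exclusion patterns can be sketched with Sentinel watchlists. The watchlist aliases below (CorpEgressRanges, ITAdmins) are hypothetical, and the sketch assumes each watchlist uses its SearchKey column for the CIDR range or the user principal name. Temporal and onboarding exclusions depend on data the text does not specify, so they are not shown.

```kql
// Hypothetical watchlist aliases; create and name these to suit your workspace.
let corpRanges = toscalar(_GetWatchlist('CorpEgressRanges')
    | summarize make_list(tostring(SearchKey)));          // dynamic array of CIDR ranges
let itAdmins = _GetWatchlist('ITAdmins')
    | project User = tolower(tostring(SearchKey));
SigninLogs
| where TimeGenerated > ago(2h)
| where ResultType == "0"
| extend User = tolower(UserPrincipalName)
// Infrastructure FPs: drop sign-ins from corporate egress ranges.
// coalesce keeps non-IPv4 addresses, for which the range check returns null.
| where not(coalesce(ipv4_is_in_any_range(IPAddress, corpRanges), false))
// Role-based FPs: drop accounts on the admin allowlist.
| where User !in (itAdmins)
| project TimeGenerated, User, IPAddress
```

Keeping the exclusions in watchlists rather than in the query text means the quarterly review can update them without editing the rule.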
Document each false positive category with the specific exclusion and the impact on alert volume. NE's hunt log for TH-2026-005 recorded: 18 infrastructure FPs (VPN IP rotation, excluded via watchlist), 4 role-based FPs (IT admins accessing new services, excluded via admin role filter), 2 temporal FPs (quarter-end SharePoint downloads, excluded via dynamic threshold), and 1 onboarding FP (new hire, excluded via 30-day account age filter). Total exclusions reduced expected alert volume by 89% without removing any true positive signals.
A rule deployed without false positive analysis might have a 10% true positive rate: 9 out of 10 alerts are VPN changes, travel, or new employees. The SOC tunes it over weeks, adding exclusions one at a time as each false positive is triaged. A rule deployed with hunt-based FP analysis starts at 70 to 90% true positive rate because the exclusions were identified during the hunt. This is why hunt-derived rules are better than rules built from theory alone: the hunt analyst saw the legitimate activity patterns in the actual environment.
Figure: Hunt query to detection rule conversion. The core KQL logic is preserved. Time window, thresholds, and exclusions are adapted for unattended execution, informed by the false positive analysis from the hunt.
When the query is too complex for a scheduled rule
Some hunt queries, particularly those using behavioral baselines with make_series or multi-table joins across large time windows, exceed the execution constraints of a scheduled analytics rule. Three options address this.
Simplify the rule, keep the hunt. Deploy a simpler version that catches the most obvious variants of the technique. Continue hunting periodically for the subtle variants the simplified rule misses. The rule handles the base case; hunting handles the edge cases.
Pre-compute baselines in a watchlist. Run a daily Logic App or other scheduled job that computes the per-user baseline and stores the results in a Sentinel watchlist. The detection rule joins against the watchlist instead of computing the baseline on every execution. This separates the expensive computation from the real-time detection.
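On the detection side, the watchlist option reduces the rule to a cheap anti-join. The sketch below assumes a hypothetical watchlist with the alias UserIPBaseline and two columns, UserPrincipalName and IPAddress, refreshed by the daily job. The job that populates it is outside the scope of this example.

```kql
// Hypothetical watchlist 'UserIPBaseline' with columns UserPrincipalName and IPAddress.
let baseline = _GetWatchlist('UserIPBaseline')
    | project User = tolower(tostring(UserPrincipalName)), IPAddress = tostring(IPAddress);
SigninLogs
| where TimeGenerated > ago(2h)            // short lookback: the baseline is already computed
| where ResultType == "0"
| extend User = tolower(UserPrincipalName)
| join kind=leftanti baseline on User, IPAddress   // keep only user/IP pairs absent from the baseline
| project TimeGenerated, User, IPAddress
```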
Deploy as a hunting query rather than an analytics rule. Sentinel's built-in hunting queries run on demand or on a schedule via the Hunts feature. They do not create incidents automatically but flag results for analyst review. This is a middle ground between manual hunting and full automated detection.
For organizations using detection-as-code (storing analytics rules in a Git repository and deploying through CI/CD), the hunt-to-detection conversion integrates naturally. Each converted rule is submitted as a pull request with the hunt ID in the commit message, linking the detection rule permanently to the hunt that produced it. PR descriptions include the hypothesis, false positive analysis, and threshold justification. Any future question about why the rule exists, why the threshold is set at that level, or why specific exclusions are configured can be answered by reading the linked hunt record. Without detection-as-code, maintain this traceability through naming: prefix hunt-derived rules with HUNT- followed by the campaign and sequence number.
The 14-day validation period
Even with hunt-informed tuning, the first 14 days of production execution reveal edge cases the hunt did not encounter. A 30-day hunt examined one window. Detection rules run continuously across changing conditions: new employees joining, VPN infrastructure rotating, seasonal business patterns shifting.
Deploy the rule in report-only mode for the first 14 days. Report-only mode executes the query on schedule and logs results, but does not create incidents. After 14 days, review the logged results: count the alerts, categorize each as true positive or false positive, and calculate the precision rate. If precision exceeds 70%, promote to production. If precision is below 50%, the exclusions need refinement. Add the newly discovered false positive patterns, reset the 14-day clock, and revalidate.
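The precision check at the end of the validation period is simple arithmetic: true positives divided by total alerts. A sketch, assuming the triage verdicts have been recorded somewhere queryable; the inline table and its rows are placeholders, not results from NE's validation. The text gives no guidance for precision between 50% and 70%, so the sketch labels that band as a judgment call.

```kql
// Placeholder verdicts for illustration; replace with the logged validation results.
let ValidationResults = datatable(AlertId:string, Verdict:string)
[
    "a-001", "TP",
    "a-002", "FP",
    "a-003", "TP",
    "a-004", "TP"
];
ValidationResults
| summarize Total = count(), TruePositives = countif(Verdict == "TP")
| extend PrecisionPct = round(100.0 * TruePositives / Total, 1)
| extend Decision = case(
    PrecisionPct > 70, "promote to production",
    PrecisionPct < 50, "refine exclusions and restart the 14-day clock",
    "between thresholds: judgment call")
```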
This validation approach is standard practice in detection engineering. Hunting accelerates it because the initial false positive analysis eliminates most of the obvious noise before the first deployment. Without the hunt, the 14-day validation period would surface dozens of false positive patterns the engineer must investigate individually. With the hunt, most of those patterns are already excluded.
A common mistake: "The query worked in the hunt, so it should work as a scheduled rule." An analyst copies the 30-day hunt query directly into an analytics rule, sets it to run every hour, and enables it. The rule attempts to scan 30 days of data every hour, times out on execution, and produces no alerts. Or worse: it completes but generates 200 alerts per day because the thresholds were calibrated for a one-time manual review, not continuous automated execution. Conversion requires adapting the time window, adding hunt-informed exclusions, configuring entity mapping, and setting severity. This takes 30 to 60 minutes, and it is the cheapest hour in the entire hunt cycle because it transforms a one-time finding into permanent automated coverage.
Threat Hunting Principle
A hunt that does not produce a detection rule is a hunt that must be repeated. The Convert step is what makes hunting self-funding: the query you used to find the threat today becomes the analytics rule that detects it automatically tomorrow. False positive analysis during the hunt produces exclusions that make the rule precise from day one. The technique moves from the known-unknown layer to the known-known layer permanently.