Scoping the Hunt

4-5 hours · Module 1 · Free

What you already know

Section 1.1 taught you to formulate hypotheses that are specific, testable, grounded, and actionable. You have a hypothesis and a scored backlog. This section teaches the boundaries that make a hypothesis executable: what you search, where, when, and for whom.

Scenario

Tom Ashworth starts a hunt for OAuth consent phishing. He opens Advanced Hunting, writes a query against CloudAppEvents filtered to consent operations, and runs it across 90 days for the full tenant. The query returns 14,000 rows: every application consent recorded in the window. Tom scans the first 200 rows, sees names he does not recognize, investigates three of them manually, determines they are legitimate, and runs out of time. The hypothesis is technically untested because most of the data was never analyzed. Tom marks the hunt as "inconclusive" and moves on.

Scope before you query

The hypothesis tells you what to look for. Scope tells you where. Skip this step and one of two things happens. Either you query everything, retrieve hundreds of thousands of rows, and cannot distinguish signal from noise, or you query too narrowly, retrieve zero rows, and conclude the threat is absent when it was simply outside your filter.

Scope has four dimensions. Define all four before writing the first KQL statement. Undefined scope produces undefined results.

The four scoping dimensions are data sources, time window, population, and success criteria. Scope discipline prevents analysis paralysis (too broad) and false negatives (too narrow).

Dimension 1: Data sources

Which tables does your hypothesis require? The hypothesis itself determines this. "Authentication anomalies" means SigninLogs and AADNonInteractiveUserSignInLogs. "Inbox rule manipulation" means CloudAppEvents filtered to Exchange operations. "Endpoint persistence" means DeviceProcessEvents, DeviceRegistryEvents, and DeviceFileEvents.

Do not add tables speculatively. More tables do not mean better hunting. Each additional table adds query complexity, increases execution time (Advanced Hunting has a 10-minute timeout per query), and introduces noise from legitimate activity. Start with the minimum tables required to test the hypothesis. Add tables only when initial results indicate the investigation should expand.

Run a population check against every table before building the hunt query. This takes 15 seconds per table and prevents wasting an hour building a query against an empty table:

KQL

// Data source validation: run once per table before building hunt queries.
// Note: in the Defender Advanced Hunting schema the time column is Timestamp;
// TimeGenerated applies when the table is queried through Sentinel / Log Analytics.
CloudAppEvents
| where TimeGenerated > ago(30d)
| summarize
    EventCount = count(),
    FirstEvent = min(TimeGenerated),
    LastEvent = max(TimeGenerated),
    DistinctApps = dcount(Application),
    DistinctUsers = dcount(AccountId)
// EventCount = 0: no events in the window, so the table is not being ingested. Stop here.
// FirstEvent well short of 30 days ago: retention is shorter than the scope.
// DistinctUsers low: connector may be partially configured.

If DistinctUsers is significantly lower than your tenant's user count, the connector may be partially configured. Investigate before proceeding. A hunt that examines 30% of your population without knowing it creates a false sense of coverage.
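
When a hypothesis needs several tables, the same check can be sketched as a single query. This is a minimal sketch, not a definitive implementation: it assumes a Sentinel workspace where every listed table exposes TimeGenerated, and the table list is illustrative only. The isfuzzy=true option lets the query run even if one of the listed tables does not exist in the workspace.

KQL

// Population check across several tables at once (table list is an assumption)
union isfuzzy=true withsource=SourceTable
    SigninLogs, AADNonInteractiveUserSignInLogs, AuditLogs
| where TimeGenerated > ago(30d)
| summarize
    EventCount = count(),
    FirstEvent = min(TimeGenerated),
    LastEvent = max(TimeGenerated)
    by SourceTable
| order by SourceTable asc

A table with no rows in the window does not appear in the output at all, so compare the returned SourceTable values against the list you queried rather than looking for a zero count.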

Dimension 2: Time window

How far back you search depends on the hypothesis and the technique's expected dwell time.

Choosing the wrong window is one of the most common scoping mistakes. A 365-day window sounds thorough but produces millions of rows that overwhelm both the query engine and the analyst. Advanced Hunting enforces a 10-minute execution timeout per query and limits result sets to 10,000 rows in the interactive editor. A 365-day query against SigninLogs for a 500-user tenant generates roughly 15 million records before filtering. The query times out before it completes, and the analyst concludes the data is too noisy to hunt. The data is not noisy. The window is wrong.

Standard hunt window: 30 days. Covers most active compromises. If an attacker has been present longer than 30 days, the technique has likely produced other indicators within this window, or the attacker is sophisticated enough to require a dedicated long-dwell investigation.

Short hunt window: 1 to 7 days. Appropriate when fresh threat intelligence provides IOCs with a limited relevance window, or when the technique produces high-volume data where 30 days would overwhelm the query engine.

Extended hunt window: 90+ days. Appropriate for long-dwell hypotheses targeting APT persistence, supply chain compromise, or low-volume slow techniques (one OAuth consent per week over three months). Requires Sentinel's long-term retention or search jobs for archived data if standard retention has been exceeded.
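
Before committing to one of these windows, it can help to measure how much data a single day produces. The sketch below assumes SigninLogs is the table under consideration and that a 7-day sample is representative; substitute the table your hypothesis requires.

KQL

// Daily event volume for the candidate table over the last 7 days
SigninLogs
| where TimeGenerated > ago(7d)
| summarize DailyEvents = count() by Day = bin(TimeGenerated, 1d)
| order by Day asc

Multiplying a typical DailyEvents value by the number of days in the proposed window gives a rough row estimate before any filtering, which indicates whether a 30-day or 90-day query is realistic for that table.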

The baseline window. Several campaign modules use behavioral baselining: comparing recent activity against a historical norm. These hunts need two separate windows: a baseline window (typically 30 to 90 days of historical data establishing "normal") and a detection window (typically 1 to 7 days of recent data to find "abnormal"). The baseline window must predate the detection window. You cannot build a baseline from data that might already contain the attack.

KQL

// Dual-window pattern: 30-day baseline, 7-day detection
let baselineStart = ago(37d);
let baselineEnd = ago(7d);
let detectionStart = ago(7d);
let detectionEnd = now();
// Step 1: Build baseline of known-good IPs per user.
// make_set is capped at 20 IPs per user; users with more will show extra "new" IPs.
let baseline = SigninLogs
| where TimeGenerated >= baselineStart and TimeGenerated < baselineEnd
| summarize NormalIPs = make_set(IPAddress, 20) by UserPrincipalName;
// Step 2: Find sign-ins from IPs not in the user's baseline.
// kind=inner drops users with no baseline sign-ins; review those accounts separately.
SigninLogs
| where TimeGenerated >= detectionStart and TimeGenerated <= detectionEnd
| join kind=inner baseline on UserPrincipalName
| where not(set_has_element(NormalIPs, IPAddress))
| project TimeGenerated, UserPrincipalName, IPAddress,
    AppDisplayName, DeviceDetail, LocationDetails

The gap between baseline end and detection start matters. If baseline ends at ago(7d) and detection starts at ago(7d), the windows are contiguous. If the attacker compromised an account 8 days ago, the last day of baseline is contaminated. In high-confidence hunts, introduce a 2-day gap between baseline and detection windows: baseline ends at ago(9d), detection starts at ago(7d). The gap prevents contamination at the cost of two days of coverage.

For NE's AiTM hypothesis, Tom Ashworth uses a 37-day baseline ending at ago(9d) and a 7-day detection window starting at ago(7d). The 2-day gap means that if an attacker compromised an account 8 days ago and signed in from their infrastructure during the gap period, that infrastructure does not appear in the baseline, and the detection window still catches any sign-in from that infrastructure in the last 7 days. Without the gap, the attacker's IP would have been included in the "normal" baseline set and filtered out of the results.
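
The gapped variant can be sketched as follows. This is an illustration under stated assumptions, not the course's reference query: the baseline start of ago(37d) is carried over from the dual-window pattern, and the left outer join is a design choice added here so that users with no baseline sign-ins are surfaced instead of silently dropped.

KQL

// Dual-window pattern with a 2-day gap between baseline and detection
let baselineStart = ago(37d);   // assumption: same start as the contiguous pattern
let baselineEnd = ago(9d);      // baseline stops 2 days before detection begins
let detectionStart = ago(7d);
let baseline = SigninLogs
| where TimeGenerated >= baselineStart and TimeGenerated < baselineEnd
| summarize NormalIPs = make_set(IPAddress) by UserPrincipalName;
SigninLogs
| where TimeGenerated >= detectionStart
| join kind=leftouter baseline on UserPrincipalName
// Keep rows where the user has no baseline at all, or the IP is not in it
| where isnull(NormalIPs) or not(set_has_element(NormalIPs, IPAddress))
| project TimeGenerated, UserPrincipalName, IPAddress, AppDisplayName

Sign-ins that occurred between ago(9d) and ago(7d) are in neither window, which is the two days of coverage the gap costs.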

Sentinel search jobs extend hunting beyond the interactive retention window. For hypotheses requiring 180-day or 365-day lookbacks, a search job runs asynchronously against archived data and persists the results into a new table you can query interactively. Campaign module TH16 covers search jobs in depth. For most campaigns in this course, the standard 30-day interactive window is sufficient.

Dimension 3: Population

Who or what are you searching? Full tenant, a department, privileged accounts, a specific device group?

Full tenant hunts are appropriate for technique discovery. "Does this technique occur anywhere in our environment?" produces comprehensive results but generates more noise. Most campaign modules start at full tenant scope.

Targeted population hunts are appropriate when the hypothesis applies to a specific group. Email-based threats (TH7) focus on accounts with administrative roles or PIM-eligible assignments, since those accounts have the access worth compromising. Insider threat campaigns (TH13) focus on HR-flagged individuals. Data exfiltration hunts (TH8) focus on users with access to sensitive SharePoint sites or document libraries containing financial data.

Device-scoped hunts apply to endpoint campaigns (TH9, TH10, TH12). Servers and workstations behave differently. A scheduled task created on a domain controller is more likely legitimate than one created on a marketing workstation. Scoping by device type or group reduces false positives without reducing detection coverage for the technique you are testing.

At NE, Priya Sharma's first OAuth consent hunt examines the full tenant. She finds 47 consent grants in 30 days. On the second iteration, she narrows to consent grants where the consenting user has an administrative role (Global Admin, Exchange Admin, SharePoint Admin, Application Admin). The result set drops to 3 events. Each represents a higher-risk consent because the consenting identity has elevated privileges. The remaining 44 grants from standard users are lower priority but not zero-risk: any of them could be an attacker-controlled application. Priya documents that she examined privileged-user consents fully and standard-user consents at summary level.

Narrowing the population reduces noise but increases the risk of missing a threat outside the scoped group. Start broad for the first iteration. If results are too noisy, narrow on subsequent passes. Document why you narrowed: the excluded population represents a known blind spot in this specific hunt, and someone reviewing your hunt record needs to understand what was not examined.
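
A second-iteration narrowing like Priya's could be sketched as below. Several names here are assumptions rather than facts from this course: the sketch assumes the Sentinel IdentityInfo table is populated (it depends on UEBA being enabled), that AssignedRoles holds role display names in the form shown, and that consent grants appear in AuditLogs under the operation name "Consent to application". Verify each against your own tenant before relying on the results.

KQL

// Consent grants made by users holding selected admin roles (role names are assumptions)
let adminRoles = dynamic(["Global Administrator", "Exchange Administrator",
    "SharePoint Administrator", "Application Administrator"]);
let privilegedUsers = IdentityInfo
| where TimeGenerated > ago(14d)
| summarize arg_max(TimeGenerated, AssignedRoles) by AccountUPN
| mv-expand Role = AssignedRoles to typeof(string)
| where Role in (adminRoles)
| project UPN = tolower(AccountUPN)
| distinct UPN;
AuditLogs
| where TimeGenerated > ago(30d)
| where OperationName == "Consent to application"
| extend ConsentingUser = tolower(tostring(InitiatedBy.user.userPrincipalName))
| where ConsentingUser in (privilegedUsers)
| project TimeGenerated, ConsentingUser,
    AppName = tostring(TargetResources[0].displayName)

Removing the final filter on ConsentingUser returns the full-tenant view again, which makes it easy to record how many grants the narrowing excluded.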

Dimension 4: Success criteria

Without success criteria, hunts drift. The analyst keeps querying because there might be something else to find, long after the hypothesis has been adequately tested. Success criteria answer two questions before the first query runs.

When have you found something? Define a positive finding before you start. For an identity compromise hunt: a positive finding is a sign-in from a device and location combination not in the user's 30-day baseline, for a non-service account, that cannot be explained by known VPN configurations or documented travel. If you find this, escalate to IR or investigate further.

When have you found nothing? Define adequate coverage. For the same hunt: if you examined all active user accounts over a 30-day detection window against a 30-day baseline, and zero accounts show unexplained new-device plus new-location combinations, the hypothesis is refuted for this time window. Document the negative finding and close the campaign. You do not need to repeat this hunt manually next month: the query you built becomes a detection rule that monitors continuously going forward.

Analyst Decision

Scope definition for Tom's OAuth consent hunt:

Data sources: AuditLogs (consent grant events), AADServicePrincipalSignInLogs (post-consent app behavior). Both confirmed ingested.

Time window: Detection: 30 days. No baseline window needed (consent grants are discrete events, not behavioral patterns).

Population: Full tenant. All consent grants, regardless of user role. First iteration only.

Positive finding: OAuth application with Mail.ReadWrite, Files.ReadWrite.All, or Directory.ReadWrite.All permissions, consented in the last 30 days, from a publisher not in NE's approved application list.

Negative finding: All consent grants in the 30-day window map to approved publishers with expected permission scopes. Document and convert the consent-monitoring query into an analytics rule.

The distinction between these two criteria is essential. A positive finding triggers escalation. A negative finding triggers documentation and rule conversion. Without the negative-finding criterion, hunts never close. The analyst keeps re-running the same query, worried they might have missed something, and the campaign stays "in progress" indefinitely. Define what "done" looks like before you start.

Compare Tom's original approach (90 days, full CloudAppEvents, all 14,000 rows, no success criteria) to this scoped version. He now queries 30 days of AuditLogs filtered to consent events, uses an approved-publisher allowlist as his baseline, and knows exactly what constitutes a finding. The result set drops from 14,000 rows to perhaps 15. Each one gets his full attention.
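
Tom's scoped query might look something like the sketch below. It is a hedged illustration only. The allowlist entry is a hypothetical placeholder, matching on application display name stands in for the publisher check (publisher is not assumed to be available in the audit event), and the permission match is a simple text search over the event's modified properties rather than a parsed scope list.

KQL

// Scoped consent hunt: 30 days, high-risk permissions, not on the approved list
let approvedApps = dynamic(["Example Approved App"]);   // hypothetical allowlist
let highRiskScopes = dynamic(["Mail.ReadWrite", "Files.ReadWrite.All",
    "Directory.ReadWrite.All"]);
AuditLogs
| where TimeGenerated > ago(30d)
| where OperationName == "Consent to application"
| extend
    AppName = tostring(TargetResources[0].displayName),
    ConsentingUser = tostring(InitiatedBy.user.userPrincipalName),
    Props = tostring(TargetResources[0].modifiedProperties)
| where Props has_any (highRiskScopes)
| where AppName !in (approvedApps)
| project TimeGenerated, AppName, ConsentingUser, Props

Zero rows from this query is only a negative finding if the allowlist is current and the earlier population check confirmed that AuditLogs is fully ingested.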

The two scope traps

Trap 1: Too broad. The hypothesis targets OAuth abuse. Mid-hunt, the analyst notices suspicious inbox rules in the data and pivots to them, then pivots again to authentication analysis. Three hours later: partial results in three domains, completed analysis in none, documentation for nothing. Scope discipline means testing the hypothesis you started with. If you discover a new lead during the hunt, log it as a separate hypothesis in the backlog. Do not pivot mid-campaign.

Trap 2: Too narrow. The hypothesis specifies "inbox rules created via PowerShell." The analyst queries for New-InboxRule operations where the client is PowerShell and gets zero results. Conclusion: no malicious inbox rules exist. But the attacker used the Graph API, not PowerShell, and the query did not cover that path. The correct scope for "inbox rule manipulation" includes all creation methods: Outlook, OWA, PowerShell, EWS, and Graph API. Narrow the technique, not the detection surface.

The distinction matters. Narrowing the technique means "I am only hunting for inbox rule manipulation, not email forwarding." That is a valid scope decision. Narrowing the detection surface means "I am only looking for inbox rules created by one specific client." That filters out the attacker's preferred method and produces a false negative.
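
One way to keep the detection surface wide is to group by creation method instead of filtering on it. The sketch below rests on assumptions that should be checked against your own data: that inbox rule changes appear in CloudAppEvents under the listed ActionType values, and that RawEventData carries a ClientInfoString field identifying the client.

KQL

// Inbox rule activity by creation method: summarize on the client, do not filter on it
CloudAppEvents
| where TimeGenerated > ago(30d)
| where ActionType in ("New-InboxRule", "Set-InboxRule", "UpdateInboxRules")
| extend ClientInfo = tostring(RawEventData.ClientInfoString)
| summarize
    RuleEvents = count(),
    DistinctUsers = dcount(AccountObjectId)
    by ActionType, ClientInfo
| order by RuleEvents desc

If a creation path you expect is missing from the output, treat that as a coverage question to resolve before concluding that no rules were created that way.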

Expanding scope when results look interesting

The analyst begins an OAuth consent hunt, finds a suspicious application, and follows the thread into the application's sign-in history, then into the user's mailbox activity, then into SharePoint access logs. Four hours later, the analyst has deep knowledge of one compromised user but has never returned to the original scope. The other 14 consent grants were never examined. One of them was a second attacker-controlled application with broader permissions than the first. Scope creep during a hunt is not thoroughness. It is abandoning the systematic process that makes hunting reliable. When you find something during a hunt, bookmark it, escalate if needed, and complete the original scope before following the new thread.

Threat Hunting Principle

Define four dimensions before the first query runs: data sources, time window, population, and success criteria. The scope prevents two failure modes of equal severity. Querying too broadly produces results you cannot analyze in the time available. Querying too narrowly produces a false negative that leaves the attacker's preferred method unchecked.

Section 1.3 teaches collection: the iterative querying process that tests your hypothesis within the scope you defined. You will learn the broad-to-narrow pattern, how to handle large result sets without losing signal, and when to expand versus restrict your queries based on what the data returns.
