In this section

Collection: Iterative Querying

4-5 hours · Module 1 · Free

What you already know

Section 1.1 taught you to formulate testable hypotheses. Section 1.2 taught you to scope hunts across four dimensions before the first query runs. You have a hypothesis, a scored priority, and defined boundaries. This section teaches the querying process that tests the hypothesis within those boundaries.

Scenario

Priya Sharma runs her first hunt query against 7 days of SigninLogs for the full NE tenant. The query returns 347,000 rows. She scrolls through the first 50, sees nothing obviously malicious, and tells Tom the environment looks clean. She tested nothing. She surveyed 0.01% of the data and declared the hypothesis refuted. The problem is not Priya's analysis skills. The problem is that she ran a single query instead of an iterative chain that narrows 347,000 events to the handful worth investigating.

Hunting is iterative

A detection rule runs one query once and fires if conditions match. A hunt is five to fifteen queries, each informed by the results of the previous one, converging on a finding or confirming absence. The pattern is consistent across every campaign in this course.

Step 1: Orientation. Broad. Establishes volume, distribution, and shape of the data in your scope. "How many sign-in events per day exist in this window? How many unique users? What is the distribution by country?" This query does not find threats. It gives you the context to interpret everything that follows. If you do not know your environment produces 50,000 sign-ins per day, you cannot judge whether 200 anomalous results is a significant finding or expected noise.

Step 2: Indicator queries. Targeted. Each query tests one aspect of the hypothesis. "How many users have sign-ins from IPs not in their 30-day baseline?" "Of those, how many are from countries not previously seen?" "Of those, how many occurred within 24 hours of a new MFA method registration?" Each query narrows the result set. The funnel takes tens of thousands of events to hundreds, then to dozens, then to the handful that warrant investigation.

Step 3: Enrichment queries. Contextual. For each suspicious result from Step 2, add the context needed to make a judgment. "What applications did this user access from the anomalous IP?" "Were any inbox rules created in the same time window?" "What was the user's normal activity pattern in the week before?" Enrichment does not find threats. It provides the information you need to decide whether an anomaly is an attack or a business trip.

Step 4: Pivot queries. Expansive. If Step 3 reveals a probable compromise, pivot to related data. "The user accessed SharePoint from the anomalous IP. What files were downloaded?" "The session came from an IP in Romania. Are there other users with sessions from the same IP?" "An inbox rule was created. Are there similar rules on other mailboxes?" The pivot expands from one indicator to the full scope of the compromise.

The iterative query funnel. Four queries reduce 347,000 sign-in events to 1 confirmed compromise, with documented negative findings at each narrowing step.

The funnel in practice

Here is the first two steps of an identity compromise hunt, showing how each query builds on the previous result:

KQL

// Step 1: Orientation — how much sign-in data exists?
SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType == 0
| summarize
    TotalSignIns = count(),
    UniqueUsers = dcount(UserPrincipalName),
    UniqueIPs = dcount(IPAddress),
    UniqueCountries = dcount(tostring(LocationDetails.countryOrRegion))
// NE result: 347K sign-ins, 810 users, 94 IPs, 4 countries
// This tells you: any user with a 5th country is anomalous

Priya's orientation result shows that NE's 810 users authenticate from 4 countries and 94 unique IP addresses in a normal week. Any sign-in from a 5th country or from an IP outside the known 94 is worth examining. Without this baseline context, she has no reference point for what "anomalous" means in her specific environment.

This is the step most analysts skip. They go directly to the indicator query because orientation feels like a waste of time. But orientation queries are fast (5 seconds on a typical tenant) and they calibrate every judgment you make afterward. An environment with 4 countries and 94 IPs has a very different anomaly threshold than one with 27 countries and 3,000 IPs. The orientation query tells you which environment you are working in.

KQL

// Step 2: Indicator — which users signed in from new IPs?
let baseline = SigninLogs
| where TimeGenerated between (ago(37d) .. ago(7d))
| where ResultType == 0
| summarize KnownIPs = make_set(IPAddress, 50) by UserPrincipalName;
SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType == 0
| join kind=inner baseline on UserPrincipalName
| where not(IPAddress in (KnownIPs))
| summarize
    NewIPCount = dcount(IPAddress),
    Countries = make_set(tostring(LocationDetails.countryOrRegion), 10),
    FirstSeen = min(TimeGenerated)
    by UserPrincipalName
| sort by NewIPCount desc
// Result: 28 users with IPs not seen in 30-day baseline
// Next step: enrich with MFA registration + inbox rule events

From 347,000 events to 28 users. Each of those 28 users signed in from at least one IP address that did not appear in their 30-day baseline. Most will have legitimate explanations: business travel, a new home ISP, a new mobile carrier. The enrichment step (Step 3) adds the correlated signals that separate travel from compromise.

Multi-table correlation: finding attack chains

Single-table hunting finds indicators. Multi-table correlation finds attack chains. An anomalous sign-in in SigninLogs is an indicator. That sign-in followed by an inbox rule creation in CloudAppEvents followed by an OAuth consent in AuditLogs is an attack chain. The difference is the join.

Three multi-table patterns appear repeatedly across campaign modules. Learn them here. Apply them in TH4 through TH13.

Pattern 1: Authentication + directory change. Did the user who signed in anomalously also make directory changes? This pattern catches AiTM compromises that pivot to MFA registration, role assignment, or Conditional Access policy modification. The join key is UserPrincipalName and a time window (typically 24 hours after the anomalous sign-in).

Sign-in Record

TimeGenerated:    2026-05-20T14:32:18Z
UserPrincipalName: j.martinez@northgate-eng.com
IPAddress:         185.220.101.42
Location:          Romania
ResultType:        0 (Success)
AppDisplayName:    Microsoft Office 365
DeviceDetail:      {"browser":"Chrome 126","operatingSystem":"Windows 10"}
RiskLevelDuringSignIn: medium
AuthenticationRequirement: singleFactorAuthentication
-- 90 minutes later --
AuditLogs Operation: "User registered security info"
AuditLogs Target:    j.martinez@northgate-eng.com
AuditLogs Detail:    "User registered new authentication phone method"

This is what the evidence looks like before you write the join query. A medium-risk sign-in from Romania on a device with no Entra registration, followed 90 minutes later by a new MFA method registration. Each record in isolation might not trigger an alert. The correlation makes it a finding.

Pattern 2: Authentication + cloud application activity. What did the user do after signing in? Sign-in logs tell you who authenticated. CloudAppEvents tells you what they did afterward. The join connects authentication anomalies to post-compromise actions: inbox rule creation, email forwarding, SharePoint downloads, OneDrive sync operations. This is the most common enrichment pattern because it bridges identity telemetry (who authenticated) with behavioral telemetry (what they did).

For the 28 users identified by the indicator query, the enrichment join checks whether any of them created inbox rules, consented to OAuth applications, or accessed SharePoint from the anomalous IP within a 24-hour window. Three users show correlated signals: sign-in from a new IP, followed by inbox rule creation within 2 hours. The remaining 25 users show sign-ins from new IPs with no follow-on suspicious activity, consistent with legitimate travel or ISP changes.

Pattern 3: Cloud activity + endpoint activity. For environments with Defender for Endpoint, this pattern bridges the cloud-to-endpoint gap. A user signs in from an anomalous location (SigninLogs), accesses SharePoint (CloudAppEvents), and downloads files to an unmanaged endpoint (DeviceFileEvents). Each table covers a different plane of the attack. Without the join, each plane looks normal in isolation.

For cloud-to-endpoint correlation, the join key is typically the user principal name and a time window. DeviceFileEvents records file creation and modification on managed endpoints. If a user signs in from Romania and downloads files, but no DeviceFileEvents appear on any managed endpoint, the download went to an unmanaged device. That absence is itself a finding: data left the tenant and landed on hardware outside your control.

Campaign modules TH8 (data exfiltration) and TH12 (pre-ransomware activity) use this pattern extensively. TH8 correlates SharePoint access with endpoint file events to detect large-volume downloads. TH12 correlates cloud activity with process creation events (DeviceProcessEvents) to detect staging tools on endpoints.

Query documentation discipline

Every query in the chain must be documented in the hunt record. Not just the query that found the threat. Every query, including the ones that produced no results.

Documenting null-result queries proves what you examined. Without them, the hunt record shows only what you found, not what you looked for. A reviewer cannot tell whether you tested the hypothesis comprehensively or stopped after one query.

Hunting Hypothesis

Query chain documentation standard:

Query 1 (Orientation): SigninLogs, 7d, ResultType 0. Result: 347K events, 810 users, 94 IPs, 4 countries. Interpretation: establishes baseline distribution.

Query 2 (Indicator): SigninLogs 7d vs 30d baseline, new IPs. Result: 28 users with new IPs. Interpretation: narrowing candidates for enrichment.

Query 3 (Enrichment): AuditLogs join on 28 users, 24h post-signin. Result: 3 users with MFA registration + inbox rule creation. Interpretation: correlated signals indicate probable compromise in 3 accounts.

Query 4 (Pivot): CloudAppEvents + DeviceFileEvents for 3 confirmed users. Result: j.martinez SharePoint downloads from Romanian IP. 1.2GB exfiltrated. Escalation triggered.

Each entry follows a consistent format: the query name (numbered), the table and time window, the result count, and the analyst's interpretation of what the result means for the hypothesis. The interpretation is critical. A query returning 28 results is a fact. "28 users with new IPs, consistent with a mix of travel and potential compromise, requiring enrichment" is an analytical judgment. The hunt record must contain both.

This documentation format records what you asked, what the data returned, and what you concluded at each step. When the hunt converts to a detection rule in Section 1.6, the query chain provides the exact logic the rule needs. When a peer reviews the hunt in the quality assurance process (Section 1.9), they can reproduce every step.

Managing Advanced Hunting limits

Advanced Hunting enforces resource limits that affect how you structure queries. Understanding these constraints prevents mid-hunt failures.

Advanced Hunting enforces a 10-minute execution timeout per query and returns a maximum of 10,000 rows. Complex joins against large tables frequently hit one or both limits. Two techniques manage this. First, use let statements to pre-filter tables before joining. A join between two unfiltered million-row tables times out. A join between a 500-row filtered set and a million-row table completes in seconds. Second, use materialize() for intermediate result sets you reference multiple times. Materialization evaluates the expression once and caches the result, avoiding redundant table scans.

For hunts requiring more than 10,000 rows (rare but possible in large environments), use Sentinel's KQL jobs. A KQL job runs asynchronously with extended resource limits and persists results into a table you can query interactively. Campaign module TH16 covers KQL jobs in detail. Pre-filtering with let is the single most important performance technique for hunt queries. Consider the authentication + directory change pattern. Without let, you join SigninLogs (millions of rows) with AuditLogs (millions of rows). The query engine must scan both tables fully before applying the join condition, and it times out. With let, you first filter SigninLogs to the 28 users with new IPs (a few hundred rows), store that as a variable, then join AuditLogs against only those 28 users. The join completes in seconds because one side of the join is small.

Materialization with materialize() extends this pattern. When you reference the same intermediate result set multiple times (once to check directory changes, once to check inbox rules, once to check OAuth consents), materialize() evaluates the expression once and caches it. Without materialization, each reference re-scans the underlying table. For multi-step enrichment queries where the same user population feeds three or four downstream checks, materialization can reduce total execution time by 60 to 80 percent.

For most hunts, the interactive editor with pre-filtering and materialization is sufficient.

Running the same query with minor tweaks instead of building a chain

The analyst writes a complex query that checks for new IPs, new countries, new devices, MFA changes, and inbox rules in a single 40-line KQL statement. The query times out. The analyst simplifies by removing conditions until the query runs, but the remaining logic no longer tests the full hypothesis. Iterative collection means each query tests one thing. The orientation query establishes context. Each indicator query tests one aspect of the hypothesis. The enrichment query adds context to suspicious results. Breaking the hypothesis into sequential queries is not slower. It is the only approach that works within Advanced Hunting's resource limits while maintaining analytical rigor.

Threat Hunting Principle

Collection is a conversation with the data, not a single question. Each query builds on the results of the previous one, narrowing from orientation through indicators to enrichment and pivot. Document every query in the chain, including those that returned no results. The null-result queries prove what you tested. The finding-result queries prove what you found.

Section 1.4 teaches analysis: how to separate signal from noise in the results your collection queries produce. You will learn contextual enrichment, behavioral baselining methodology, and the judgment framework that distinguishes legitimate anomalies from attacker activity.

Unlock the Full Course See Full Course Agenda

Get weekly detection and investigation techniques

KQL queries, detection rules, and investigation methods — the same depth as this course, delivered every Tuesday.

No spam. Unsubscribe anytime. ~2,000 security practitioners.

← Previous Next →