In this section

Advanced Query Patterns

4-5 hours · Module 1 · Free

What you already know

Section 1.8 taught you the core KQL operators — where, project, extend, summarize, parse_json, mv-expand, let, and the time functions. This section combines those operators into multi-step patterns that answer the questions simple filtering can't: "Which users have abnormal failed sign-in volume?" "Is today's sign-in pattern different from last week?" "Which IP addresses are targeting multiple accounts?" These are the patterns that form the foundation of the detection rules in EI13 and the operational monitoring in EI14.

Scenario

Your daily triage shows 847 failed sign-in events in the last 24 hours. Individually, each looks like a routine typo — wrong password, MFA timeout, expired credential. But are they 847 independent events, or are 600 of them a password spray campaign from 3 IP addresses targeting 200 accounts over a 20-minute window? Simple filtering shows you 847 failures. Multi-step aggregation reveals the pattern hiding inside them.

Multi-step aggregation

The most useful identity security queries combine multiple aggregation steps to answer layered questions. The general pattern: filter raw data, aggregate to answer the first question, then filter or re-aggregate the results to answer the second.

The password spray detection pattern illustrates this. Step one counts failed sign-ins per IP address. Step two filters to IPs whose failure count, distinct user count, and time window match the spray profile — many failures across many accounts in a short burst.

KQL

// Detect password spray patterns — IPs targeting many accounts
SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType != 0
| summarize
    FailedCount = count(),
    TargetedUsers = dcount(UserPrincipalName),
    FailureCodes = make_set(ResultType, 5),
    FirstFailure = min(TimeGenerated),
    LastFailure = max(TimeGenerated)
    by IPAddress
| where FailedCount > 10 and TargetedUsers > 3
| extend DurationMinutes = datetime_diff('minute', LastFailure, FirstFailure)
| order by TargetedUsers desc
// High FailedCount + many TargetedUsers = spray pattern
// Short DurationMinutes = automated attack tooling
// FailureCodes containing 50126 = invalid password attempts

The same multi-step principle works in reverse for user-centric analysis. Instead of aggregating by IP to find spray sources, aggregate by user to find spray targets:

KQL

// Users with abnormal failed sign-in volume
SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType != 0
| summarize
    FailedCount = count(),
    DistinctIPs = dcount(IPAddress),
    FailureCodes = make_set(ResultType, 5),
    FirstFailure = min(TimeGenerated),
    LastFailure = max(TimeGenerated)
    by UserPrincipalName
| where FailedCount > 10
| extend DurationMinutes = datetime_diff('minute', LastFailure, FirstFailure)
| order by FailedCount desc
// Many failures + many DistinctIPs = spray target (many IPs attacking one user)
// Many failures + few DistinctIPs = brute force (one IP attacking one user)

The difference between the two queries is only the by clause — IP-centric versus user-centric. The multi-step pattern is the same: aggregate, threshold-filter, enrich with calculated fields. This is the template for every threshold-based detection rule you'll build in EI13.

Time-series analysis and anomaly detection

Time-based patterns reveal attack activity that single-event analysis misses. Password spray campaigns cluster during off-hours. Token replay attacks produce sign-in bursts from new IPs. Configuration changes cause sudden shifts in authentication volume. The hourly volume query with anomaly flagging detects all three.

KQL

// Hourly sign-in volume with anomaly flagging
let baseline = SigninLogs
| where TimeGenerated between (ago(7d) .. ago(1d))
| summarize AvgHourly = count() / (6.0 * 24.0);
SigninLogs
| where TimeGenerated > ago(24h)
| summarize HourlyCount = count() by bin(TimeGenerated, 1h)
| extend AvgHourly = toscalar(baseline)
| extend Deviation = round((HourlyCount - AvgHourly) / AvgHourly * 100, 1)
| extend IsAnomaly = abs(Deviation) > 50
| project TimeGenerated, HourlyCount, AvgHourly, Deviation, IsAnomaly
| order by TimeGenerated asc

This query uses let to define a baseline subquery — the average hourly volume from the previous 6 days. The main query calculates the current hourly volume and compares it to the baseline. The toscalar() function converts the baseline result (a single-row table) into a scalar value that can be used in the extend calculation. A deviation greater than 50% in either direction flags the hour as anomalous. A spike could indicate an attack (spray, credential stuffing) or a legitimate operational event (a training session where 200 users sign in simultaneously). A drop could indicate a service disruption, a log ingestion failure, or a holiday. Both directions warrant investigation. The 50% threshold is a starting point; EI14 teaches you to calibrate it against your tenant's normal variability.

Time-of-day analysis adds another dimension. Grouping sign-ins by hourofday(TimeGenerated) over a week reveals the normal business hours pattern — high volume during 8–18, low overnight. A spike at 3 AM with few distinct users is a strong anomaly signal. This pattern catches attackers who operate during off-hours in the target's time zone, which is common because spray campaigns automated from a different time zone often fire during the attacker's business hours, not the target's.

Day-over-day comparison using union to combine today's hourly pattern against the same day last week detects deviations from the weekly rhythm. Monday always has higher sign-in volume than Saturday. Comparing Monday against Monday eliminates the weekly cycle noise and surfaces genuine anomalies. The comparison works by running two separate queries — one for today, one for last week's matching day — and combining them with union for side-by-side analysis.

String operations for identity data

Identity security queries frequently parse, extract, and compare string values. Three string patterns appear in most operational queries.

Extracting domains from UPNs uses split(): extend UserDomain = tostring(split(UserPrincipalName, "@")[1]). Aggregating by domain reveals which organizations are signing into your tenant — your primary domain, B2B guest domains, and any unexpected external domains that warrant investigation.

IP address range matching uses ipv4_is_in_range(): where ipv4_is_in_range(IPAddress, "203.0.113.0/24"). This is the proper way to match CIDR ranges in KQL — faster and more accurate than string prefix matching. Replace the example range with your organization's actual public IP ranges to distinguish corporate network sign-ins from external ones without relying on Conditional Access named locations.

ResultType code interpretation uses case() to map numeric error codes to readable descriptions. The codes that matter most for security analysis: 50126 (invalid password — the spray indicator), 53003 (blocked by CA — your policies working), 50074 (MFA required — user needs to register), 500121 (failed MFA attempt — either user error or MFA fatigue attack), and 50053 (smart lockout — account locked due to repeated failures). Mapping these codes to descriptions in your queries makes daily triage reports readable without looking up error codes. Build the case() mapping once and reuse it as a let function across your query library — every failure analysis query benefits from human-readable error descriptions.

Figure 1.9 — The five advanced query pattern categories. Each pattern appears in the detection rules (EI13) and operational monitoring (EI14) that you build later in the course.

Finding the latest or earliest event

The arg_max() and arg_min() aggregate functions solve a specific problem that regular summarize can't: within each group, return the row with the highest (or lowest) value of a specified column, along with other columns from that same row. This is different from max(), which returns only the maximum value — arg_max() returns the maximum value plus the associated fields from the row where that maximum occurred.

For identity security, the most common use is finding each user's most recent successful sign-in. The query summarize arg_max(TimeGenerated, IPAddress, AppDisplayName, ConditionalAccessStatus) by UserPrincipalName returns one row per user — the row with the latest TimeGenerated — plus the IP address, application, and CA status from that specific sign-in. Without arg_max, you'd need a self-join against a subquery of max timestamps to achieve the same result, which is slower and harder to read.

arg_min works the same way for the earliest event. Finding IPs that first appeared in the last 7 days — summarize arg_min(TimeGenerated, UserPrincipalName) by IPAddress | where TimeGenerated > ago(7d) — surfaces new IP addresses in your tenant that could be new attacker infrastructure, a new office VPN, or an employee working from a new location. Either way, first-seen IPs warrant review because they represent a change from established patterns.

The arg_max pattern is particularly important for concurrent session detection in Section 1.10. It finds each user's most recent interactive IP address and compares it against their non-interactive activity. A user whose most recent interactive sign-in came from London but whose non-interactive token refresh came from Singapore in the same hour is a strong token theft indicator.

Conditional classification at scale

The case() function inside summarize or extend enables automated security classification across your entire tenant. This is the pattern that turns raw sign-in data into a prioritized triage queue.

Azure Portal

Microsoft Sentinel → Analytics → Create → Scheduled query rule
Every query pattern in this section can become an automated detection rule in EI13. The Sentinel analytics rule wizard accepts a KQL query, a schedule (run every hour), a threshold (alert if results > 0), and entity mapping (which fields identify the user, IP, and device). The query you write here is the same query that runs as a detection rule — the only difference is scheduling and alerting.

KQL

// Classify every successful sign-in by composite security risk
SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType == 0
| extend
    Country = tostring(LocationDetails.countryOrRegion),
    IsCompliant = tostring(DeviceDetail.isCompliant),
    AuthDetail = parse_json(AuthenticationDetails),
    PrimaryMethod = tostring(parse_json(AuthenticationDetails)[0].authenticationMethod)
| extend SecurityRisk = case(
    RiskLevelDuringSignIn in ("medium", "high")
        and IsCompliant != "true"
        and Country !in ("US", "GB"), "Critical",
    RiskLevelDuringSignIn in ("medium", "high"), "High",
    IsCompliant != "true"
        and Country !in ("US", "GB"), "Medium",
    PrimaryMethod == "Password"
        and AuthenticationRequirement == "singleFactorAuthentication", "Low",
    "Acceptable")
| summarize count() by SecurityRisk
| order by count_ desc

This query combines every signal from Sections 1.4 through 1.7 — risk level, device compliance, country, and authentication method — into a five-tier classification. The case() function evaluates conditions in order, so the most dangerous combination (risk + unmanaged device + unexpected country) gets classified as Critical before less severe combinations are checked. The output is a single table showing how many sign-ins fall into each risk tier — the daily triage summary that tells you where to focus your investigation time.

What we see in 90% of environments

Basic log filtering treated as security monitoring. "We filter sign-in logs by Status: Failure and review the results." Filtering for failures shows 847 individual events. Only multi-step aggregation reveals that 600 of them are a password spray from 3 IPs targeting 200 accounts in 20 minutes. A successful attack produces successful sign-in events that look identical to legitimate activity — only anomaly detection (comparing sign-in properties against the user's baseline) reveals the compromise. Simple filtering finds events. Aggregation finds patterns. Patterns are what security operations requires.

Identity Security Principle

Every detection rule is a query pattern with a threshold. The spray detection query becomes a Sentinel analytics rule when you set the threshold, schedule it to run every hour, and configure the alert action. The anomaly detection query becomes a detection when you define "anomalous" as a percentage deviation from your tenant's baseline. The query patterns in this section are the detection rules of EI13, pre-assembled. The only difference is scheduling and alerting.

Section 1.10 takes the aggregation patterns from this section and applies them across table boundaries. You'll learn the join patterns that correlate SigninLogs with AuditLogs, IdentityInfo, and service principal logs to detect multi-stage attacks — an anomalous sign-in followed by a role assignment, consent grant, or inbox rule creation.

Unlock the Full Course See Full Course Agenda

Get weekly detection and investigation techniques

KQL queries, detection rules, and investigation methods — the same depth as this course, delivered every Tuesday.

No spam. Unsubscribe anytime. ~2,000 security practitioners.

← Previous Next →