In this section

6.1 mv-expand and mv-apply. Working with Dynamic Arrays

4-5 hours · Module 6
What you already know

Module 5 covered string parsing and data extraction. This module covers advanced filtering and pattern matching, techniques for complex conditions, dynamic lists, set operations, and pattern-based detection.

WATCHLIST-DRIVEN LOGIC VIP List Safe IPs High Risk Query Lookup Decision Action

Figure 6.1. Watchlists provide dynamic configuration for automation decisions.

Module 5 introduced mv-expand for unpacking arrays. This subsection goes deeper, covering the nuances, performance implications, and advanced patterns that make mv-expand and mv-apply essential for security analysis.

mv-expand, one row per array element

mv-expand creates one output row for every element in a dynamic array column. A single sign-in event with 5 conditional access policy evaluations becomes 5 rows: each containing all the original sign-in columns plus one policy evaluation:

SigninLogs
| where TimeGenerated > ago(1h)
| where ResultType == "0"
| mv-expand CAPolicy = ConditionalAccessPolicies
| extend PolicyName = tostring(CAPolicy.displayName)
| extend PolicyResult = tostring(CAPolicy.result)
| extend EnforcedControls = tostring(CAPolicy.enforcedGrantControls)
| project TimeGenerated, UserPrincipalName, PolicyName, PolicyResult, EnforcedControls

Row multiplication warning: mv-expand multiplies your row count. If your input has 10,000 sign-ins and each sign-in has 8 CA policy evaluations, the output has 80,000 rows. Always filter before mv-expand and aggregate after it to keep results manageable.

Typed mv-expand for performance

Specifying the output type in mv-expand avoids dynamic type resolution on every row:

// Untyped — KQL infers type per row (slower)
| mv-expand Anomalies

// Typed — KQL knows the type upfront (faster)
| mv-expand Anomalies to typeof(int), TimeGenerated to typeof(datetime), Score to typeof(real)

Always use typed mv-expand when unpacking time-series arrays (from make_series or series_decompose_anomalies). The performance difference is significant on arrays with hundreds of elements.

mv-apply, subquery on each array element

mv-apply is the power tool. It applies a full subquery (filter, extend, summarize) to each array element and returns the results:

SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType == "0"
| mv-apply CAPolicy = ConditionalAccessPolicies on (
    where tostring(CAPolicy.result) == "success"
    | where tostring(CAPolicy.enforcedGrantControls) has "Mfa"
    | summarize MFAPolicies = make_set(tostring(CAPolicy.displayName))
)
| where array_length(MFAPolicies) > 0
| project TimeGenerated, UserPrincipalName, MFAPolicies

This finds sign-ins where at least one conditional access policy successfully enforced MFA, and collects the names of those policies. The mv-apply subquery filters to successful MFA policies, summarizes them, and the outer query checks whether any were found.

Why mv-apply instead of mv-expand + summarize?

mv-expand creates intermediate rows that must then be re-aggregated. mv-apply does both operations inside the expansion, fewer intermediate rows, less memory, faster execution.

// mv-expand approach: creates 80K intermediate rows, then summarizes back to 10K
SigninLogs | mv-expand ... | where ... | summarize ... by UserPrincipalName

// mv-apply approach: applies filter and summarize inside, output is directly 10K rows
SigninLogs | mv-apply ... on (where ... | summarize ...)

Pattern: finding the first/last/max element in an array

// Find the CA policy with the highest precedence (first successful one)
SigninLogs
| where TimeGenerated > ago(1h)
| mv-apply CAPolicy = ConditionalAccessPolicies on (
    where tostring(CAPolicy.result) == "success"
    | top 1 by tostring(CAPolicy.displayName) asc
    | project FirstPolicy = tostring(CAPolicy.displayName)
)

// Find the authentication step with the longest duration
SigninLogs
| where TimeGenerated > ago(1h)
| mv-apply AuthStep = AuthenticationDetails on (
    extend Duration = toint(AuthStep.authenticationStepDuration)
    | top 1 by Duration desc
    | project SlowestStep = tostring(AuthStep.authenticationMethod), Duration
)

Pattern: cross-array correlation

When two arrays in the same row need to be correlated (e.g., matching index positions), use the with_itemindex option:

| make-series Count = count() default=0 on TimeGenerated from ago(7d) to now() step 1h
| mv-expand with_itemindex=idx TimeGenerated to typeof(datetime), Count to typeof(long)

The idx column provides the array index for each expanded row. This enables operations like "find the index of the maximum value in one array and extract the corresponding value from another array at the same index."

Pattern: entity extraction from nested arrays

SecurityAlert entities are stored as a JSON array of mixed entity types:

SecurityAlert
| where TimeGenerated > ago(24h)
| mv-expand Entity = parse_json(Entities)
| extend EntityType = tostring(Entity.Type)
| extend EntityValue = case(
    EntityType == "account", tostring(Entity.Name),
    EntityType == "ip", tostring(Entity.Address),
    EntityType == "host", tostring(Entity.HostName),
    EntityType == "file", tostring(Entity.Name),
    tostring(Entity)
)
| where isnotempty(EntityValue)
| project TimeGenerated, AlertName, EntityType, EntityValue

This unpacks the mixed-type entity array into a flat table of entity type and value, enabling you to answer "which IPs appeared in alerts this week?" or "which accounts are referenced in the most alerts?"

Performance comparison: mv-expand vs mv-apply

On a dataset of 50,000 sign-ins with an average of 6 CA policies each (300,000 expanded rows), the performance difference is measurable:

// mv-expand approach: ~4.5 seconds
SigninLogs | mv-expand CAPolicy = ConditionalAccessPolicies
| where tostring(CAPolicy.result) == "success"
| extend PolicyName = tostring(CAPolicy.displayName)
| summarize Policies = make_set(PolicyName) by UserPrincipalName, TimeGenerated

// mv-apply approach: ~2.1 seconds (same result)
SigninLogs | mv-apply CAPolicy = ConditionalAccessPolicies on (
    where tostring(CAPolicy.result) == "success"
    | summarize Policies = make_set(tostring(CAPolicy.displayName))
)

The mv-apply version is faster because it filters and aggregates within the expansion scope, fewer intermediate rows are created and passed through the pipeline. For detection rules that run every 15 minutes, a 2-second difference per execution saves 192 seconds per day. On complex rules with multiple mv-expand stages, the cumulative savings are significant.

Pattern: conditional access gap analysis

SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType == "0"
| mv-apply CAPolicy = ConditionalAccessPolicies on (
    extend Result = tostring(CAPolicy.result)
    | extend PolicyName = tostring(CAPolicy.displayName)
    | summarize 
        SuccessCount = countif(Result == "success"),
        FailureCount = countif(Result == "failure"),
        NotAppliedCount = countif(Result == "notApplied"),
        AllPolicies = make_set(PolicyName)
)
| where SuccessCount == 0 and FailureCount == 0
// Sign-ins where ALL policies were "notApplied" — no CA protection at all
| summarize UnprotectedSignIns = count(), 
    SampleUsers = make_set(UserPrincipalName, 10),
    SampleApps = make_set(AppDisplayName, 10)

Sign-ins where every conditional access policy evaluated to "notApplied" have no CA protection. These are gaps in your CA policy coverage, applications, user groups, or conditions that no policy covers. This query is essential for security posture assessment: the number of unprotected sign-ins is a direct measure of CA coverage gaps.

Multi-array correlation with mv-expand

When you need to correlate elements from two arrays in the same row, expand them with with_itemindex:

// Correlate time-series anomaly flags with their timestamps
| make-series Count = count() default=0 on TimeGenerated from ago(7d) to now() step 1h
| extend (Anomalies, Score, Expected) = series_decompose_anomalies(Count, 3.0)
| mv-expand with_itemindex=idx 
    TimeGenerated to typeof(datetime),
    Count to typeof(long),
    Anomalies to typeof(int),
    Score to typeof(real)
| where Anomalies != 0

The with_itemindex=idx adds an index column that ensures all arrays expand in lockstep, element 0 from each array on the same row, element 1 on the next row, etc. Without it, the arrays might not align correctly.

Pattern: authentication method inventory

Understanding which authentication methods are used across your tenant is a security posture assessment:

SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType == "0"
| mv-apply AuthStep = AuthenticationDetails on (
    extend Method = tostring(AuthStep.authenticationMethod)
    | extend Succeeded = tobool(AuthStep.succeeded)
    | where Succeeded == true
    | summarize Methods = make_set(Method)
)
| mv-expand Method = Methods to typeof(string)
| summarize UserCount = dcount(UserPrincipalName), SignInCount = count() by Method
| sort by UserCount desc

This shows how many users are authenticating with each method: FIDO2, Microsoft Authenticator push, SMS, phone call, password-only. A high count of "password" authentications without a second factor indicates MFA gaps. A high count of "SMS" indicates users on a weaker MFA method that is vulnerable to SIM swapping.

Pattern: nested array search across all records

Find any sign-in where any conditional access policy name contains a specific keyword:

SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType == "0"
| mv-apply CAPolicy = ConditionalAccessPolicies on (
    where tostring(CAPolicy.displayName) has "Legacy"
    | take 1
    | project MatchedPolicy = tostring(CAPolicy.displayName)
)
| where isnotempty(MatchedPolicy)
| summarize count() by MatchedPolicy, UserPrincipalName

This searches across millions of sign-ins to find ones where a CA policy with "Legacy" in the name was evaluated. The take 1 inside mv-apply stops after the first match per row, we only need to know whether a match exists, not enumerate all matches.

mv-expand for time-series unpacking (cross-reference with Module 7)

The primary use of mv-expand outside of JSON arrays is unpacking time-series arrays produced by make_series:

SigninLogs
| where TimeGenerated > ago(7d)
| make-series Count = count() default=0 on TimeGenerated from ago(7d) to now() step 1h
    by UserPrincipalName
| extend (Anomalies, Score, Expected) = series_decompose_anomalies(Count, 3.0)
| mv-expand with_itemindex=idx
    TimeGenerated to typeof(datetime),
    Count to typeof(long),
    Anomalies to typeof(int),
    Score to typeof(real),
    Expected to typeof(real)
| where Anomalies != 0
| where TimeGenerated > ago(4h)

Key points:

  • Always use typed expansion for time-series arrays: the types are known (datetime, long, int, real) and specifying them avoids per-row type inference overhead
  • with_itemindex ensures all arrays expand in lockstep, element 42 from each array appears on the same output row
  • Filter after expansion to keep only anomalous points from recent hours: the 7-day lookback is for the baseline, not for the alert output
  • The expanded output returns to "one row per time bin" format that can be processed with standard operators (where, project, join)

This pattern appears in every Module 7 detection rule. Understanding mv-expand here is a prerequisite for time-series anomaly detection.

Handling empty arrays

When an array column is empty (zero elements) or null, mv-expand drops the row entirely, it produces no output for that row. This can silently remove data:

// If a sign-in has zero CA policies evaluated, mv-expand drops it
SigninLogs
| mv-expand CAPolicy = ConditionalAccessPolicies
// Sign-ins with empty ConditionalAccessPolicies array are GONE

// To preserve them, use a default value approach:
SigninLogs
| extend HasCA = array_length(ConditionalAccessPolicies) > 0
| extend CAForExpansion = iff(HasCA, ConditionalAccessPolicies, dynamic([{"displayName":"None","result":"none"}]))
| mv-expand CAPolicy = CAForExpansion

The workaround replaces empty arrays with a synthetic element before expansion. This ensures every sign-in appears in the output, those with no CA policies show "None" instead of being silently dropped.

Use mv-apply on the AuthenticationDetails array to find sign-ins where the first authentication method was "password" and no subsequent MFA step completed. These are password-only authentications, accounts where MFA was not enforced despite being available. This is a conditional access coverage gap query.

NE environmental considerations

NE's detection environment includes specific factors that influence this rule's operation:

Anti-Pattern

Using mv-expand and mv-apply without understanding the output

The query runs. The results look reasonable. The analyst trusts the output without verifying it against the raw data. Every KQL operator transforms data, and every transformation can mask, distort, or omit information if the operator is misused. Validate query results against known-good data before building detection rules or investigation conclusions on them.

Device diversity: 768 P2 corporate workstations with full Defender for Endpoint telemetry, 58 P1 manufacturing workstations with basic cloud-delivered protection, and 3 RHEL rendering servers with Syslog-only coverage. Rules targeting DeviceProcessEvents operate with full fidelity on P2 devices but may have reduced visibility on P1 devices. Manufacturing workstations in Sheffield and Sunderland represent a detection gap for endpoint-level detections.

Network topology: 11 offices connected via Palo Alto SD-WAN with full-mesh connectivity. The SD-WAN firewall logs feed CommonSecurityLog in Sentinel. Cross-site lateral movement generates firewall allow events that correlate with DeviceLogonEvents, enabling multi-source detection that single-table rules cannot achieve.

User population: 810 users with distinct behavioral profiles, office workers (predictable hours, consistent applications), field engineers (variable hours, travel patterns), IT administrators (elevated privilege, broad access patterns), and manufacturing operators (fixed shifts, limited application access). Each user population has different detection baselines.

Troubleshooting

"The query returns an error I do not understand." KQL error messages reference the specific line and operator that failed. Read the error message from left to right: it names the operator, the expected input type, and the actual input type. Most errors are type mismatches (passing a string where a datetime is expected) or field name typos. The getschema operator shows every field name and type for any table: TableName | getschema.

"The query runs but returns unexpected results." Add | take 10 after each operator in the pipeline and examine the intermediate output. This reveals WHERE the data transforms in a way you did not expect. Debug the pipeline stage by stage, not the entire query at once.

Section Reference

Operators covered in this subsection: Review the KQL examples above and add the patterns to your personal query library (K13). Each pattern is reusable across any Sentinel table for security investigation.