In this section

The Query Pipeline — How Operators Chain Left to Right

3-4 hours · Module 1 · Free

What you already know

You know that every KQL operation takes a table in and produces a table out. You have used the pipe character to chain operators and seen how where, project, and sort by transform a result set. The next step is the pipeline model in depth: how operators chain, why their order changes both results and performance, and how to trace and debug a multi-operator query step by step.

Scenario

Priya writes a query to find users with high failure rates: SigninLogs | where TimeGenerated > ago(24h) | summarize TotalEvents = count(), FailCount = countif(ResultType != "0") by UserPrincipalName | extend FailRate = round(100.0 * FailCount / TotalEvents, 1) | where FailRate > 50. It works. She then tries to simplify by computing the fail rate earlier: SigninLogs | where TimeGenerated > ago(24h) | extend FailRate = round(100.0 * countif(ResultType != "0") / count(), 1) | where FailRate > 50. The query fails with an error. Priya knows the operators. What she does not understand is which operators can appear at which stage of the pipeline, and why the order changes what is possible.

The anatomy of a pipeline

A KQL query is a sequence of operators separated by pipes. The first element is a data source (usually a table name). Each subsequent operator receives the tabular output of the previous one, transforms it, and passes a new table forward. The final operator's output is the query result.

KQL

// Step 1: start with the table (all rows, all columns)
SigninLogs
// Step 2: filter rows by time (reduces rows, columns unchanged)
| where TimeGenerated > ago(24h)
// Step 3: filter to failures only (reduces rows further)
| where ResultType != "0"
// Step 4: extract country from dynamic column (adds one column)
| extend Country = tostring(LocationDetails.countryOrRegion)
// Step 5: aggregate by IP and country (rows collapse to groups)
| summarize FailCount = count(), UniqueUsers = dcount(UserPrincipalName)
    by IPAddress, Country
// Step 6: filter the aggregated results
| where FailCount > 20
// Step 7: order by highest failure count
| sort by FailCount desc

Trace the table shape through each step. Step 1 produces all rows from SigninLogs (perhaps 1.2 million in 24 hours for a large tenant). Step 2 reduces to events in the last 24 hours. Step 3 keeps only failed sign-ins, dropping the row count from perhaps 1.2 million to 45,000. Step 4 adds one new column (Country) to each surviving row, but does not change the row count. Step 5 collapses 45,000 rows into distinct IP+Country groups, perhaps 3,200 rows. Step 6 filters to groups with more than 20 failures. Step 7 sorts the survivors.

At every stage the data is a table. The row count changes. The column set changes. The tabular structure does not.

Figure 1.2: Row count through a seven-operator pipeline. The biggest reduction happens at where ResultType != "0" (1.2M to 45K) and summarize (45K to 3.2K). The filter and aggregation stages are where pipeline design decisions have the most impact.

Why operator order changes results

The pipeline executes top to bottom, left to right. Unlike SQL, where the engine reorders clauses during execution planning, KQL runs operators in the order you wrote them. This means the order you choose determines the result.

Consider the distinction between extend and summarize. The extend operator adds a computed column to each individual row. It cannot perform aggregation because it operates one row at a time. The summarize operator collapses multiple rows into groups and computes aggregate values across those groups. After a summarize, the only columns that survive are the by columns and the aggregation results. Every other column from the original table is gone.

This is why Priya's query in the scenario failed. She tried to use countif() inside extend. But countif() is an aggregation function that needs to operate across groups of rows. Inside extend, there is no group. There is only the single current row. Aggregation functions only work inside summarize or make-series, where the engine knows which rows to aggregate. The fix is to summarize first (which produces the aggregated columns), then extend on the summarized output to compute derived metrics like the failure rate.

This constraint applies to all aggregation functions: count(), dcount(), countif(), sum(), avg(), min(), max(), percentile(), make_set(), make_list(). None of these can appear inside extend. All of them require summarize.

Filter early, aggregate late

The position of where clauses in the pipeline has a direct impact on query performance. Two queries can produce identical results but execute at dramatically different speeds depending on where the filters appear.

KQL

// SLOW: summarize all users, then filter to one user
SigninLogs
| where TimeGenerated > ago(30d)
| summarize count() by UserPrincipalName, IPAddress
| where UserPrincipalName == "j.morrison@northgateeng.com"

This query forces the engine to aggregate every user across 30 days before discarding all of them except Morrison. On a 5,000-person tenant, that means summarizing millions of rows into tens of thousands of groups, then keeping a handful.

KQL

// FAST: filter to one user first, then summarize only their data
SigninLogs
| where TimeGenerated > ago(30d)
| where UserPrincipalName == "j.morrison@northgateeng.com"
| summarize count() by IPAddress

Same result. The second query filters first, reducing millions of rows to a few hundred before the summarize touches any data. The performance difference is seconds versus minutes on a 30-day window.

The rule: place where clauses as early in the pipeline as possible. The earlier you reduce the row count, the less work every subsequent operator performs. Time filters (where TimeGenerated > ago()) should almost always be the first operator after the table name, because the Kusto engine uses them to prune data shards at the storage level, skipping entire partitions of data that fall outside the time window.

Debugging a pipeline step by step

When a query produces unexpected results, the debugging technique is consistent: remove operators from the bottom one at a time until the output makes sense, then re-add them to find the operator that introduces the problem.

A faster technique for large queries: add | count after each operator to see the row count at each pipeline stage.

CLI Output

SigninLogs | where TimeGenerated > ago(24h) | count
→ 1,200,000
SigninLogs | where TimeGenerated > ago(24h) | where ResultType != "0" | count
→ 45,000
SigninLogs | ... | extend Country = tostring(LocationDetails.countryOrRegion) | count
→ 45,000  (extend never changes row count)
SigninLogs | ... | summarize count() by IPAddress, Country | count
→ 3,200   (summarize collapses to unique groups)

The row count at each stage tells you exactly where the data changes. If a where drops more rows than expected, that filter is too restrictive or the data does not contain what you assumed. If summarize produces more groups than expected, the by columns have more distinct value combinations than you anticipated. Each unexpected number is a diagnostic signal pointing to the specific operator that needs adjustment.

The incremental build approach applies the same principle during initial query development. Start with the table and a time filter. Run it. Add the entity filter. Run it. Add the projection or aggregation. Run it. Verify at each stage before adding the next operator. Five separate executions taking 10 seconds each is faster than one complex query that runs for two minutes and produces wrong results you then have to debug from scratch.

Columns disappear after summarize

The most common pipeline mistake for new KQL users is attempting to reference original table columns after a summarize. After summarize, the only columns that exist are the by columns and the aggregation results. Everything else is gone. If you need the Location column in your output but did not include it in the by clause or aggregate it with make_set(Location), it does not exist in the post-summarize table. Adding it with extend after the summarize fails because LocationDetails no longer exists in the table flowing through the pipe.

The solution is to compute or extract needed values before the summarize (using extend), then include them in the by clause or aggregate them. Planning which columns you need in the final output before you write the summarize avoids this class of error entirely.

let statements: naming intermediate tables

As queries grow more complex, the pipeline can become difficult to read. The let statement addresses this by assigning a name to an intermediate table, which you can reference later in the query.

KQL

let failedSignIns = SigninLogs
    | where TimeGenerated > ago(24h)
    | where ResultType != "0";
let recentActivity = OfficeActivity
    | where TimeGenerated > ago(24h);
failedSignIns
| summarize FailCount = count() by UserPrincipalName
| join kind=leftouter (
    recentActivity
    | summarize ActionCount = count() by UserId
) on $left.UserPrincipalName == $right.UserId

Each let statement defines a named tabular result. The names failedSignIns and recentActivity are not variables that store data in memory. They are query fragments that the engine evaluates when referenced. The let statements must end with semicolons. The final tabular expression (the one without a let prefix) is the query that produces the output.

This pattern is essential for joins because you often need to prepare both sides of the join independently before combining them. Instead of nesting one pipeline inside another (which KQL does not support in the way SQL supports subqueries), you name each pipeline with let and reference them in the join. Module 4 uses let extensively.

let also enables reusable scalar values. let lookback = 24h; defines a timespan you can reference multiple times in the query: where TimeGenerated > ago(lookback). If you need to change the lookback from 24 hours to 7 days, you change it in one place instead of finding every ago() call in the query.

Building a query incrementally during investigation

Experienced analysts do not write complete queries from scratch. They build incrementally, adding one or two operators at a time and verifying the output at each stage.

Start with the table and a time filter. Run it, check the row count. Add the entity filter (the user, IP, or device you are investigating). Run it, confirm data exists. Add a project to see the relevant columns. Scan the values. Based on what you see, add the appropriate aggregation or sort. Each step validates the previous step's output before adding complexity.

A practical note on the entity filter step: use =~ for case-insensitive string comparison when filtering by UserPrincipalName. The value might be stored as j.morrison@northgateeng.com or J.Morrison@NorthgateEng.com depending on how the identity provider normalized it. where UserPrincipalName =~ "j.morrison@northgateeng.com" matches both. The strict == operator is case-sensitive and will miss the match if the casing differs. For investigation queries where you need to find all events for a user regardless of case normalization, =~ is the safer default.

Five separate executions taking 10 seconds each is faster than one 12-line query that runs for two minutes and produces wrong results. The incremental approach also prevents a common time sink: building a complex query against a user or entity that has zero events in the table. If step 2 (the entity filter) returns zero rows, you know immediately. Without the incremental approach, you debug for 20 minutes before discovering the table does not contain data for that entity.

Pipeline optimization beyond filter placement

Two additional principles complement the "filter early" rule.

Reduce column width early. Wide rows with many columns consume more memory at every pipeline stage. If you need five columns from a 50-column table, add a project after the first where to narrow the data carried through the pipeline. This is most impactful for tables like DeviceProcessEvents that have 30+ columns including large string fields like ProcessCommandLine. Projecting to the columns you need before a summarize reduces the memory footprint the aggregation operates on.

Aggregate before joining. If you need to join two large tables, aggregate each table first to reduce row counts, then join the smaller aggregated tables. Joining a million-row table to another million-row table is expensive. Joining a 500-row aggregated summary to another 500-row summary is instant. The let statement pattern above demonstrates this approach: each side is filtered and summarized independently before the join combines them.

Analyst Decision

Before writing a summarize: List the columns you need in the final output. For each one, decide: is it a grouping key (goes in the by clause), an aggregation target (goes inside a function like count() or make_set()), or a derived value (compute with extend before the summarize)?

After the summarize runs: Only the by columns and aggregation columns exist. New computed columns (like failure rates or time deltas) are added with extend on the summarized output. This is the two-step pattern: summarize to aggregate, then extend to compute derived metrics.

The everything-at-once query

The analyst writes a 12-line query from start to finish, runs it, gets wrong results, and stares at the output trying to figure out which operator misbehaved. The pipeline has 12 transformation steps, any of which could be the source of the error. The fix is incremental construction: build two operators, verify the output, add two more, verify again. Each verification takes seconds. Debugging a monolithic query takes minutes or hours, because you cannot see the intermediate states without decomposing it anyway.

KQL Principle

The pipeline reads and executes top to bottom. Each operator sees only the table produced by the operator above it. If you can predict the table shape (row count, column set) at each stage, you can predict the final result. Build queries incrementally: two operators, verify, two more, verify. The time invested in verification is always less than the time spent debugging a monolithic query that produces unexpected results.

Section 1.3 introduces the type system that governs every column in every table. You will learn the seven data types that appear in security log tables, why some columns look like one type but are stored as another, and how type mismatches cause the silent failures that produce wrong investigation results.

Unlock the Full Course See Full Course Agenda

Get weekly detection and investigation techniques

KQL queries, detection rules, and investigation methods — the same depth as this course, delivered every Tuesday.

No spam. Unsubscribe anytime. ~2,000 security practitioners.

← Previous Next →