In this section

The Tabular Data Model — Everything Is a Table

3-4 hours · Module 1 · Free

What you already know

You wrote your first KQL query in Module 0, using four operators to investigate a credential spray. You know the query starts with a table name and chains operators with pipes. The structural principle behind that pattern is KQL's tabular data model, where every input and every output is a table with rows and columns. Understanding this model is what separates reading queries from reasoning about them.

Scenario

Tom Ashworth runs a query he found on a security blog: SigninLogs | summarize count() by IPAddress | where count_ > 100 | sort by count_ desc. The blog said it finds high-volume IP addresses. Tom gets results, but he cannot explain why swapping the where and summarize lines would produce a completely different output. He cannot predict what happens if he adds a project after the sort. He copies queries that work but cannot modify them, debug them, or build new ones from scratch. The gap is not syntax knowledge. It is understanding what each operator receives and what it produces.

What a table is in KQL

Every data structure in KQL is a table. A table has columns, each with a name and a fixed data type, and rows, each containing one value per column. There is no other data shape. A query that counts sign-in failures returns a table with one column (Count) and one row containing the number. A query that matches zero events returns a table with column headers and zero rows. A summarize that groups sign-ins by IP address returns a table where each row is one distinct IP address and the columns are the grouping key plus the aggregation results.

This is absolute. There is no scalar return type, no single-value mode, no mixed output. The count operator does not return a number. It returns a one-row, one-column table whose single cell contains the number. This means you can pipe the output of count into another operator. SigninLogs | count | where Count > 1000 is valid KQL because the output of count is a table, and where consumes a table.

Security log tables in Sentinel follow this same structure. SigninLogs is a table where each row represents one interactive authentication attempt. The columns are the attributes of that event: TimeGenerated (when the sign-in occurred), UserPrincipalName (who authenticated), IPAddress (from where), ResultType (what happened), Location (a nested object containing country and city derived from geo-IP lookup), and approximately 40 more. DeviceProcessEvents is a table where each row represents one process creation on an MDE-enrolled endpoint: FileName, ProcessCommandLine, AccountName, InitiatingProcessFileName, SHA256.

Every row is one discrete event. Every column is one attribute of that event. This mapping is consistent across every table in the workspace.

Figure 1.1: The tabular invariant. The source table has 45,000 rows. After where, 800 rows remain but the structure is unchanged. After summarize, the row count drops to 12 and the columns change to the grouping key plus aggregation results. At every stage, the data is a table.

Exploring table structure before you query

Before writing a query against an unfamiliar table, you need to know its shape: which columns exist, what types they are, and what the data looks like. Two operators handle this.

getschema returns the column inventory for any table. The output is itself a table (because everything is a table) with columns for ColumnName, ColumnOrdinal, DataType, and ColumnType:

KQL

SigninLogs
| getschema

The first rows of the result show the column inventory:

CLI Output

ColumnName                       DataType      ColumnType
TimeGenerated                    System.DateTime   datetime
UserPrincipalName                System.String     string
IPAddress                        System.String     string
Location                         System.Object     dynamic
ResultType                       System.String     string
ConditionalAccessStatus          System.String     string
DeviceDetail                     System.Object     dynamic
RiskLevelDuringSignIn            System.String     string
AuthenticationRequirement        System.String     string
AppDisplayName                   System.String     string

Two things to notice immediately. ResultType is string, not int. The value stored in the column is "0" or "50126", not the number 0 or 50126. Any query that compares ResultType to an unquoted integer (where ResultType != 0) will silently match zero rows. Second, Location and DeviceDetail are dynamic, meaning they contain nested JSON objects. Accessing the country from Location requires dot notation (Location.countryOrRegion) or extraction functions, not simple string comparison.

take returns a sample of actual rows from the table. Where getschema tells you the column names and types, take shows you what real values look like:

KQL

SigninLogs
| take 5

take 5 returns five arbitrary rows (not the most recent five, not a random sample, just five rows from wherever the engine finds them first). Expand each row in the results pane by clicking the arrow, and you see every column with its actual value. This is how you discover what Location actually contains ({"countryOrRegion":"GB","state":"England","city":"London"}), what ResultType values look like in practice ("0", "50126", "500121"), and which columns are populated versus empty for your specific workspace.

The combination of getschema (structure) and take (content) is the first step before writing any query against an unfamiliar table. Run both before you write a where clause.

Why the output is always a table

The tabular invariant has practical consequences that affect every query you write.

A count is a table. SigninLogs | count returns a table with one column named Count and one row containing the number. Because the output is a table, you can chain further operators: SigninLogs | count | where Count > 1000 is valid. This returns either the one-row table (if the count exceeds 1000) or an empty table (if it does not). Detection rules use this pattern: a rule that returns zero rows means no alert. A rule that returns one or more rows triggers an alert.

A summarize is a table. SigninLogs | summarize count() by IPAddress returns a table where each row is one distinct IP address and the columns are IPAddress (the grouping key) and count_ (the aggregation result, automatically named with a trailing underscore). This output table can be joined, filtered, extended, summarized again, or rendered as a chart. The aggregation did not produce a "report" or a "summary." It produced another table that flows into the next operator.

This composability is powerful. You can summarize the result of a summarize. Consider a query that first counts sign-in failures per user, then counts how many users had more than 50 failures. The first summarize groups by user and produces a table of failure counts per user. The second summarize counts rows in that intermediate table where the count exceeds the threshold. Two aggregation steps, each operating on the tabular output of the previous one. This pattern appears in Module 3 when you build statistical baselines.

A join produces a table. When you join SigninLogs to OfficeActivity on UserPrincipalName, the output is a single table with columns from both inputs. If both tables have a column named TimeGenerated, the output renames the right-side column to TimeGenerated1 to avoid ambiguity. The join output is a regular table that you filter, project, and sort like any other. Module 4 teaches every join type in detail.

An empty result is a table. A query that matches zero rows returns a table with column headers and zero data rows. This is not an error. It is a valid, meaningful output: "no events matched your criteria." Scheduled analytics rules depend on this distinction. Zero rows means the detection did not fire. One or more rows means it did. The tabular model makes this binary evaluation automatic.

Row order is not guaranteed

Tables in KQL have no inherent row sequence. When you run SigninLogs | take 10, the ten rows you receive are not necessarily the most recent, the oldest, or randomly sampled. They are whatever rows the engine retrieves first from its storage shards. Run the same query twice and you might get different rows.

If you need a specific order, you must request it explicitly with sort by. SigninLogs | where TimeGenerated > ago(1h) | sort by TimeGenerated desc returns rows ordered from newest to oldest. Without the sort, the order is undefined. This is different from SQL, where SELECT TOP 10 without an ORDER BY is implementation-defined but often returns results in primary key order. KQL makes no such implicit guarantee. If order matters for your output, specify it.

The SQL mental model and where it breaks

If you have written SQL, the tabular model will feel partially familiar and partially disorienting. SQL also operates on tables. The difference is in how queries compose.

In SQL, a SELECT statement can return a scalar (a single value), a row, or a result set. A subquery in a WHERE clause might return a single value used in a comparison. KQL has no equivalent. Every expression that produces data produces a table. Where SQL uses subqueries nested inside WHERE or HAVING clauses, KQL uses let statements to name intermediate tables and join to combine them. The nesting is flattened into a sequence of named steps.

SQL execution order is also different from its visual order. A SQL SELECT ... FROM ... WHERE ... GROUP BY ... HAVING ... ORDER BY statement reads top to bottom but executes FROM first, then WHERE, then GROUP BY, then HAVING, then SELECT, then ORDER BY. KQL reads and executes in the same order: top to bottom, left to right. The first line runs first. The pipe passes the result to the second line. Each line transforms the output of the previous one. What you read is what executes.

This makes KQL queries easier to debug. If a query produces unexpected results, you can remove lines from the bottom one at a time and observe how the output changes at each stage. Each line is a visible transformation step. There are no hidden execution reorderings to account for.

Analyst Decision

Before writing any query against an unfamiliar table:

Step 1: Run TableName | getschema to see column names and types. Identify string columns that look numeric (ResultType), dynamic columns that contain nested JSON (Location, DeviceDetail), and datetime columns that enable time filtering (TimeGenerated).

Step 2: Run TableName | take 5 to see real values. Expand each row. Note which columns are populated versus empty in your specific workspace. Identify the entity columns (user, device, IP) that you will filter and group by.

Step 3: Write the query. The schema tells you which columns exist and what types to use in comparisons. The sample rows tell you what values to expect in filters. Both steps take under 30 seconds and prevent the most common class of query errors.

Retention and the invisible boundary

Every table in a Sentinel workspace has a retention period that determines how far back your queries can reach. SigninLogs defaults to 30 days in the analytics tier. SecurityEvent defaults to 90 days. DeviceProcessEvents holds 30 days in Advanced Hunting. These defaults are configurable up to 730 days for the analytics tier, and data can be archived to the data lake tier for up to 12 years.

The critical behavior: KQL does not warn you when your time filter exceeds retention. If you query SigninLogs | where TimeGenerated > ago(90d) on a workspace with 30-day retention, you get 30 days of data without any error or warning. The query runs, returns results, and the analyst assumes they searched 90 days. The first 60 days simply do not exist in the table. This creates a dangerous false negative in investigations. The analyst concludes "this IP address has never appeared in our sign-in history" when the actual conclusion should be "this IP address has not appeared in the last 30 days of retained data."

During active incident response, this is why responders immediately extend retention or export relevant tables to long-term storage. Evidence that falls outside the retention window is gone permanently. The Kusto engine will never tell you it is missing.

The assumed-complete search

The analyst queries SigninLogs | where TimeGenerated > ago(180d) | where IPAddress == "185.220.101.42" and gets zero results. The analyst documents: "No prior authentication from this IP in the last 6 months." The workspace retains SigninLogs for 30 days. The analyst actually searched 30 days, not 180. The investigation conclusion is based on incomplete data, and the analyst does not know it. Check retention before trusting a negative result over a long time window.

KQL Principle

Every operator takes a table in and produces a table out. This is the invariant that makes KQL predictable. If you can describe the table shape (row count, column set) at each stage of the pipeline, you can predict the final result. If you cannot, the query is a black box. Before writing complex queries, practice tracing the table shape through simple ones: how many rows survive each where, how many rows does summarize produce, which columns does project keep.

Section 1.2 takes the tabular model and shows how operators chain together through the pipeline. You will trace a multi-operator query step by step, see how execution order changes results, and learn why the position of each operator in the pipeline determines what it can access.

Unlock the Full Course See Full Course Agenda

Get weekly detection and investigation techniques

KQL queries, detection rules, and investigation methods — the same depth as this course, delivered every Tuesday.

No spam. Unsubscribe anytime. ~2,000 security practitioners.

← Previous Next →