In this section

The Microsoft Security Data Model

2-3 hours · Module 0 · Free

What you already know

You know that Sentinel and Defender XDR both have query interfaces. You may have used the Sentinel Logs blade or Advanced Hunting to run a query someone else wrote. This section maps the complete data model, which products generate which tables, where the schemas overlap and diverge, and how events travel from generation to queryable row. Understanding this pipeline is what separates an analyst who writes targeted queries from one who guesses at table names.

Scenario

An analyst at Northgate Engineering writes a query against EmailEvents in the Sentinel Logs blade and gets zero results. The analyst knows emails were delivered — the user received a phishing message two hours ago. The problem is not the query logic. The problem is that EmailEvents with full metadata lives in Defender XDR Advanced Hunting, not in the Sentinel Log Analytics workspace. The Sentinel connector delivers email data to OfficeActivity instead, with a different schema and different field names. The analyst wasted 20 minutes because they queried the right question against the wrong table in the wrong query surface.

Two query surfaces, one language

Microsoft provides two places to write KQL for security work. Both use identical syntax. The difference is which tables are available and where the data physically resides.

Microsoft Sentinel runs KQL against a Log Analytics workspace. The workspace aggregates data from every connected source: Entra ID sign-in logs, Defender for Endpoint telemetry, Office 365 activity, third-party firewalls, custom application logs, syslog from Linux hosts. Sentinel is the superset. It can contain everything. Retention ranges from 30 days to 12 years depending on the table tier (analytics tier for hot data, data lake tier for long-term archival). Analytics rules, workbooks, hunting queries, and automated playbooks all run against this workspace.

Defender XDR Advanced Hunting runs KQL against the Defender data store. This contains telemetry from Microsoft's own security products: Defender for Endpoint, Defender for Office 365, Defender for Identity, Defender for Cloud Apps, and Entra ID Protection. The data store holds 30 days of hot data by default. When Sentinel is onboarded to the unified Defender portal, analysts can query both Sentinel and Defender tables from a single Advanced Hunting interface. The two surfaces are converging.

The overlap between the two surfaces is substantial but not complete. Many Defender for Endpoint tables exist in both environments with identical schemas: DeviceProcessEvents, DeviceFileEvents, DeviceNetworkEvents, DeviceLogonEvents. You write the same query in either surface and get the same results. Some tables exist only in Sentinel: CommonSecurityLog for CEF-format firewall data, Syslog for Linux hosts, any custom log table your organization ingests. Some tables exist only in Defender XDR with full fidelity: EmailEvents with complete email metadata, CloudAppEvents with detailed SaaS application telemetry, UrlClickEvents for Safe Links tracking.

For this course, we teach KQL against Sentinel tables unless noted otherwise. Sentinel is the superset. It contains Defender data plus everything else. Every query in this course runs in the Sentinel Logs blade. Most also run in Defender Advanced Hunting with minor table or field name adjustments. When the differences matter, the course notes them explicitly.

Six product families, six data domains

Each Microsoft security product generates telemetry in a distinct domain. Knowing which product covers which domain tells you which table to query before you open the editor.

Figure 0.2 — The Microsoft security data model. Six product families generate telemetry in distinct domains (Defender for Cloud Apps not shown — see text below). All data flows into one Sentinel workspace, queryable via KQL. Cross-product investigation is a join between tables from different products.

Entra ID generates identity telemetry. Every interactive sign-in, every token refresh, every non-interactive application authentication, every directory change (all recorded. The primary tables are SigninLogs (interactive sign-ins with full Conditional Access evaluation, risk scoring, device detail, and MFA method), AADNonInteractiveUserSignInLogs (token refreshes and application-initiated authentications), and AuditLogs (directory configuration changes) role assignments, application registrations, CA policy modifications, group membership changes). The investigation question these tables answer: "Who authenticated, from where, with what method, and what did the access evaluation decide?"

Defender for Endpoint (MDE) generates device telemetry. Every process creation, file operation, registry modification, and network connection on managed endpoints. The primary tables are DeviceProcessEvents (the most queried, covering every process that starts, including the full command line, the parent process, the user context, and the file hash), DeviceFileEvents (file creation, modification, deletion, rename), DeviceNetworkEvents (outbound connections with remote IP, port, and URL), and DeviceRegistryEvents (registry key changes). The investigation question: "What executed on the device, what did it touch, and what did it connect to?"

Defender for Office 365 (MDO) generates email and collaboration telemetry. EmailEvents captures the delivery pipeline (sender, recipient, subject, delivery action, filtering verdict, authentication results. OfficeActivity captures what users do inside Exchange and SharePoint) inbox rule creation, email forwarding, file downloads, sharing permission changes. The distinction matters: EmailEvents tells you what the mail system did with a message. OfficeActivity tells you what the user did inside their mailbox or library. Phishing investigations typically start with EmailEvents (was the message delivered?) and pivot to OfficeActivity (did the user interact with it?).

Defender for Identity (MDI) generates on-premises Active Directory telemetry. IdentityLogonEvents records Kerberos and NTLM authentication on domain controllers. The authentication events that Entra ID does not see because they occur below the cloud identity layer. IdentityDirectoryEvents records AD object changes: group membership modifications, password resets, attribute updates. Hybrid organizations use these tables alongside Entra ID tables to build a complete authentication picture across cloud and on-premises. The investigation question: "What happened on the domain controller that the cloud identity layer cannot see?"

Defender for Cloud Apps (MDA) generates SaaS application telemetry. CloudAppEvents captures user activity across connected cloud applications: file downloads, sharing changes, admin actions, login events from third-party SaaS platforms that Microsoft monitors through API connectors or proxy-based session control. This is the table that covers shadow IT discovery (which unsanctioned applications are employees using?) and OAuth application governance (which applications have been granted consent to access organizational data?). The investigation question: "What are users doing inside cloud applications beyond Microsoft 365?"

Sentinel generates its own operational and aggregation tables. SecurityAlert aggregates alerts from every connected product into a single normalized schema (a single query against SecurityAlert returns alerts from Defender for Endpoint, Defender for Office 365, Defender for Identity, and any third-party product with an alert connector. SecurityIncident groups related alerts into incidents with severity, status, and assignment fields. CommonSecurityLog ingests CEF-format data from firewalls, proxies, and network appliances) Palo Alto, Fortinet, Check Point, and dozens of others use this format. Syslog ingests Linux host logs. Custom log tables hold anything an organization chooses to ingest: application logs, cloud provider audit trails, HR system events.

The ingestion pipeline — from event to queryable row

Understanding when data becomes queryable is critical for both investigations and detection rules. A sign-in event does not appear in SigninLogs the instant it occurs. It travels through a pipeline.

Step 1: Event generation. j.morrison opens Outlook and authenticates to Exchange Online. Entra ID generates a sign-in event with the full authentication record: UserPrincipalName, IPAddress, DeviceDetail (browser, OS, compliance status), ResultType (success or failure code), ConditionalAccessStatus, RiskLevelDuringSignIn, and 30 additional fields.

Step 2: Connector transport. The Entra ID diagnostic settings forward the event to the Log Analytics workspace. Transport latency is typically 2 to 5 minutes. During this window, the event exists in Entra ID but is not yet queryable in Sentinel.

Step 3: Ingestion and indexing. The event lands in the SigninLogs table as a new row. Every field becomes a column. The TimeGenerated column records when the event was ingested, not when it occurred. The actual sign-in time is in a separate property. This distinction matters for time-based queries: where TimeGenerated > ago(5m) may miss events that occurred 4 minutes ago but took 3 minutes to ingest.

Step 4 (Query availability. The row is queryable. Any analyst with Log Analytics Reader permissions can find it. The row persists for the table's configured retention period) 90 days default for analytics-tier tables, configurable up to 2 years. Data lake tier tables can retain data for up to 12 years.

At Northgate Engineering, this pipeline processes approximately 45,000 SigninLogs rows per day, 32,000 DeviceProcessEvents rows per day, and 15,000 OfficeActivity rows per day. Every one of these rows is queryable within minutes of the event occurring. The total workspace ingestion across all tables is roughly 18 GB per day.

Ingestion latency and its operational consequences

Latency varies by product. SigninLogs typically arrives within 2 to 5 minutes. DeviceProcessEvents takes 3 to 8 minutes. OfficeActivity can take 5 to 15 minutes. These numbers are not guarantees. Spikes in volume, connector issues, and service-side delays can extend them.

For investigation queries, the practical consequence is straightforward: during an active incident, set the time window wider than you think you need. where TimeGenerated > ago(15m) catches events that occurred 10 minutes ago but took 5 minutes to arrive. where TimeGenerated > ago(5m) misses them.

For detection rules, the consequence is architectural. A scheduled analytics rule that runs every 5 minutes with a 5-minute lookback window has a blind spot. If an event takes 6 minutes to ingest, the rule's first execution misses it. The second execution's window starts at the 5-minute mark and also misses it. The event falls through the gap. Production detection rules should use a lookback window that exceeds the schedule interval by at least the expected ingestion latency. A 5-minute schedule with a 15-minute lookback window closes this gap with overlap. Some events are evaluated twice, but none are missed. K10 covers detection rule scheduling in detail.

The missing-event investigation

The analyst knows an event occurred. A user reported a phishing email, a colleague witnessed a file download, but the query returns zero results. The analyst concludes the data is not being collected. In most cases, the data exists but has not yet been ingested. Extending the time window by 15 to 30 minutes often surfaces the event. The second common cause: querying the wrong table. Email delivery events live in EmailEvents (Defender XDR) or OfficeActivity (Sentinel), not in both, and the field names differ between them.

Data tiers and retention architecture

Not all data in a Sentinel workspace has the same query characteristics. Microsoft's data tier architecture separates data into two main storage classes.

Analytics tier is the hot storage. Data here is fully queryable with no performance penalty, supports analytics rules and alerts, and powers workbooks and dashboards. Standard retention is 90 days, configurable up to 2 years. This is where active investigation and detection happens.

Data lake tier (powered by Fabric) is the cold storage. Data here is queryable via KQL but with higher latency and some operational restrictions: analytics rules cannot run against data lake tables directly, and some functions have limitations. Retention extends up to 12 years. As of early 2026, Defender for Endpoint and Defender for Office 365 Advanced Hunting tables can be ingested directly into the data lake tier at lower cost, bypassing the analytics tier entirely for data you need to retain but do not need for real-time detection.

The practical impact for this course: everything you learn in Phases 1 through 3 targets analytics-tier data. The KQL syntax is identical for both tiers. The difference is operational, which queries you can run as scheduled rules versus which you run as ad-hoc investigations. K9 (performance optimization) covers the query planning implications of tier-aware data architecture.

Discovering what your workspace actually contains

Theory tells you which tables should exist. Your workspace tells you which tables actually have data. Every Sentinel deployment is different, which connectors are enabled, which products are licensed, how long data has been flowing. The first thing an analyst should do in a new workspace is run a discovery query to see what is actually there.

The Usage table records ingestion volume for every table in the workspace. It tells you which tables received data, how much, and how recently. This query returns the active tables in your workspace ranked by the volume of data they ingested over the past seven days:

KQL — Workspace Table Discovery

// Which tables have data and how much are they ingesting?
Usage
| where TimeGenerated > ago(7d)
| summarize TotalGB = round(sum(Quantity) / 1024, 2),
            LastRecord = max(TimeGenerated)
    by DataType
| sort by TotalGB desc

In Northgate Engineering's workspace, this query returns output like the following:

CLI Output — Workspace Discovery Results

DataType                    TotalGB    LastRecord
DeviceProcessEvents         4.82       2026-05-21T14:32:18Z
SigninLogs                  2.41       2026-05-21T14:31:55Z
DeviceNetworkEvents         1.93       2026-05-21T14:32:01Z
CommonSecurityLog           1.67       2026-05-21T14:30:44Z
OfficeActivity              1.12       2026-05-21T14:29:38Z
DeviceFileEvents            0.94       2026-05-21T14:32:12Z
AADNonInteractiveUserSig..  0.87       2026-05-21T14:31:49Z
SecurityAlert               0.23       2026-05-21T14:28:11Z
AuditLogs                   0.19       2026-05-21T14:31:22Z
DeviceLogonEvents           0.16       2026-05-21T14:31:58Z
Syslog                      0.11       2026-05-21T14:27:33Z
SecurityIncident            0.02       2026-05-21T13:45:02Z

This output tells you three things immediately. First, which data sources are active (if SigninLogs is missing from the results, the Entra ID connector is not configured, and every identity query in this course will return empty. Second, the relative volume) DeviceProcessEvents at nearly 5 GB in seven days means endpoint telemetry dominates this workspace, which is typical for organizations with Defender for Endpoint deployed to all managed devices. Third, the recency. A LastRecord timestamp more than an hour old for a table that should be continuously streaming suggests an ingestion issue worth investigating before you spend time writing queries against stale data.

Run this query in your own workspace before starting K1. The output is your personal map of available data. If a table referenced in this course does not appear in your results, the corresponding connector is either not enabled or not licensed. The course notes which tables are required for each module, so you can plan which queries you can run immediately and which require additional connector configuration.

KQL Principle

Before writing any query, answer two questions: which product generated the data, and which table in which query surface holds it. The data model is the prerequisite for every query. An analyst who knows the operators but queries the wrong table gets zero results and wastes investigation time. An analyst who knows the data model asks the right table first.

Section 0.3 profiles the core tables you will query throughout this course. The columns that matter, the investigation questions each table answers, and how to explore your own workspace to discover what data is available. The data model tells you where to look. The table profiles tell you what to look for.

Unlock the Full Course See Full Course Agenda

Get weekly detection and investigation techniques

KQL queries, detection rules, and investigation methods — the same depth as this course, delivered every Tuesday.

No spam. Unsubscribe anytime. ~2,000 security practitioners.

← Previous Next →