TH0.10 M365 Data Sources for Hunting

3-4 hours · Module 0 · Free

What you already know

Section 0.9 dismantled the myths that block hunting programs. The organizational barriers are addressed. But operational hunting requires more than a green light from leadership — it requires data. Every hunt query targets a specific table in your SIEM or Advanced Hunting console. If the table is not ingested, the hunt is blind. If the table is ingested but missing key entities, the hunt produces false negatives. This section maps the M365 telemetry landscape: what each table records, what it misses, and which hunt campaigns depend on which data sources. This is the reference guide you'll return to before every campaign.

Scenario

Priya Sharma prepares for her first hunt campaign — testing a hypothesis about AiTM token replay in Northgate Engineering's M365 tenant. She opens the Advanced Hunting console and writes a query against SigninLogs. Results come back clean. She reports a negative finding. Tom Ashworth asks a simple question: "Did you check AADNonInteractiveUserSignInLogs? Token replay happens through non-interactive refresh, not interactive sign-in." Priya did not know the table existed. Her "negative finding" was actually a blind spot — she looked in the wrong table. The hunt was not wrong. The data source selection was wrong. That distinction matters.

Three clusters, three attack planes

The M365 telemetry landscape organizes into three functional clusters. Each cluster covers a distinct attack plane, and each has characteristic blind spots that create hunting gaps if not understood before querying.

Identity and authentication covers the sign-in and directory change surface. Four tables form this cluster.

SigninLogs records interactive user sign-ins — browser-based authentication to Entra ID with user principal name, IP address, location (country and city), device details (OS and browser), conditional access evaluation results, risk level, MFA requirement and method, authentication protocol, and application accessed. When you need to answer "where are users signing in from?" or "which sign-ins bypassed MFA?" or "which conditional access policies applied?" this is the starting table. Hunt campaigns TH4 (identity compromise) and TH10 (lateral movement) query it heavily.

AADNonInteractiveUserSignInLogs records sign-ins that a client application performs on a user's behalf without prompting the user, chiefly token refreshes. This is where AiTM token replay appears: the attacker's stolen refresh token generates new access tokens from the attacker's IP without any interactive sign-in event. When you need to answer "are refresh tokens being used from IPs that differ from the user's interactive sign-in location?" this is the only table with the answer. What it misses: the initial interactive authentication (that is in SigninLogs) and the actual data access performed with the token (that is in CloudAppEvents or application-specific audit logs).
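The question in the paragraph above can be sketched as a query. This is a minimal illustration rather than a tuned detection: it assumes a Sentinel workspace where both sign-in tables are ingested, and the 7-day lookback and the success filter are illustrative choices, not values from this course.

```kql
// Sketch: non-interactive sign-ins from IPs the same user never used interactively.
// Assumptions: 7d lookback (illustrative) and successful sign-ins only (ResultType "0").
let lookback = 7d;
let InteractiveIPs =
    SigninLogs
    | where TimeGenerated > ago(lookback)
    | where ResultType == "0"
    | extend UserPrincipalName = tolower(UserPrincipalName)
    | distinct UserPrincipalName, IPAddress;
AADNonInteractiveUserSignInLogs
| where TimeGenerated > ago(lookback)
| where ResultType == "0"
| extend UserPrincipalName = tolower(UserPrincipalName)
// leftanti keeps only user/IP pairs with no interactive counterpart
| join kind=leftanti InteractiveIPs on UserPrincipalName, IPAddress
| summarize Events = count(),
            Apps = make_set(AppDisplayName, 20),
            FirstSeen = min(TimeGenerated),
            LastSeen = max(TimeGenerated)
    by UserPrincipalName, IPAddress
| order by Events desc
```

Expect noise from mobile networks, VPN egress changes, and users whose last interactive sign-in predates the lookback window. The output is a list of leads to triage, not a list of compromises.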

AADServicePrincipalSignInLogs records application-credential authentication — service principals authenticating with client secrets or certificates rather than on behalf of a user. When you need to answer "which service principals are authenticating, from which IPs, and has the authentication pattern changed?" this is the table. Compromised application credentials appear here as authentication from unexpected IP addresses or at unexpected frequencies. Hunt campaign TH6 (privilege escalation) depends on it for post-consent behavior analysis.
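One way to look for a changed authentication pattern is to compare recent service principal sign-ins against a baseline of previously seen IPs. The sketch below assumes the table is ingested. The 30-day baseline and 1-day recent window are placeholder values chosen for illustration.

```kql
// Sketch: service principals authenticating from IPs not seen during the baseline period.
// Assumptions: baseline and recent windows are illustrative, not recommended values.
let baseline = 30d;
let recent = 1d;
let KnownIPs =
    AADServicePrincipalSignInLogs
    | where TimeGenerated between (ago(baseline) .. ago(recent))
    | distinct ServicePrincipalId, IPAddress;
AADServicePrincipalSignInLogs
| where TimeGenerated > ago(recent)
// leftanti keeps only principal/IP pairs absent from the baseline
| join kind=leftanti KnownIPs on ServicePrincipalId, IPAddress
| summarize SignIns = count(),
            Resources = make_set(ResourceDisplayName, 20)
    by ServicePrincipalName, ServicePrincipalId, IPAddress
| order by SignIns desc
```

Service principals that run from cloud-hosted compute with rotating egress IPs will appear here routinely, so a per-application allowlist is probably needed before this is useful at scale.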

AuditLogs records directory changes: user creation and deletion, group membership changes, role assignments, application consent grants, conditional access policy modifications, and MFA method registration. When you need to answer "who changed what in the directory?" or "were any roles assigned outside PIM?" this is the table. It does not record authentication events (those are in SigninLogs) or data access events (those are in CloudAppEvents).
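A first pass at "were any roles assigned outside PIM?" might look like the following. It assumes that PIM-driven changes are recorded with the initiating application named "MS-PIM". Treat that name as an assumption and verify it against your own tenant's AuditLogs before relying on the filter.

```kql
// Sketch: role management changes not initiated by PIM.
// Assumption: PIM-driven changes show "MS-PIM" as the initiating app; confirm in your tenant.
AuditLogs
| where TimeGenerated > ago(7d)
| where Category == "RoleManagement"
// InitiatedBy holds either a user or an app; take whichever is populated
| extend Initiator = coalesce(tostring(InitiatedBy.user.userPrincipalName),
                              tostring(InitiatedBy.app.displayName))
| extend Target = coalesce(tostring(TargetResources[0].userPrincipalName),
                           tostring(TargetResources[0].displayName))
| where Initiator != "MS-PIM"
| project TimeGenerated, OperationName, Initiator, Target, Result
| order by TimeGenerated desc
```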

The critical gap in this cluster: many organizations ingest SigninLogs but not AADNonInteractiveUserSignInLogs. That single omission makes the entire token replay attack surface invisible to hunting. A hunt for AiTM compromise that queries only SigninLogs produces a false negative — not because the technique is absent, but because the evidence is in a table the hunter did not query.

Cloud applications and email covers post-authentication activity — what users and applications actually do after authenticating.

CloudAppEvents is the richest table for cloud-plane hunting. It records Exchange Online operations (inbox rules, mail forwarding, message access), SharePoint and OneDrive file operations (downloads, sharing link creation, permission changes), Teams activity, Power Platform operations, and third-party SaaS activity visible to Defender for Cloud Apps. When you need to answer "what did the user do after signing in?" or "were inbox rules created?" or "were files downloaded in bulk?" this is the table. Hunt campaigns TH5 (cloud persistence), TH8 (data exfiltration), TH11 (application and API abuse), and TH13 (insider threats) all depend on it.
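As an illustration of the "were inbox rules created?" question, a minimal query against this table could look like the sketch below. It assumes the Defender for Cloud Apps connector is active and that Exchange rule changes are recorded under the cmdlet-style action names shown. The 7-day window is illustrative.

```kql
// Sketch: inbox rule creation and modification recorded in CloudAppEvents.
// Assumption: rule changes appear under these ActionType values in your tenant.
CloudAppEvents
| where TimeGenerated > ago(7d)
| where ActionType in ("New-InboxRule", "Set-InboxRule")
| project TimeGenerated, AccountDisplayName, ActionType, IPAddress, CountryCode, RawEventData
| order by TimeGenerated desc
```

The rule's conditions and actions (for example, forwarding targets or move-to-folder destinations) are carried in RawEventData, which is where the triage effort usually goes.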

EmailEvents records email delivery telemetry from Defender for Office 365: delivery actions, threat detections, sender and recipient, subject, and delivery location. EmailUrlInfo and EmailAttachmentInfo provide URL and attachment details for individual emails. These tables answer "was a phishing email delivered to a user before an anomalous sign-in?" and "what emails did a compromised account send?" They do not record email content or inbox rule processing after delivery.

MicrosoftGraphActivityLogs records which applications and users called which Graph API endpoints. This table was introduced in 2024 and adoption remains low, but when enabled it dramatically enriches cloud persistence hunting — inbox rule creation via Graph, bulk data download via Graph, and permission changes via Graph all become visible.

The critical gap: CloudAppEvents requires Defender for Cloud Apps to be connected and configured. If the connector is not active, the table is empty and the entire cloud application hunting surface is dark. MicrosoftGraphActivityLogs is often not enabled — without it, API-based attack paths that bypass the M365 portal interface are invisible.

Endpoint covers device-level activity across five tables. DeviceProcessEvents records every process creation with parent process, command line, file hash, user context, and timestamp. DeviceFileEvents records file creation, modification, deletion, and rename events. DeviceRegistryEvents records registry key changes — critical for persistence detection through autostart entries, service creation, and scheduled task registration. DeviceNetworkEvents records network connections with destination IP, port, protocol, and initiating process. DeviceLogonEvents records logon events including local and remote types. These five tables collectively support TH9 (endpoint threats), TH10 (lateral movement), and TH12 (pre-ransomware activity).

The critical gap: endpoint tables see nothing in the cloud plane. An attacker operating entirely through the browser or Graph API — creating inbox rules, exfiltrating SharePoint data, modifying conditional access policies — generates zero events in any Device table. If your hunting is endpoint-only, the three dominant M365 attack categories from Section 0.5 are invisible.

KQL

// Data source audit: confirm table availability before any hunt campaign
// Run this before every new campaign to identify blind spots
// isfuzzy=true keeps the query running when a table does not exist in the workspace;
// the left outer join then reports missing or empty tables with a count of 0
let Expected = datatable(Cluster: string, Table: string) [
    "Identity",   "SigninLogs",
    "Identity",   "AADNonInteractiveUserSignInLogs",
    "Identity",   "AADServicePrincipalSignInLogs",
    "Identity",   "AuditLogs",
    "Cloud Apps", "CloudAppEvents",
    "Cloud Apps", "EmailEvents",
    "Cloud Apps", "MicrosoftGraphActivityLogs",
    "Endpoint",   "DeviceProcessEvents"
];
let Observed =
    union withsource=Table isfuzzy=true
        SigninLogs, AADNonInteractiveUserSignInLogs, AADServicePrincipalSignInLogs,
        AuditLogs, CloudAppEvents, EmailEvents, MicrosoftGraphActivityLogs,
        DeviceProcessEvents
    | where TimeGenerated > ago(1d)
    | summarize Count = count() by Table;
Expected
| join kind=leftouter Observed on Table
| project Cluster, Table, Count = coalesce(Count, 0)
| order by Cluster asc, Table asc

If any table returns zero events, the corresponding hunt campaigns have a blind spot. AADNonInteractiveUserSignInLogs returning zero is the single most impactful gap — it makes the entire AiTM token replay surface invisible. CloudAppEvents returning zero means the Defender for Cloud Apps connector is likely not active, and the cloud application hunting surface is entirely dark. Run this audit before every campaign, not just the first one — connectors can fail silently, and a table that was populated last month may be empty today.

Supplementary tables

Three additional table groups extend the hunting surface beyond the three core clusters.

IdentityLogonEvents and IdentityDirectoryEvents come from Defender for Identity and provide on-premises Active Directory visibility — logon events and directory changes from domain controllers. These tables are required for hybrid hunting where cloud compromise pivots to on-premises infrastructure or vice versa. Without them, the on-premises side of lateral movement is invisible. In a hybrid Entra ID environment, an attacker who compromises a cloud account and then uses that access to authenticate to on-premises resources crosses the boundary between the identity cluster and the on-premises domain — and that crossing is visible only if Defender for Identity is deployed and these tables are ingested.

MicrosoftGraphActivityLogs, introduced above under the cloud applications cluster, records which applications and users called which Graph API endpoints, including the HTTP method, request path, and response status. Enabling it makes inbox rule creation, bulk data download, permission changes, and service principal activity performed through Graph visible to cloud persistence hunting. Without it, attack paths that bypass the M365 portal interface entirely are dark. An attacker who uses the Graph API to create an inbox forwarding rule, for example, generates events in this table but may not generate events in CloudAppEvents, depending on the Defender for Cloud Apps configuration. The table was introduced in 2024 and adoption remains inconsistent, so check whether your environment has it enabled before planning campaigns that depend on API visibility.
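The inbox rule example can be sketched as a query over the HTTP method, request path, and response status. It assumes the table is enabled and that rule creation goes through a Graph path containing "messageRules". Other rule-related paths may exist, so treat the URI filter as a starting point rather than complete coverage.

```kql
// Sketch: successful Graph write requests against inbox message rules.
// Assumption: rule creation or update uses a request path containing "messageRules".
MicrosoftGraphActivityLogs
| where TimeGenerated > ago(7d)
| where RequestMethod in ("POST", "PATCH")
| where RequestUri has "messageRules"
| where ResponseStatusCode between (200 .. 299)
| project TimeGenerated, AppId, UserId, ServicePrincipalId, IPAddress,
          RequestMethod, RequestUri, ResponseStatusCode
| order by TimeGenerated desc
```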

OfficeActivity is the legacy Office 365 audit log connector. It overlaps with CloudAppEvents but uses a different schema with less enrichment and less consistent field naming. If your environment uses OfficeActivity instead of CloudAppEvents, the hunt queries throughout this course need adaptation — all campaigns are written for the CloudAppEvents schema. Organizations still running OfficeActivity should plan migration to CloudAppEvents as a data infrastructure improvement that directly benefits hunting capability.
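To give a sense of what adaptation involves, here is a hedged sketch of an inbox rule query written against the OfficeActivity schema. The operation name lives in Operation rather than ActionType, and the user and source address are in UserId and ClientIP. The cmdlet-style operation names are an assumption to confirm in your own data.

```kql
// Sketch: inbox rule changes using the legacy OfficeActivity schema.
// Assumption: rule changes are logged under these Operation values for the Exchange workload.
OfficeActivity
| where TimeGenerated > ago(7d)
| where OfficeWorkload == "Exchange"
| where Operation in ("New-InboxRule", "Set-InboxRule")
| project TimeGenerated, UserId, ClientIP, Operation, Parameters
| order by TimeGenerated desc
```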

Figure TH0.10 — M365 hunting data source map. Each cluster covers a distinct attack plane. Common ingestion gaps noted in red.

The retention constraint

Advanced Hunting in the Defender XDR portal queries the last 30 days of data. If your hunt hypothesis covers a longer window — and long-dwell hypotheses for APT or supply chain compromises often need 90 days or more — you need to query through Sentinel's Log Analytics workspace, which respects your configured retention period, or use search jobs for archived data.

Sentinel offers tiered retention. Interactive retention (the default analytics tier) keeps data queryable for up to two years but at per-GB cost that accumulates quickly on high-volume tables like DeviceProcessEvents. Archive tier stores data at reduced cost but requires search jobs — asynchronous queries that run in the background and return results after minutes rather than seconds. For hunting, the practical question is: what retention do the hypothesis-critical tables need? Identity tables (SigninLogs, AADNonInteractiveUserSignInLogs) should have at least 90 days of interactive retention because identity compromise investigations frequently look back 60–90 days. Endpoint tables may be acceptable at 30 days if your hypotheses are primarily short-dwell. CloudAppEvents benefits from 90 days because cloud persistence mechanisms like inbox rules may operate for weeks before detection.

Before starting any campaign with a time window exceeding 30 days, confirm that the target tables have sufficient retention configured. If SigninLogs has only 30-day retention and your hypothesis is looking for a threat actor who established persistence three months ago, the evidence has already been purged. Retention configuration is a hunting prerequisite that sits alongside data ingestion — the table must be ingested, ingested completely for all relevant entities, and retained long enough for the hypothesis time window.
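KQL cannot read the configured retention setting itself, but the age of the oldest queryable record is a workable proxy. The sketch below assumes a Sentinel workspace. The table list and the 180-day scan ceiling are illustrative choices.

```kql
// Sketch: how far back does queryable data actually reach for each table?
// Assumptions: 180d is an arbitrary scan ceiling; adjust the table list to your campaign.
union withsource=Table isfuzzy=true
    SigninLogs, AADNonInteractiveUserSignInLogs, AuditLogs, CloudAppEvents
| where TimeGenerated > ago(180d)
| summarize OldestRecord = min(TimeGenerated) by Table
| extend DaysQueryable = datetime_diff("day", now(), OldestRecord)
| order by DaysQueryable asc
```

A table that was onboarded recently will show fewer days than its configured retention, so a low number here means "check the workspace settings", not necessarily "retention is too short".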

CLI Output

Retention check — expected output from a well-configured tenant:
Cluster        Table                           Events (24h)    Retention
─────────────  ──────────────────────────────  ──────────────  ─────────
Identity       SigninLogs                      142,381         90 days
Identity       AADNonInteractiveUserSignInLogs 1,203,447       90 days
Identity       AuditLogs                       28,912          90 days
Cloud Apps     CloudAppEvents                  487,221         90 days
Cloud Apps     EmailEvents                     53,108          30 days
Endpoint       DeviceProcessEvents             2,847,109       30 days
─────────────────────────────────────────────────────────────────────────
⚠  AADServicePrincipalSignInLogs     0 events — NOT INGESTED
⚠  MicrosoftGraphActivityLogs        0 events — NOT ENABLED
Tables with 0 events = blind spots for campaigns depending on them.
Tables with <90d retention = insufficient for long-dwell hypotheses.

The output above represents a tenant with good but incomplete coverage. The two zero-event tables create specific blind spots: service principal compromise and Graph API-based attack paths are invisible. That is not a reason to avoid hunting — it is a documented limitation to record in every hunt report that depends on those tables.

Campaign-to-table mapping

Each hunt campaign in the course depends on specific tables. Before starting any campaign, confirm the required tables are ingested and populated. The mapping is straightforward: TH4 (identity compromise) requires SigninLogs, AADNonInteractiveUserSignInLogs, and AuditLogs. TH5 (cloud persistence) requires CloudAppEvents and EmailEvents. TH6 (privilege escalation) requires AuditLogs and AADServicePrincipalSignInLogs. TH8 (data exfiltration) requires CloudAppEvents. TH9 (endpoint threats) requires DeviceProcessEvents, DeviceRegistryEvents, and DeviceFileEvents. TH10 (lateral movement) requires SigninLogs, DeviceLogonEvents, and IdentityLogonEvents. TH11 (application and API abuse) requires CloudAppEvents. TH12 (pre-ransomware activity) requires DeviceProcessEvents, DeviceNetworkEvents, and DeviceFileEvents. TH13 (insider threats) requires CloudAppEvents.
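The mapping above can be turned into a readiness check by joining it against 24-hour event counts. This is a sketch under the assumption of a Sentinel workspace. The datatable simply transcribes the campaign requirements listed in the paragraph, and further campaigns can be added as extra rows.

```kql
// Sketch: which campaigns have a required table that is missing or empty?
let Required = datatable(Campaign: string, Table: string) [
    "TH4",  "SigninLogs",
    "TH4",  "AADNonInteractiveUserSignInLogs",
    "TH4",  "AuditLogs",
    "TH5",  "CloudAppEvents",
    "TH5",  "EmailEvents",
    "TH6",  "AuditLogs",
    "TH6",  "AADServicePrincipalSignInLogs",
    "TH8",  "CloudAppEvents",
    "TH9",  "DeviceProcessEvents",
    "TH9",  "DeviceRegistryEvents",
    "TH9",  "DeviceFileEvents",
    "TH10", "SigninLogs",
    "TH10", "DeviceLogonEvents",
    "TH10", "IdentityLogonEvents",
    "TH12", "DeviceProcessEvents",
    "TH12", "DeviceNetworkEvents",
    "TH12", "DeviceFileEvents"
];
let Populated =
    // isfuzzy=true skips tables that do not exist instead of failing the query
    union withsource=Table isfuzzy=true
        SigninLogs, AADNonInteractiveUserSignInLogs, AADServicePrincipalSignInLogs,
        AuditLogs, CloudAppEvents, EmailEvents, IdentityLogonEvents,
        DeviceProcessEvents, DeviceRegistryEvents, DeviceFileEvents,
        DeviceNetworkEvents, DeviceLogonEvents
    | where TimeGenerated > ago(1d)
    | summarize Events = count() by Table;
Required
| join kind=leftouter Populated on Table
| extend Events = coalesce(Events, 0)
| summarize MissingTables = make_set_if(Table, Events == 0) by Campaign
| extend Ready = array_length(MissingTables) == 0
| order by Ready asc
```

A campaign with Ready set to false is not blocked. The MissingTables column is the limitation to fix or to document in the hunt report.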

If a required table is not ingested, the campaign has two options: fix the ingestion gap before hunting (preferred, because it permanently improves the environment) or hunt with the limitation documented (acceptable, because the documented limitation is itself a finding — "this attack surface is unmonitored"). Both options produce value. Ignoring the gap and reporting a negative finding as if the data were complete is the one outcome that must not happen.

The data source audit also reveals environmental insights that inform hunt prioritization. If your tenant generates 1.2 million non-interactive sign-in events per day but only 140,000 interactive sign-in events, the non-interactive table is the larger attack surface and deserves more hunting attention. If CloudAppEvents generates five times the volume of all endpoint tables combined, your users are doing more work in the cloud than on local devices — and that is where the attack surface has shifted. Volume ratios between clusters tell you where the business operates, which tells you where the attacker operates.

Running a hunt campaign without verifying data source availability

The analyst writes a hypothesis about OAuth consent phishing, runs queries against CloudAppEvents, finds no results, and reports a clean finding. Three weeks later, incident response discovers an active OAuth compromise that has been exfiltrating SharePoint data for 40 days. The IR team reviews the hunt report and asks why it missed the compromise. The analyst checks CloudAppEvents again, and the table has data now. What happened? The Defender for Cloud Apps connector was intermittently failing during the hunt window. CloudAppEvents had partial coverage: some days populated, some days empty. The hunt queried a partially populated table and interpreted sparse results as negative findings. The audit query from this section, run at the start of the campaign and compared against the table's typical daily volume, would have caught the issue on any affected day: the 24-hour event count for CloudAppEvents would have been abnormally low. The data source audit is not optional preparation. It is the first operational step of every campaign.
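Because an intermittent failure can hide behind a single healthy 24-hour count, it may help to check coverage day by day across the whole hunt window. The sketch below assumes a 30-day window and flags days that are empty or fall below 10% of the median daily volume. Both numbers are illustrative thresholds, not values from this course.

```kql
// Sketch: find days in the hunt window where CloudAppEvents was empty or unusually sparse.
// Assumptions: 30d window and a 10%-of-median threshold, both chosen for illustration.
let window = 30d;
let Daily =
    CloudAppEvents
    | where TimeGenerated between (startofday(ago(window)) .. startofday(now()))
    // make-series fills days with no events with 0 instead of omitting them
    | make-series Events = count() default = 0
        on TimeGenerated from startofday(ago(window)) to startofday(now()) step 1d
    | mv-expand TimeGenerated to typeof(datetime), Events to typeof(long);
let Typical = toscalar(Daily | summarize percentile(Events, 50));
Daily
| extend TypicalDaily = Typical
| where Events == 0 or Events < 0.1 * TypicalDaily
| project Day = TimeGenerated, Events, TypicalDaily
| order by Day asc
```

Any day returned here is a day on which a "no results" answer from CloudAppEvents cannot be trusted. Weekends and holidays may legitimately fall below the threshold in some tenants, so read the output against the calendar.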

Threat Hunting Principle

A hunt is only as good as its data. Before the first query runs, confirm three things: the required tables are ingested, the tables are ingested completely for all relevant entities, and the retention window covers the hypothesis time frame. A false negative from querying the wrong table or a partially populated table is worse than no hunt at all — it creates false confidence that a threat was searched for and not found when the truth is the search was incomplete.

Section 0.11 — The Human Factor. The data sources are mapped. The next section examines the human element: what makes an effective threat hunter, how hunting skills differ from SOC analysis skills, and how to develop hunting proficiency through structured practice rather than raw experience alone.