3.2 Cloud Evidence Auto-Collection

5 hours · Module 3

What you already know

Section 3.1 established the evidence decay problem and the four-priority collection framework. Cloud session and activity state is Priority 1, collect within 5 minutes. You know the two-wave architecture and why collection runs before containment. This section builds the four KQL queries that compose the cloud evidence package, optimizes them for the 30-second evidence window, and formats the results as structured incident comments.

Scenario

Tom opens an AiTM incident for d.chen@northgateeng.com. The investigation starts the same way every identity investigation starts: query SigninLogs for authentication history, query AuditLogs for configuration changes, query OfficeActivity for data access, query the mailbox audit for email-specific actions. Four queries, four tabs, 10 to 15 minutes before Tom has the evidence he needs. He does this on every identity incident. Every analyst in the SOC does this on every identity incident. The queries are identical across analysts, across shifts, across weeks. Only the entity values change.

What the cloud collection playbook replaces

The manual evidence collection sequence is a predictable series of queries: sign-in history for the user over the past 24 hours, audit log changes for the same period, M365 activity over 7 days, and mailbox-specific operations. Each query takes one to three minutes to write, execute, and review. The full sequence takes 10 to 15 minutes for a straightforward identity alert, longer for multi-entity incidents.

The cloud collection playbook automates the entire sequence. Four KQL queries run in parallel through a Logic App's Parallel branch pattern. The results are formatted as a structured incident comment with anomaly highlighting and attached before the analyst opens the incident. Total time from incident creation to cloud evidence attached: 8 to 15 seconds.

Query 1: Sign-in history (24 hours)

The sign-in history is the foundation of every identity incident investigation. It answers: when did the attacker sign in, from where, using what device, with what MFA claim, and to which application.

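SigninLogs
| where TimeGenerated > ago(24h)
| where UserPrincipalName =~ "{CompromisedUPN}"
| project
    TimeGenerated,
    IPAddress,
    // Dynamic fields are cast with tostring() before string operations
    Location = strcat(tostring(LocationDetails.city), ", ", tostring(LocationDetails.countryOrRegion)),
    AppDisplayName,
    ClientAppUsed,
    UserAgent,  // needed to spot scripted clients
    ResultType,
    ResultDescription,
    DeviceDetail_OS = tostring(DeviceDetail.operatingSystem),
    DeviceDetail_Browser = tostring(DeviceDetail.browser),
    // The SigninLogs column is MfaDetail; column names are case-sensitive
    MfaAuthMethod = tostring(MfaDetail.authMethod),
    ConditionalAccessStatus,
    RiskLevelDuringSignIn,
    IsInteractive
| order by TimeGenerated desc
| take 500  // safety limit; after the sort, this keeps the 500 most recent events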

The query captures all sign-ins (successful and failed) for the compromised user in the last 24 hours. This window captures the attacker's sign-in events plus a baseline of the user's normal sign-in pattern for comparison. The investigator scans the results looking for: sign-ins from unfamiliar IPs, sign-ins with unusual user agents (Python requests, PowerShell, Invoke-WebRequest), sign-ins that satisfied MFA with a newly registered method, and sign-ins to applications the user does not normally access.

The 24-hour window is sufficient for most AiTM incidents: the attacker typically gains access and acts within hours. For suspected slow-burn compromises, the investigation team extends the window manually. The auto-collection provides the initial 24-hour snapshot for rapid assessment.

In the Logic App, this query executes via the "Run query and list results" Sentinel connector action. The action returns a JSON array. The playbook formats the results into a structured incident comment with a summary line ("42 sign-in events in the last 24h: 38 successful, 4 failed, 3 unique IPs") followed by the raw results in a detail section.
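The counts in the summary line could be computed by a small companion query rather than by iterating over the JSON array in the Logic App. The following is a minimal sketch, not the playbook's confirmed implementation. It assumes that a ResultType of "0" marks a successful sign-in and treats every other code as a failure, which slightly overcounts failures because some non-zero codes are interrupts rather than hard failures.

```kql
// Sketch: compute the numbers for the sign-in summary line in one pass.
// "{CompromisedUPN}" is the same playbook placeholder used in the main query.
SigninLogs
| where TimeGenerated > ago(24h)
| where UserPrincipalName =~ "{CompromisedUPN}"
| summarize
    TotalEvents = count(),
    Successful  = countif(ResultType == "0"),   // assumption: "0" means success
    Failed      = countif(ResultType != "0"),   // includes interrupts, not only failures
    UniqueIPs   = dcount(IPAddress)
```

The query returns a single row, so the playbook can read the four values directly when it assembles the summary sentence.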

Query 2: Audit log changes (24 hours)

The audit log captures every configuration change made to or by the compromised user: MFA method registration, password changes, group membership modifications, role assignments, OAuth consent grants, application creation, and conditional access policy modifications.

AuditLogs
| where TimeGenerated > ago(24h)
// Match events where the user is either the target or the initiator.
// Dynamic fields are cast with tostring() before the =~ comparison.
| where tostring(TargetResources[0].userPrincipalName) =~ "{CompromisedUPN}"
    or tostring(InitiatedBy.user.userPrincipalName) =~ "{CompromisedUPN}"
| project
    TimeGenerated,
    OperationName,
    InitiatedByUPN = tostring(InitiatedBy.user.userPrincipalName),
    InitiatedByIP = tostring(InitiatedBy.user.ipAddress),
    TargetUPN = tostring(TargetResources[0].userPrincipalName),
    ModifiedProperties = tostring(TargetResources[0].modifiedProperties),
    Result
| order by TimeGenerated desc
| take 500  // safety limit; after the sort, this keeps the 500 most recent events

The dual filter (TargetResources OR InitiatedBy) captures both changes made to the user (admin reset their password, someone added them to a group) and changes made by the user (registered MFA, consented to an app). During a compromise, the attacker acts as the user, so "changes made by the user" during the compromise window are the attacker's actions.

Key audit events the investigator looks for: "User registered security info" (attacker registered MFA), "Consent to application" (attacker consented to malicious OAuth app), "Add member to role" (attacker escalated privileges), and "Update user" (attacker changed properties). Inbox rule operations such as "Set-InboxRule" and "New-InboxRule" are Exchange events, so they surface in OfficeActivity (Query 3) rather than in AuditLogs. The auto-collection captures all audit events and lets the investigator filter during analysis.
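During analysis, the investigator might narrow the collected audit events to the high-signal directory operations with a filter like the one below. This is an illustrative sketch: the operation list is a subset of the events named above, and the choice to filter only on the initiator (the actions the attacker took while acting as the user) is an assumption about what the analyst wants to see first.

```kql
// Sketch: narrow the audit trail to high-signal operations initiated by the user.
AuditLogs
| where TimeGenerated > ago(24h)
| where tostring(InitiatedBy.user.userPrincipalName) =~ "{CompromisedUPN}"
| where OperationName in~ (
    "User registered security info",
    "Consent to application",
    "Add member to role",
    "Update user")
| project
    TimeGenerated,
    OperationName,
    InitiatedByIP = tostring(InitiatedBy.user.ipAddress),
    Result
| order by TimeGenerated asc   // chronological order shows the attack sequence
```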

Query 3: M365 activity (7 days)

OfficeActivity records every action the user took across M365 services: emails read, files accessed in SharePoint and OneDrive, Teams messages sent. This is the data access evidence: what did the attacker actually do in the compromised account?

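OfficeActivity
| where TimeGenerated > ago(7d)
| where UserId =~ "{CompromisedUPN}"
| project
    TimeGenerated,
    Operation,
    Workload = OfficeWorkload,  // the table's column is OfficeWorkload
    ClientIP,
    ResultStatus,
    // Pick the most descriptive item identifier for each workload
    ItemName = case(
        OfficeWorkload == "Exchange", OfficeObjectId,
        OfficeWorkload == "SharePoint", SourceFileName,
        OfficeWorkload == "OneDrive", SourceFileName,
        ""),
    SiteUrl = Site_Url  // the table's column is Site_Url
| order by TimeGenerated desc
| take 500  // safety limit; after the sort, this keeps the 500 most recent events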

The 7-day window is longer than the sign-in and audit windows because data access often unfolds over days: the attacker may have had access since last Tuesday but only started exfiltrating data yesterday. The 7-day window captures the entire operational period.

Key operations the investigator looks for: "MailItemsAccessed" (attacker read specific emails, critical for BEC investigation), "FileDownloaded" or "FileAccessed" (attacker accessed SharePoint/OneDrive files), "FileSyncDownloadedFull" (attacker synced an entire library), "New-InboxRule" or "Set-InboxRule" (attacker created or changed inbox rules), and "AddMailboxPermission" (delegate access).
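A per-operation, per-IP rollup makes these key operations easier to scan than the raw event list. The sketch below is one possible way to build it. The operation names are copied from the list above and should be checked against the values that actually appear in the workspace, since operation naming can differ between workloads.

```kql
// Sketch: roll up the key data-access operations by operation and client IP.
OfficeActivity
| where TimeGenerated > ago(7d)
| where UserId =~ "{CompromisedUPN}"
| where Operation in~ (
    "MailItemsAccessed", "FileDownloaded", "FileAccessed",
    "FileSyncDownloadedFull", "AddMailboxPermission")
| summarize
    Events    = count(),
    FirstSeen = min(TimeGenerated),
    LastSeen  = max(TimeGenerated)
    by Operation, ClientIP
| order by Events desc
```

A row showing dozens of MailItemsAccessed events from a single unfamiliar IP within a few minutes is the kind of pattern the summary section would call out.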

Anti-Pattern

Treating auto-collected evidence as less reliable than manual collection

The admissibility of evidence depends on integrity and chain of custody, not on whether a human or an automation collected it. Auto-collected evidence has stronger integrity guarantees than manual collection: the exact query is logged in the Logic App run history, the timestamp is machine-generated, and the results are attached to the incident without human modification. Manual collection is more prone to errors: the analyst may run the wrong time range, forget a table, or modify results during formatting. Auto-collection is repeatable, documented, and tamper-evident.

Query 4: Mailbox audit (7 days)

For email-focused incidents, the mailbox audit provides granular email operations that OfficeActivity may not capture at the same detail level. If mailbox audit data is ingested into Sentinel via the Exchange Online data connector:

ExchangeOnlineAuditLogs
| where TimeGenerated > ago(7d)
| where UserId =~ "{CompromisedUPN}"
| where Operations has_any ("MailItemsAccessed", "Send", "SendAs",
    "SendOnBehalf", "Create", "MoveToDeletedItems", "SoftDelete",
    "UpdateInboxRules", "AddFolderPermissions", "AddMailboxPermission")
| project TimeGenerated, Operations, ClientIP, MailboxOwner, LogonType
| order by TimeGenerated desc
| take 500  // safety limit; after the sort, this keeps the 500 most recent events

If mailbox audit data is not in Sentinel, the collection playbook uses Exchange Online PowerShell via an Azure Function (covered in SA10): Search-UnifiedAuditLog with the RecordType, UserIds, and date range parameters. The Azure Function returns the results to the Logic App, which formats them into the incident comment.

Formatting the evidence package

The four query results are formatted into a single structured incident comment. The summary highlights anomalous events: the investigator reads the summary for the quick assessment, then dives into the full results for detailed analysis.

Incident Comment

EVIDENCE COLLECTION: Cloud Evidence Package

Collected: 2026-04-08 14:31:05 UTC (93 seconds after alert)
Target: d.chen@northgateeng.com
Collection scope: 24h sign-ins, 24h audit, 7d M365 activity, 7d mailbox

SIGN-IN HISTORY (24h): 42 events (38 success, 4 failed), 3 unique IPs. Anomalous: 6 sign-ins from 203.0.113.45 (Amsterdam) between 14:28-14:33. MFA satisfied by claim.

AUDIT CHANGES (24h): 8 events. Anomalous: "User registered security info" at 14:30 from 203.0.113.45. "Consent to application" (Azure Productivity Suite) at 14:35.

M365 ACTIVITY (7d): 312 events. Anomalous: 47 MailItemsAccessed from 203.0.113.45 between 14:28-14:33. "Set-InboxRule" (Forward All) at 14:32. 3 FileDownloaded from SharePoint /sites/Finance.

MAILBOX AUDIT (7d): 89 events. Anomalous: SendAs operations from 203.0.113.45 (attacker sent emails as d.chen).
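The section does not spell out how the playbook decides which events are anomalous. One plausible approach, shown here as a sketch only, is to flag IP addresses seen in the last 24 hours that never produced a successful sign-in for the user during an earlier baseline period. The 14-day baseline is an assumed value, and scanning it adds query cost that would need to be weighed against the 30-second budget.

```kql
// Sketch: flag sign-in IPs from the last 24h that are absent from the user's baseline.
let recent = 24h;
let baseline = 14d;   // assumed lookback, not taken from the playbook
let knownIPs =
    SigninLogs
    | where TimeGenerated between (ago(baseline) .. ago(recent))
    | where UserPrincipalName =~ "{CompromisedUPN}"
    | where ResultType == "0"          // only successful sign-ins define "known"
    | distinct IPAddress;
SigninLogs
| where TimeGenerated > ago(recent)
| where UserPrincipalName =~ "{CompromisedUPN}"
| where IPAddress !in (knownIPs)
| summarize
    SignIns   = count(),
    FirstSeen = min(TimeGenerated),
    LastSeen  = max(TimeGenerated)
    by IPAddress, City = tostring(LocationDetails.city)
| order by SignIns desc
```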

The full results are attached as a second incident comment (or stored in Azure Blob Storage) because of comment length limits. Each of the four KQL queries returns at most 500 rows, but for a user with heavy email activity the OfficeActivity query alone may match thousands of events. Two approaches handle large result sets: store the full results in Azure Blob Storage with a download link in the incident comment (preferred for evidentiary completeness), or truncate to the most recent 500 events with a note indicating the total count and storage location.
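A truncation note needs the total number of matching events, which the capped query cannot supply on its own. A lightweight count query, sketched below under the assumption that the cap is 500 rows as described above, can run alongside the main query and report whether truncation occurred.

```kql
// Sketch: count all matching events so the comment can state "500 of N captured".
OfficeActivity
| where TimeGenerated > ago(7d)
| where UserId =~ "{CompromisedUPN}"
| summarize
    TotalEvents = count(),
    Earliest    = min(TimeGenerated),
    Latest      = max(TimeGenerated)
| extend Truncated = TotalEvents > 500   // true when the capped query dropped rows
```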

Cloud collection query optimization

The four cloud collection queries must execute within the 30-second evidence window. Three optimization patterns make this possible.

Time filter first, always. Every query opens with a TimeGenerated filter as its first condition: ago(24h) for the sign-in and audit queries, ago(7d) for the M365 activity and mailbox queries. Filtering on time first limits the scan to recent data and sharply reduces query cost. For deep collection mode (triggered manually or by the specialized playbooks in Sections 3.7 through 3.9), the windows expand to ago(7d) or ago(30d); the default collection keeps the shorter windows for speed.

Entity filter second. Every query filters on the compromised entity immediately after the time filter: | where UserPrincipalName =~ "{CompromisedUPN}". Without the entity filter, the query scans all sign-ins for all users: a 100x performance difference at NE's scale (810 users, 24 hours of activity). The =~ operator performs case-insensitive matching, which matters because UPN case varies across data sources.

Project last. Every query uses | project to select only relevant columns. The full SigninLogs table has 80+ columns; projecting only the dozen or so needed reduces data transfer by roughly 80%. Each query also ends with | take 500, placed after the order by, as a safety limit: if the compromised user has 5,000 events in 24 hours, the query returns the 500 most recent rather than timing out. The limit must follow the sort, because take on unsorted input returns an arbitrary 500 rows. The evidence comment notes the truncation: "500 events captured (more may exist, extend if needed)."
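The three patterns can be seen together in a skeleton like the following. It is a reduced sketch with a shortened column list, not one of the four production queries, and it uses top as a single-operator equivalent of sorting and then taking the first 500 rows.

```kql
// Sketch: the three optimization patterns in one skeleton query.
SigninLogs
| where TimeGenerated > ago(24h)                      // 1. time filter first
| where UserPrincipalName =~ "{CompromisedUPN}"       // 2. entity filter second
| project TimeGenerated, IPAddress, AppDisplayName,   // 3. keep only needed columns
    ResultType, ConditionalAccessStatus
| top 500 by TimeGenerated desc                       // safety limit: 500 most recent rows
```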

These optimizations ensure the four cloud queries complete in 8 to 15 seconds total when running in parallel. The SA9 monitoring dashboard tracks query execution time per collection run, flagging any query that consistently approaches the 30-second budget.

The cloud collection queries produce structured output that downstream automation can consume. Each query result is formatted as a JSON array with consistent field naming: timestamp, entity, action, source_ip, result, and details. This standardization means the SA7 containment playbook can extract entity lists for proactive containment scoping, and the SA13 capstone metrics can calculate evidence completeness without custom parsing per query type.
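One way to produce the standardized field names is to map them in the query's final projection, so the Logic App receives rows that already match the shared schema. The sketch below applies the idea to the sign-in query. The mapping of source columns to the six fields, the "success"/"failure" labels, and the keys inside details are illustrative assumptions, not the playbook's documented schema.

```kql
// Sketch: project sign-in events into the shared six-field schema.
SigninLogs
| where TimeGenerated > ago(24h)
| where UserPrincipalName =~ "{CompromisedUPN}"
| project
    timestamp = TimeGenerated,
    entity    = UserPrincipalName,
    action    = strcat("Sign-in to ", AppDisplayName),
    source_ip = IPAddress,
    result    = iff(ResultType == "0", "success", "failure"),  // assumed labels
    details   = bag_pack(                                      // hypothetical detail keys
        "client_app", ClientAppUsed,
        "result_code", ResultType,
        "conditional_access", ConditionalAccessStatus)
| top 500 by timestamp desc
```

Applying the same six output names to the audit, activity, and mailbox queries would let a downstream playbook read every result set with one parser.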

The query results are attached to the Sentinel incident as structured comments with a machine-readable header embedded as an HTML comment. The HTML comment is invisible to human readers but parseable by downstream playbooks. This pattern, human-readable content with machine-readable metadata, enables the automation stack to chain collection, enrichment, and containment without human intervention at each handoff.

For queries that return zero results, the comment explicitly states that the query returned no results and offers diagnostic guidance: the user may not have signed in recently, the UPN may differ from expected (check for aliases), or the data connector may have a delay. Zero results with diagnostic guidance prevents the analyst from assuming the collection failed when the data simply does not exist.
