3.3 Endpoint Evidence Auto-Collection

5 hours · Module 3

What you already know

Section 3.2 built the cloud evidence collection: four KQL queries capturing sign-in history, audit changes, M365 activity, and mailbox operations. You know how to run queries in parallel, format structured evidence comments, and handle large result sets. This section builds the endpoint evidence collection: MDE investigation packages and four KQL queries against the MDE tables, with the critical timing constraint that collection must fire before containment isolates the device.

Scenario

A Defender for Endpoint alert fires on DESKTOP-NGE042: suspicious process execution. PowerShell launched by wmiprvse.exe with an encoded command line, followed by certutil downloading a binary from an external IP. The attacker's tools are running right now. In 15 minutes, containment will isolate the device, the attacker's processes will lose network connectivity, and the C2 beacon will terminate. The investigation package captures the live state (processes, connections, persistence), but it takes about 10 minutes to generate. If you do not trigger the package now, the processes that prove the attack will be gone before the package starts collecting.

The MDE investigation package

The primary endpoint collection mechanism is the MDE investigation package. The Logic App triggers the package via the MDE API, and it captures a comprehensive snapshot of the device's state: running processes with command lines, network connections with remote IPs, scheduled tasks and services, autostart entries, DNS cache, running drivers, and event log excerpts.

POST https://api.securitycenter.microsoft.com/api/machines/{machineId}/collectInvestigationPackage
Content-Type: application/json
{
  "Comment": "Auto-collected by SA3 playbook — Incident #{incidentNumber}"
}

The managed identity requires the Machine.CollectForensics permission for the MDE API. The package takes 5 to 15 minutes to generate, depending on device activity and disk speed. The API returns immediately with 202 Accepted and an action ID. The playbook polls every 30 seconds via GET /api/machineactions/{actionId} until the status changes to "Succeeded" (with a download URL) or "Failed."
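The polling behaviour described above can be sketched in Python. This is a minimal illustration, not the playbook's actual Logic App definition: `poll_machine_action` and its `fetch_status` parameter are hypothetical names, and `fetch_status` stands in for whatever performs the authenticated GET against /api/machineactions/{actionId} and returns the parsed JSON. The 30-minute ceiling is taken from the troubleshooting notes later in this section.

```python
import time
from typing import Callable

# Terminal states named in this section; anything else is treated as "still running".
TERMINAL_STATUSES = {"Succeeded", "Failed"}


def poll_machine_action(
    fetch_status: Callable[[str], dict],  # hypothetical: wraps GET /api/machineactions/{actionId}
    action_id: str,
    interval_s: float = 30,       # polling interval stated in the text
    timeout_s: float = 30 * 60,   # 30-minute ceiling (see troubleshooting)
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until the action reaches a terminal status or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        action = fetch_status(action_id)
        if action.get("status") in TERMINAL_STATUSES:
            return action
        if clock() >= deadline:
            # Not an MDE status: a local marker so the caller can add a manual task.
            return {"id": action_id, "status": "PollingTimedOut"}
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without waiting in real time; in a Logic App the same shape maps onto a "Do Until" loop with a Delay action.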

The investigation package trigger is idempotent. If SA3 triggers the package for evidence collection and SA6 also triggers the package for pre-isolation collection, the second trigger returns the existing action ID. The MDE API does not create a second package; it returns the in-progress one.

The package must be triggered before isolation (SA6) but completes after isolation. The MDE cloud channel survives full network isolation: the sensor uploads via its hardcoded Microsoft endpoint allow list. The 30-second evidence head start from the two-wave architecture ensures the trigger fires before isolation propagates.

KQL queries for immediate evidence

The investigation package takes minutes. KQL returns in seconds. The playbook runs four targeted KQL queries against the MDE tables in parallel with the package trigger, giving the investigator immediate evidence while the comprehensive package generates.
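As a rough sketch of the parallel fan-out, the snippet below submits each query on its own thread and records failures per query so that one failed query does not discard the other three. `collect_endpoint_evidence` and `run_query` are hypothetical names: `run_query` stands in for whatever executes KQL against the workspace and returns rows. The `{DeviceId}` placeholder matches the query templates in this section, and the device ID is assumed to come from the alert entity rather than from free-form input.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def collect_endpoint_evidence(
    run_query: Callable[[str], list],  # hypothetical: executes KQL, returns result rows
    queries: dict[str, str],           # e.g. {"process": "...", "network": "..."}
    device_id: str,
) -> dict[str, dict]:
    """Run every query template in parallel and collect per-query outcomes."""
    # Fill the {DeviceId} placeholder used by the query templates.
    rendered = {name: kql.replace("{DeviceId}", device_id) for name, kql in queries.items()}

    results: dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(rendered))) as pool:
        futures = {name: pool.submit(run_query, kql) for name, kql in rendered.items()}
        for name, future in futures.items():
            try:
                results[name] = {"ok": True, "rows": future.result()}
            except Exception as exc:  # keep the other queries' evidence
                results[name] = {"ok": False, "error": str(exc)}
    return results
```

Recording the error instead of raising lets the evidence comment state explicitly which query returned nothing and why.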

// Process creation events — process tree showing attacker tool execution
DeviceProcessEvents
| where DeviceId == "{DeviceId}"
| where TimeGenerated > ago(24h)
| project TimeGenerated, FileName, ProcessCommandLine,
    InitiatingProcessFileName, InitiatingProcessCommandLine,
    ProcessId, InitiatingProcessId, AccountName, AccountDomain,
    SHA256, FolderPath
| order by TimeGenerated desc

The process query returns the full process history: every executable that launched on the device in the last 24 hours. The investigator filters for unsigned executables (SHA256 checked against known-good hashes), processes launched from suspicious paths (C:\Users\Public, C:\ProgramData, temp directories), processes with encoded command lines (base64 in the PowerShell -enc parameter), and processes initiated by unexpected parent processes (cmd.exe spawned by Excel, PowerShell spawned by wmiprvse.exe).

// Network connections — C2 communication and lateral movement
DeviceNetworkEvents
| where DeviceId == "{DeviceId}"
| where TimeGenerated > ago(24h)
| project TimeGenerated, RemoteIP, RemotePort, RemoteUrl,
    LocalPort, Protocol, ActionType,
    InitiatingProcessFileName, InitiatingProcessCommandLine
| order by TimeGenerated desc

The network query captures C2 communication (persistent outbound connections to the attacker's infrastructure), lateral movement (connections on SMB/445, WinRM/5985, RDP/3389 to other internal hosts), and data exfiltration indicators (large outbound transfers). Cross-reference remote IPs against the TI indicator table from SA2.6 for campaign context.

The remaining two queries, DeviceFileEvents filtered to file creation and modification events and DeviceRegistryEvents filtered to high-value persistence paths (Run keys, service registration, Defender exclusions), run in the same parallel branch. The file query captures tools dropped by the attacker and files staged for exfiltration. The registry query captures persistence mechanisms and defense evasion (Defender exclusion additions).

// Registry events — persistence and defense evasion indicators
DeviceRegistryEvents
| where DeviceId == "{DeviceId}"
| where TimeGenerated > ago(24h)
| where RegistryKey has_any (
    @"SOFTWARE\Microsoft\Windows\CurrentVersion\Run",
    @"SOFTWARE\Microsoft\Windows\CurrentVersion\RunOnce",
    @"SYSTEM\CurrentControlSet\Services",
    @"SOFTWARE\Microsoft\Windows Defender\Exclusions",
    @"SOFTWARE\Policies\Microsoft\Windows Defender")
| project TimeGenerated, RegistryKey, RegistryValueName,
    RegistryValueData, ActionType,
    InitiatingProcessFileName, InitiatingProcessCommandLine
| order by TimeGenerated desc

The registry query filters to specific high-value paths rather than returning the full DeviceRegistryEvents table. A typical workstation generates thousands of benign registry modifications per hour; filtering to persistence and defense evasion paths reduces noise while capturing the artifacts that matter for investigation.

Device-type-aware collection

Not all devices justify the same collection depth. The playbook checks the device type before determining the collection scope.

Workstations (DESKTOP-NGExxx) receive full collection: all four KQL queries plus the investigation package. Workstations are the most common initial access point and the evidence is rarely voluminous enough to exceed query limits.

Servers (SRV-NGE-xxx) receive full collection with extended time ranges — 48 hours instead of 24 for process and network events. Servers have higher event volumes but also longer attacker dwell times. The 24-hour window may miss the initial access if the attacker has been present for days.

Domain controllers (SRV-NGE-DC-xxx) receive full collection plus additional queries for AD-specific evidence: DeviceProcessEvents filtered for DCSync indicators (lsass.exe with replication-related command lines), Kerberos ticket operations, and ntdsutil.exe execution. Domain controller compromise is the highest-severity endpoint finding.

The device type is determined from the device name pattern (which follows NE's naming convention) or from the MDE device properties (DeviceCategory field in the machines API response).
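The name-pattern branch of this decision can be sketched as a small lookup. Everything here is illustrative: `CollectionScope`, `collection_scope`, and the extra-query labels are hypothetical names, the prefixes are inferred from the device names used in this section, the 48-hour window for domain controllers assumes they inherit the server range, and the "unknown" fallback is an assumption rather than documented playbook behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionScope:
    device_type: str
    lookback_hours: int
    extra_queries: tuple = ()  # labels for additional queries, hypothetical names


def collection_scope(device_name: str) -> CollectionScope:
    """Map a device name to its collection scope using the naming convention."""
    name = device_name.upper()
    # Check the domain controller prefix first: it also matches the server prefix.
    if name.startswith("SRV-NGE-DC"):
        return CollectionScope(
            "domain_controller",
            48,  # assumption: DCs inherit the extended server window
            ("dcsync_indicators", "kerberos_ticket_operations", "ntdsutil_execution"),
        )
    if name.startswith("SRV-NGE"):
        return CollectionScope("server", 48)
    if name.startswith("DESKTOP-NGE"):
        return CollectionScope("workstation", 24)
    # Assumption: unrecognised names get the workstation defaults.
    return CollectionScope("unknown", 24)
```

For example, `collection_scope("SRV-NGE-FS01")` yields a server scope with a 48-hour lookback. Where names are unreliable, the MDE device properties would be the more robust input.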

Anti-Pattern

Relying solely on the investigation package for endpoint evidence

The investigation package is comprehensive but takes 5 to 15 minutes. The four KQL queries return in seconds. If you design the pipeline around the package alone, the investigator waits up to 15 minutes for evidence that KQL could have provided immediately. Run the four KQL queries in parallel with the package trigger. The investigator gets process, network, file, and registry evidence within 30 seconds, plus the comprehensive package minutes later. The KQL results also serve as a fallback: if the package fails (device offline, sensor unreachable, disk full), the four KQL queries still return cloud-stored telemetry. The package is complementary, not the sole collection mechanism.

Formatting the endpoint evidence package

All four KQL queries run in parallel and the results are formatted into the evidence comment alongside the investigation package download link.

Incident Comment

ENDPOINT EVIDENCE: DESKTOP-NGE042

Collected: 2026-04-08 14:31:22 UTC
Device: DESKTOP-NGE042 | Windows 11 23H2 | MDE Active
Investigation Package: Triggered at 14:31, status: COLLECTING

PROCESS EVENTS (24h): 847 events. powershell.exe -enc [base64] launched by wmiprvse.exe at 14:29. certutil.exe -urlcache -f from external IP at 14:30. payload.exe (unsigned) from C:\Users\Public\ at 14:30.

NETWORK EVENTS (24h): 312 events. 23 connections to 203.0.113.45:443 (attacker IP) between 14:28-14:33. SMB (445) to SRV-NGE-FS01 at 14:31, lateral movement indicator.

FILE EVENTS (24h): 156 events. C:\Users\Public\payload.exe created at 14:30. C:\Users\d.chen\AppData\Local\Temp\data.zip created at 14:32 (12MB).

REGISTRY EVENTS (24h): 4 events in monitored paths. Run key "UpdateCheck" added at 14:31.

The investigator receives immediate KQL results within 30 seconds of the alert, plus the comprehensive investigation package available 5 to 15 minutes later. The package download URL is appended to the incident comment when the polling loop detects completion.

Collection-before-containment timing

The timing coordination between collection and containment is the critical design constraint for endpoint evidence. When containment (SA6) isolates the device, the attacker's processes lose network connectivity and C2 beacons terminate. The investigation package captures the live state before isolation. The playbook's execution order ensures collection fires first (order = 1 in the automation rule), containment fires second (order = 2).

If the ordering cannot be guaranteed because both playbooks trigger simultaneously, collection gets a 30-second head start before the containment playbook's isolation action. This is enough time to trigger the investigation package and fire the four KQL queries, even though the package itself completes after isolation.

Investigation package lifecycle

The MDE investigation package is an asynchronous operation. The API call at 14:31 triggers collection, but the package is not ready until approximately 14:41. The playbook polls for completion using a "Do Until" loop, checking the action status every 30 seconds. When the status changes from "Pending" to "Succeeded," the package is ready; the playbook retrieves the download URL and stores it in the incident comment with the SHA-256 hash.

The trigger is idempotent. Sending a second trigger while the first is generating does not create a second package: the MDE API returns the existing action ID. This matters for the cross-environment playbook: if SA3 triggers the package for evidence collection and SA6 also triggers it for pre-isolation collection, the second trigger does not interfere with the first.

Package size varies significantly: a clean workstation produces a 30 to 50 MB package, while a server with extensive event logs and many running services produces 150 to 300 MB. The playbook does not download and re-upload the package (too large for incident comments). The IR team downloads directly from the MDE-provided URL.

Troubleshooting endpoint collection failures

Investigation package fails with "Device not available": the device is offline or the MDE sensor has lost connectivity. The package requires the sensor to be active. The four KQL queries still return results because they query cloud-stored telemetry, not the device directly.

DeviceProcessEvents returns zero results: the device may not be onboarded to MDE, or the advanced hunting data has not synced. Check the device's healthStatus via the MDE API. If it reports "Inactive" or "NoSensorData," endpoint evidence is unavailable. Document this explicitly in the evidence package.

Investigation package exceeds 15 minutes: large devices (servers with high event volumes) produce larger packages. The polling loop extends to 30 minutes before timing out. After 30 minutes, the playbook logs a timeout and adds a manual task for the investigator. The KQL evidence is unaffected by package delays.

Package generation failures occur approximately 2% of the time, primarily from devices that are offline or have insufficient disk space. The playbook handles failures gracefully: if the package fails, it falls back to live response collection which captures running processes, network connections, and persistence mechanisms through the MDE cloud channel. The fallback is not as comprehensive as the full package (it misses prefetch, registry hives, and event log exports) but captures the highest-priority evidence.
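The failure handling in this troubleshooting list can be summarised as a small decision function. This is a sketch only: `plan_after_package` and the step labels it returns are hypothetical names for illustration, not actions defined by the playbook or the MDE API.

```python
def plan_after_package(
    status: str,
    elapsed_minutes: float,
    timeout_minutes: float = 30,  # polling ceiling stated above
) -> list[str]:
    """Return the follow-up steps for a given package status (labels are hypothetical)."""
    if status == "Succeeded":
        return ["append_download_url_to_comment"]
    if status == "Failed":
        # Fallback is less comprehensive, so the comment should say so.
        return ["run_live_response_fallback", "note_reduced_coverage_in_comment"]
    if elapsed_minutes >= timeout_minutes:
        return ["log_timeout", "add_manual_task_for_investigator"]
    return ["keep_polling"]
```

In every branch the KQL evidence has already been written to the incident, so none of these outcomes leaves the investigator without endpoint data.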
