Scenario 1. You arrive at the start of your shift and run the data pipeline health check. DeviceProcessEvents shows a DataAge of 4 hours. The incident queue has zero new incidents. Your colleague says the queue is clean and you should move straight to threat hunting. What do you do?
(a) Agree. An empty queue means no active threats, which is a good time for proactive hunting
An empty queue with a stale data pipeline is not a clean queue. It is a blind queue. DeviceProcessEvents has not received data for 4 hours, which means every endpoint detection rule has been silent during that window. Alerts that should have fired did not fire. The quiet queue is the symptom of the outage, not evidence of safety.
(b) Run the noisy alert tuning query to check whether alerts were suppressed overnight
Alert suppression rules would appear in the automation rules configuration, not in the pipeline health data. The issue here is not suppression. The issue is that DeviceProcessEvents stopped flowing 4 hours ago, which means the data that triggers endpoint alerts never reached the detection engine. No suppression rule can suppress alerts that were never generated.
(c) Escalate the connector outage to engineering immediately and document the 4-hour gap window in the shift handover
Correct. A 4-hour DeviceProcessEvents outage means every endpoint detection has been blind since the pipeline stopped. Normal ingestion latency is 5-15 minutes. At 4 hours, this is a connector failure. Escalate to engineering to restore the pipeline, document the gap window (so the next shift knows which period has no endpoint visibility), and after recovery, run hunting queries across the gap period to check for attacks that occurred while detection was down. Section 1.7 covers this: a silent pipeline is more dangerous than a noisy queue.
(d) Check Threat Analytics for new campaign reports that might explain the quiet period
Threat Analytics reports describe external threat campaigns. They do not explain internal data pipeline failures. The quiet period is caused by a connector outage, not by the absence of threats. Checking Threat Analytics is part of the shift start routine, but it does not address the pipeline problem.
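The reasoning in Scenario 1 can be sketched as a small freshness check. This is a minimal illustration, not the course's health check query: the 15-minute ceiling is taken from the "5-15 minutes" normal latency quoted above, and the function names are hypothetical.

```python
from datetime import timedelta

# Assumption: upper bound of the "5-15 minutes" normal ingestion latency.
NORMAL_LATENCY_MAX = timedelta(minutes=15)


def pipeline_status(data_age):
    """Classify a table's data age against normal ingestion latency."""
    if data_age <= NORMAL_LATENCY_MAX:
        return "healthy"
    return "stale"


def queue_is_trustworthy(data_ages):
    """An empty queue only counts as clean if every feeding table is fresh."""
    return all(pipeline_status(age) == "healthy" for age in data_ages.values())


# The scenario's reading: DeviceProcessEvents is 4 hours behind.
ages = {"DeviceProcessEvents": timedelta(hours=4)}
print(pipeline_status(ages["DeviceProcessEvents"]))  # stale
print(queue_is_trustworthy(ages))                    # False
```

The point of the second function is that queue state is only meaningful once the tables behind it are known to be current.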
Scenario 2. During triage, you encounter a High-severity incident with 6 correlated alerts across Defender for Office 365 and Entra ID. The incident name is "Multi-stage AiTM phishing campaign." You also have 3 Medium-severity incidents and 8 Low-severity incidents in the queue. You estimate the AiTM investigation will take 3-4 hours. What is the correct approach?
(a) Investigate the AiTM incident to completion before touching any other incidents
Jumping straight into a 3-4 hour investigation without triaging the rest of the queue is the serial investigation anti-pattern from Section 1.7. One of those Medium-severity incidents might be a second active attack. You need the 5-minute triage pass across all incidents before committing to a deep investigation.
(b) Complete 5-minute triage on all remaining incidents first, then investigate the AiTM incident as Priority 1
Correct. The 5-minute triage pass across all 11 remaining incidents takes under an hour even at the full 5 minutes each (11 × 5 = 55 minutes), and typically closer to 30 minutes, and it gives you a complete picture of the queue. A Medium-severity incident might actually be a second active attack or might correlate with the AiTM campaign. After triage, the AiTM is Priority 1 and gets your full attention. The key discipline is: triage everything first, then investigate in priority order. Section 1.7 covers this workflow.
(c) Assign the AiTM incident to Tier 2 and work through the Medium-severity incidents yourself
Escalating a High-severity multi-stage attack before you have triaged it is premature. You need the 5-minute triage to understand the scope, verify the automated containment actions, and determine whether Tier 2 involvement is actually needed. The Medium-severity incidents also need triage before investigation, and investigating them should wait until the High-severity incident is understood.
(d) Close the Low-severity incidents as informational to reduce queue noise, then focus on the AiTM
Closing incidents without triage is exactly the alert fatigue behavior Section 1.7 warns against. Low-severity incidents may contain early indicators of a campaign. Three low-severity alerts against the same user over several days could be the reconnaissance stage of an attack. Every incident gets a 5-minute triage, regardless of severity.
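The "triage everything, then investigate in priority order" discipline from Scenario 2 can be sketched as follows. The incident IDs, field names, and severity ranking are illustrative assumptions. The 5-minute figure is the per-incident triage cap from the scenario, so the budget computed here is an upper bound.

```python
# Assumption: 5 minutes is a per-incident ceiling, not a fixed duration.
TRIAGE_MINUTES = 5
SEVERITY_RANK = {"High": 0, "Medium": 1, "Low": 2}

# Hypothetical queue matching the scenario: 1 High, 3 Medium, 8 Low.
queue = (
    [{"id": "aitm", "severity": "High"}]
    + [{"id": f"med-{i}", "severity": "Medium"} for i in range(1, 4)]
    + [{"id": f"low-{i}", "severity": "Low"} for i in range(1, 9)]
)

# Step 1: triage pass over everything except the incident already assessed.
remaining = [inc for inc in queue if inc["id"] != "aitm"]
triage_budget = len(remaining) * TRIAGE_MINUTES  # upper bound in minutes

# Step 2: investigate in priority order. sorted() is stable, so queue order
# is preserved within each severity band.
investigation_order = sorted(queue, key=lambda inc: SEVERITY_RANK[inc["severity"]])

print(len(remaining), triage_budget)  # 11 55
print(investigation_order[0]["id"])   # aitm
```

In practice the triage pass may re-rank an incident (a Medium that turns out to be a second active attack), which is why the sort happens after triage rather than before.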
Scenario 3. You are investigating a phishing incident. EmailEvents shows a credential phishing email delivered to j.morrison. IdentityLogonEvents shows a successful sign-in for j.morrison from IP 203.0.113.47 four minutes after delivery. CloudAppEvents shows a New-InboxRule action from the same IP two minutes after the sign-in. You need to determine whether other accounts were also compromised. Which entity pivot do you use?
(a) Query IdentityLogonEvents for all successful sign-ins from IP 203.0.113.47 during the attack window
Correct. The attacker IP is the pivot that reveals whether other accounts were compromised from the same infrastructure. If the attacker used j.morrison's compromised mailbox to send internal phishing, other victims would authenticate through the same AiTM proxy, producing sign-ins from the same IP. This IP-based pivot across all accounts is the standard technique for scoping a credential phishing campaign. Section 1.8 covers entity pivoting in detail.
(b) Query EmailEvents for all emails sent to j.morrison in the last 7 days
This finds emails delivered to j.morrison, but you already know j.morrison is compromised. The question is whether other accounts were also compromised. Querying emails to the same victim does not reveal additional victims. The attacker IP pivot is the correct approach because the attacker's infrastructure is the common element across all compromised accounts.
(c) Query CloudAppEvents for New-InboxRule actions across all users in the last 24 hours
Querying all inbox rule creation is too broad and will return legitimate rules alongside malicious ones. The IP pivot is more precise: it finds every account accessed from the attacker's infrastructure, regardless of what post-compromise actions the attacker took. Some compromised accounts may not have inbox rules but may have file downloads or OAuth consent grants instead.
(d) Query DeviceProcessEvents for j.morrison's device to check for endpoint compromise
Checking the endpoint is a valid investigation step for j.morrison specifically, but it does not answer the question of whether other accounts were compromised. AiTM credential phishing typically does not involve endpoint malware. The attacker signs in through the cloud, not through the victim's device. The IP pivot across IdentityLogonEvents is the correct approach for scoping the campaign.
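The IP pivot from Scenario 3 reduces to one filter: successful sign-ins from the attacker IP inside the attack window, grouped by account. The sketch below runs that filter over in-memory rows rather than a real IdentityLogonEvents query. The row shape, timestamps, and every account other than j.morrison are invented for illustration.

```python
from datetime import datetime

ATTACKER_IP = "203.0.113.47"

# Hypothetical rows; only j.morrison and the attacker IP come from the scenario.
signins = [
    {"time": datetime(2024, 1, 10, 9, 4), "account": "j.morrison", "ip": ATTACKER_IP, "success": True},
    {"time": datetime(2024, 1, 10, 9, 30), "account": "user.b", "ip": ATTACKER_IP, "success": True},
    {"time": datetime(2024, 1, 10, 9, 45), "account": "user.c", "ip": ATTACKER_IP, "success": False},
    {"time": datetime(2024, 1, 10, 9, 50), "account": "user.d", "ip": "198.51.100.20", "success": True},
]


def accounts_from_ip(rows, ip, start, end):
    """Return accounts with a successful sign-in from `ip` inside [start, end]."""
    return sorted({
        r["account"]
        for r in rows
        if r["ip"] == ip and r["success"] and start <= r["time"] <= end
    })


window_start = datetime(2024, 1, 10, 9, 0)
window_end = datetime(2024, 1, 10, 12, 0)
compromised = accounts_from_ip(signins, ATTACKER_IP, window_start, window_end)
print(compromised)  # ['j.morrison', 'user.b']
```

The failed attempt (user.c) and the sign-in from an unrelated IP (user.d) drop out, which is what makes this pivot more precise than a tenant-wide search for inbox rules.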
Scenario 4. A Defender for Endpoint alert fires on DESKTOP-NGE042: "Suspicious process tree — PowerShell spawned from Word macro." The device timeline shows Word launched PowerShell, which executed an encoded command that downloaded a file from an external URL. You need to respond. What is the correct sequence?
(a) Isolate the device immediately, then review the process tree to understand the attack
Isolating before collecting the investigation package risks losing volatile data. Isolation cuts the device off from the network, which may terminate running processes and connections that provide investigation context. The correct order is collect first, then isolate. Section 1.4 covers this sequence.
(b) Run a live response session to delete the downloaded file and kill the PowerShell process
Deleting evidence and killing processes before collection destroys forensic data. You need the downloaded file for hash analysis, the running process for memory analysis, and the network connections for C2 infrastructure identification. Live response is for collection, not immediate remediation of active processes.
(c) Check whether attack disruption already contained the device before taking manual action
Checking automated actions is part of the triage workflow, but it is not the first response action. Whether or not disruption triggered, you still need the investigation package collected before isolation. If disruption already isolated the device, you may need to briefly reconnect to collect the package, depending on timing.
(d) Collect the investigation package first, then isolate the device, then investigate the process tree and downloaded file
Correct. Collect before isolate. The investigation package captures running processes, network connections, scheduled tasks, and other volatile data that isolation may disrupt. After the package is collected, isolate the device to prevent further C2 communication and lateral movement. Then investigate: analyze the process tree, decode the PowerShell command, check the downloaded file hash across the tenant, and determine blast radius. Section 1.4 covers this exact response sequence.
Scenario 5. Defender for Identity generates an alert: "Suspected Kerberoasting activity" on an account that requested service tickets for 47 SPNs in 90 seconds. The account belongs to a junior developer. Your on-premises AD team says the developer was running a legitimate inventory script. How do you classify this incident?
(a) True Positive: the behavior matches Kerberoasting regardless of the stated purpose
True Positive means a real threat requiring investigation and response. If the AD team has confirmed this is a legitimate inventory script and you can verify the business justification, classifying as TP would initiate unnecessary containment actions against a legitimate user performing authorized work.
(b) Benign True Positive: the detection is correct (bulk SPN requests occurred) but the activity is authorized
Correct. Benign True Positive means the detection rule worked correctly and the behavior genuinely occurred, but it was authorized business activity. The developer really did request 47 service tickets in 90 seconds, which matches the Kerberoasting detection pattern. The key is documenting the justification: reference the change ticket or business approval, note the script name and schedule, and consider creating a suppression rule for this specific account-and-script combination so the same legitimate activity does not generate alerts on future runs. Section 1.2 covers the classification framework.
(c) False Positive: the detection rule is wrong and should be tuned
False Positive means the detection rule fired incorrectly. In this case, the rule fired correctly: bulk SPN requests in 90 seconds genuinely occurred and genuinely match the Kerberoasting detection pattern. The rule is not wrong. The activity is just authorized. The distinction matters for tuning decisions: you would not disable the Kerberoasting detection rule, but you might create a suppression for this specific account.
(d) Unknown: escalate to Tier 2 for deeper analysis before classifying
If the AD team has confirmed the business justification and you can verify the script, there is no ambiguity to escalate. Unknown is appropriate when 5 minutes of triage does not provide enough evidence to classify. Here, the evidence is clear: the detection is accurate and the activity is authorized.
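The four classifications in Scenario 5 turn on two questions: did the detected behavior actually occur, and was it authorized? A minimal sketch of that decision logic follows. The function name and the use of `None` for "not yet established" are assumptions made for illustration.

```python
from typing import Optional


def classify(behavior_occurred: Optional[bool], authorized: Optional[bool]) -> str:
    """Map triage findings to a classification label.

    None means triage has not yet established the answer.
    """
    if behavior_occurred is None:
        return "Unknown"
    if not behavior_occurred:
        # The rule fired on something that did not happen: the rule is wrong.
        return "False Positive"
    if authorized is None:
        return "Unknown"
    # The rule fired correctly; authorization decides benign vs. real threat.
    return "Benign True Positive" if authorized else "True Positive"


# Scenario 5: 47 SPN requests in 90 seconds really happened, and the AD team
# confirmed the inventory script was legitimate.
print(classify(behavior_occurred=True, authorized=True))  # Benign True Positive
```

Laying it out this way shows why False Positive is wrong here: that label is only reachable when the behavior did not occur.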
Scenario 6. You are writing the shift handover at 17:00. You investigated INC-4821 (AiTM phishing) for 6 hours. Containment is complete on 3 compromised accounts. You identified that the attacker created inbox rules on all 3 accounts but you have not yet checked whether files were downloaded from SharePoint. A password reset for one account is pending manager approval. What must the handover include?
(a) A summary stating containment is complete and the incident can be closed on the next shift
The incident cannot be closed. You have not checked SharePoint file access, which means the scoping is incomplete. A password reset is still pending. Telling the next analyst the incident is ready for closure would skip critical investigation and remediation steps.
(b) The incident number and a note to continue investigating tomorrow
This forces the night analyst to reconstruct 6 hours of investigation context from scratch. They would need to re-read every alert, re-examine every entity, and figure out where you stopped. This is the documentation failure Section 1.7 describes: if the next analyst needs to message you to understand the investigation state, your handover failed.
(c) Open incidents with current state, specific next steps (check SharePoint access for all 3 accounts), pending actions with deadlines (password reset approval), and any environmental notes
Correct. The handover must specify: the incident number and current state (3 accounts contained, inbox rules identified on all 3), the specific next investigation step (run CloudAppEvents query to check file download activity for all 3 accounts during the compromise window), pending actions with deadlines (password reset for the specific account, pending manager approval, escalate if not approved by a specific time), and any environmental context the night shift needs. Section 1.7 covers the three-area handover structure.
(d) A detailed narrative of every investigation step you performed over 6 hours
The handover is a 2-minute briefing, not a 6-hour retrospective. The incoming analyst needs to know the current state and what to do next, not a chronological replay of your investigation process. Detailed investigation notes belong in the incident comments. The handover should be 5-10 bullet points covering open incidents, pending actions, and environmental issues.
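One way to keep a handover to the three areas named in Scenario 6 (open incidents, pending actions, environmental issues) is to treat it as a small structured record. The class and field names below are hypothetical. The bullet text restates the scenario, and the deadline is left unspecified because the scenario does not give one.

```python
from dataclasses import dataclass, field


@dataclass
class Handover:
    open_incidents: list = field(default_factory=list)
    pending_actions: list = field(default_factory=list)
    environment_notes: list = field(default_factory=list)

    def render(self) -> str:
        """Render the three areas as short bullet lists; empty areas say 'none'."""
        sections = (
            ("Open incidents", self.open_incidents),
            ("Pending actions", self.pending_actions),
            ("Environmental notes", self.environment_notes),
        )
        lines = []
        for title, items in sections:
            lines.append(f"{title}:")
            lines.extend(f"- {item}" for item in (items or ["none"]))
        return "\n".join(lines)


handover = Handover(
    open_incidents=[
        "INC-4821 (AiTM phishing): 3 accounts contained, inbox rules found on all 3",
        "INC-4821 next step: check SharePoint file downloads for all 3 accounts",
    ],
    pending_actions=[
        "Password reset for one account awaiting manager approval (set an escalation time)",
    ],
)
print(handover.render())
```

An explicit "none" for an empty area tells the incoming analyst the area was considered rather than forgotten.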
Scenario 7. You have two separate incidents in the queue. Incident A (Monday) contains a Defender for Endpoint alert: credential dumping tool execution on DESKTOP-NGE042. Incident B (Thursday) contains a Defender for Cloud Apps alert: anomalous file downloads from SharePoint by admin.t.clark. The incidents are not automatically correlated. How do you determine whether they are related?
(a) Check whether admin.t.clark ever logged on to DESKTOP-NGE042 by querying DeviceLogonEvents, because the credential dump may have captured cached credentials for that account
Correct. The connecting hypothesis is: the credential dumping tool on DESKTOP-NGE042 extracted cached credentials for admin.t.clark, and the attacker used those stolen credentials on Thursday to access SharePoint. Query DeviceLogonEvents for admin.t.clark logons to DESKTOP-NGE042 in the weeks before the credential dump. If admin.t.clark had cached credentials on that device, the two incidents are almost certainly stages of the same multi-day attack. Section 1.8 covers this exact failure pattern: time gaps between stages cause the correlation engine to create separate incidents.
(b) They are clearly unrelated because they are 3 days apart and involve different products
Time gaps and product boundaries are exactly where automated correlation fails. Section 1.8 covers this: sophisticated attackers deliberately pause between credential theft and data exfiltration. A 3-day gap is a common attack pattern, not evidence that the incidents are unrelated. Manual cross-product correlation through entity pivoting exists precisely to connect incidents that span different products and different time windows.
(c) Merge the incidents in the portal and let the automated investigation re-analyze them together
Merging incidents combines them for case management but does not re-run the correlation analysis. You still need to manually determine whether the incidents are related through entity pivoting. Merging before confirming the connection creates a combined incident with no established relationship between the alerts.
(d) Query AlertEvidence for shared file hashes between the two incidents
File hash pivoting connects email delivery to endpoint execution (same malware hash). In this scenario, the connection between credential dumping and SharePoint exfiltration is not through a shared file. The connecting entity is the user account (admin.t.clark's cached credentials on the compromised device). Different attack patterns use different entity pivots.
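The connecting check in Scenario 7 asks one thing: did the account log on to the device at any point before the credential dump? The sketch below runs that check over in-memory rows standing in for DeviceLogonEvents results. The row shape, the dates, and the second account are illustrative assumptions.

```python
from datetime import datetime

# Hypothetical logon rows; only the account and device names are from the scenario.
logons = [
    {"time": datetime(2024, 1, 3, 14, 0), "account": "admin.t.clark", "device": "DESKTOP-NGE042"},
    {"time": datetime(2024, 1, 5, 9, 0), "account": "other.user", "device": "DESKTOP-NGE042"},
]


def had_prior_logon(rows, account, device, before):
    """True if `account` logged on to `device` at any time earlier than `before`."""
    return any(
        r["account"] == account and r["device"] == device and r["time"] < before
        for r in rows
    )


dump_time = datetime(2024, 1, 8, 10, 0)  # illustrative time of the Monday alert
related = had_prior_logon(logons, "admin.t.clark", "DESKTOP-NGE042", dump_time)
print(related)  # True
```

A `True` result supports the hypothesis that the account's credentials were cached on the device. It does not prove the Thursday activity used them, so the sign-in source for the SharePoint downloads still needs checking.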
Scenario 8. Your SOC has been running for 6 months. The incident queue averages 500 alerts per week. Your team's analysis shows a 92% false positive rate across all alert types. The top 3 noisiest rules account for 340 of those alerts. A senior analyst proposes disabling those 3 rules entirely to reduce workload. What is the better approach?
(a) Disable the rules. A 92% false positive rate proves they are not working
A high false positive rate does not mean the rules have zero detection value. If the 92% rate holds for those 340 alerts, about 8% are true positives, which means approximately 27 real threats per week are detected by those rules. Disabling them eliminates the noise but also eliminates those 27 detections. The goal is to reduce the false positives while preserving the true positive detections.
(b) Accept the noise as unavoidable in a security operations environment
Accepting a 92% false positive rate is accepting that your analysts spend 92% of their triage time on alerts that are not real threats. This is the alert fatigue that Section 1.7 describes: analysts start closing incidents without reading the evidence, classifying ambiguous alerts as FP because most alerts are. The noise is not unavoidable. It is reducible through systematic tuning.
(c) Hire additional analysts to handle the alert volume
Adding analysts to process 460 false positives per week treats the symptom instead of the cause. The cost of additional analysts processing noise far exceeds the cost of tuning the rules to reduce noise. Tuning transforms the queue: 500 alerts at 92% FP (40 real) becomes 200 alerts at 80% FP (still 40 real, but in far less noise).
(d) Analyze the false positive patterns in those 3 rules, create targeted suppression rules for the known-good activity, and preserve detection for the malicious variants
Correct. The goal is surgical tuning, not rule deletion. Analyze what makes the false positives false: specific scripts, specific service accounts, specific scheduled tasks, specific device groups. Create suppression rules that exclude the known-good patterns while keeping the detection active for unknown patterns. If "Suspicious PowerShell command line" fires 187 times on admin.t.clark's compliance script, suppress that specific combination. The rule continues detecting suspicious PowerShell from every other account and every other command line. Section 1.7 covers this approach with the noisy alert tuning query.
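The figures quoted across the Scenario 8 explanations can be checked with a few lines of arithmetic. The one assumption, flagged in the code, is that the overall 92% false positive rate applies evenly to the three noisy rules. The 200-alert post-tuning figure is the example used in option (c)'s explanation.

```python
weekly_alerts = 500
fp_rate = 0.92
noisy_rule_alerts = 340

false_positives = round(weekly_alerts * fp_rate)  # 460
true_positives = weekly_alerts - false_positives  # 40

# Assumption: the three noisy rules share the overall 92% false positive rate.
noisy_true_positives = round(noisy_rule_alerts * (1 - fp_rate))  # 27

# After tuning: the same 40 real detections inside a 200-alert queue.
tuned_alerts = 200
tuned_fp_rate = (tuned_alerts - true_positives) / tuned_alerts  # 0.8

print(false_positives, true_positives, noisy_true_positives)  # 460 40 27
print(tuned_fp_rate)                                          # 0.8
```

Disabling the three rules would remove roughly 27 of the 40 weekly true positives, whereas tuning keeps all 40 and cuts the queue from 500 alerts to 200.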