
Analysis: Separating Signal from Noise

4-5 hours · Module 1 · Free
What you already know
Section 1.3 taught iterative collection: the four-step funnel that narrows hundreds of thousands of events to a manageable set of anomalies. You have 28 users with sign-ins from IPs outside their 30-day baseline, reduced to 3 with correlated MFA registration signals. This section teaches the analytical judgment that determines which of those 3 are compromised and which traveled to a conference last week.

Scenario

Three NE users have sign-ins from new IPs combined with new MFA method registrations in the same 24-hour window. Tom Ashworth opens a ticket for all three and labels them "probable compromise." Rachel Okafor, the CISO, asks: did you verify whether any of them were traveling? Tom did not. Two of the three were at an industry conference in Munich. The third registered a new phone after their device was replaced by IT. All three were legitimate. Tom's hunt produced three false escalations, wasted 6 hours of IR time, and eroded trust in the hunting program before it produced a single real finding.

Analysis is contextual judgment

Your collection queries produced anomalies. Some are threats. Most are not. A user traveling to a conference and registering a new phone produces exactly the same telemetry as an AiTM attacker who registered a new MFA method from a stolen session. The data looks identical. The context is different.

Analysis is the process of adding context until you can make a defensible judgment. Not a guess, not an instinct, not "this looks suspicious." A judgment backed by evidence, documented with reasoning, and defensible when the CISO asks why you escalated.

Five enrichment dimensions

When a hunt result looks suspicious, enrich it across five dimensions before making a judgment. Each dimension either strengthens the suspicion or explains it away. You need at least three dimensions pointing the same direction before escalation is justified.

Dimension 1: User context. Who is this person? What role do they hold? Do they travel regularly? Are they in IT (where unusual system access is expected) or in finance (where it is not)? A Global Admin signing in from Munich during a known conference is different from a finance clerk signing in from Romania with no travel history. Check Entra ID for department, job title, and manager. Check recent calendar entries if accessible.

Dimension 2: Temporal context. When did the anomaly occur? Business hours in the user's time zone, or 03:00 local time from someone who has never signed in outside 08:00 to 18:00? Timing matters because attackers often operate outside the victim's normal hours. Cross-reference with email delivery data: was a phishing email delivered to this user in the 48 hours before the anomalous sign-in? That single correlation is one of the strongest indicators of compromise in M365 environments. If EmailEvents shows a high-confidence phishing email delivered to the user 6 hours before the anomalous sign-in, the temporal correlation between delivery and compromise elevates the finding from "suspicious" to "probable."
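The delivery-to-sign-in correlation can be sketched as a join between EmailEvents and SigninLogs. This is a minimal sketch, assuming a Sentinel workspace where EmailEvents is ingested through the Defender XDR connector and where each recipient address matches the user's UPN. The suspectUsers list and the 14-day lookback are illustrative placeholders, not values from the hunt.

```kql
// Hypothetical inputs: replace with the users surfaced by your collection queries
let suspectUsers = dynamic(["user.one@example.com", "user.two@example.com"]);
let lookback = 14d;
// Phishing-classified emails that were delivered to a mailbox
let phish = EmailEvents
| where TimeGenerated > ago(lookback)
| where ThreatTypes has "Phish"
| where DeliveryAction == "Delivered"
| project PhishTime = TimeGenerated,
          Upn = tolower(RecipientEmailAddress),
          SenderFromAddress, Subject;
SigninLogs
| where TimeGenerated > ago(lookback)
| where ResultType == "0"
| extend Upn = tolower(UserPrincipalName)
| where Upn in (suspectUsers)
| join kind=inner phish on Upn
// Keep only sign-ins in the 48 hours after delivery
| where TimeGenerated between (PhishTime .. (PhishTime + 48h))
| project Upn, PhishTime, SignInTime = TimeGenerated,
          HoursAfterDelivery = datetime_diff('hour', TimeGenerated, PhishTime),
          IPAddress, SenderFromAddress, Subject
| sort by Upn asc, SignInTime asc
```

The result lists every successful sign-in that followed a delivered phishing email, so compare the IPAddress column against the anomalous IPs from collection before treating a row as a temporal match.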

Attackers also exhibit temporal clustering. A compromised account used for BEC typically shows rapid-fire activity within a 2-to-4-hour window: sign-in, inbox rule creation, email forwarding configuration, and data access all occur in quick succession. Legitimate users spread activity across hours and days. When you see five distinct high-risk actions compressed into 90 minutes, that compression is itself a signal.
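One way to look for this compression is to count distinct high-risk operations per user in fixed 90-minute bins. The sketch below assumes the OfficeActivity table is populated by the Microsoft 365 connector. The operation list and the threshold of three are illustrative choices and should be tuned to the actions you consider high-risk.

```kql
// Illustrative list of post-compromise operations; extend for your environment
let highRiskOps = dynamic(["New-InboxRule", "Set-InboxRule", "UpdateInboxRules",
                           "Set-Mailbox", "Add-MailboxPermission", "FileDownloaded"]);
let minDistinctOps = 3;   // assumed threshold, not a value from the course
OfficeActivity
| where TimeGenerated > ago(7d)
| where Operation in (highRiskOps)
| summarize DistinctOps = dcount(Operation),
            Ops = make_set(Operation),
            FirstAction = min(TimeGenerated),
            LastAction = max(TimeGenerated)
    by UserId, Window = bin(TimeGenerated, 90m)
| where DistinctOps >= minDistinctOps
| sort by DistinctOps desc
```

Fixed bins can split a burst that straddles a bin boundary, so treat a near-miss in two adjacent windows as worth a manual look.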

Dimension 3: Geographic context. Where is the anomalous IP? Is it in a country the user has never authenticated from? Is it a known VPN egress point, a residential proxy (common in AiTM toolkits), or a datacenter IP belonging to a hosting provider? Check the IP against your known corporate VPN ranges. If only one user has ever authenticated from that IP, the signal is strong. If 40 users authenticate from it, it is corporate infrastructure.

The geographic enrichment query is simple but revealing. Query 90 days of SigninLogs for the suspect IP address and count how many distinct users authenticated from it, when it first appeared, and when it was last seen. An IP that appeared for the first time today and was used by a single user is far more suspicious than one that has served 50 users over 6 months. For residential proxy IPs (commonly used to front AiTM toolkits like Evilginx and Modlishka), the ASN often belongs to networks like Bright Data (formerly Luminati) or Smartproxy. These are legitimate proxy services, but their use for M365 authentication is almost never legitimate in an enterprise context. The ASN alone does not confirm compromise, but it shifts the geographic dimension firmly toward suspicion.
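A minimal sketch of that prevalence query follows. The suspectIp value is a placeholder from the documentation address range, and the 90-day lookback assumes your workspace retains SigninLogs that long.

```kql
let suspectIp = "203.0.113.10";   // placeholder: substitute the IP under analysis
SigninLogs
| where TimeGenerated > ago(90d)
| where IPAddress == suspectIp
| summarize DistinctUsers = dcount(UserPrincipalName),
            Users = make_set(UserPrincipalName, 50),
            FirstSeen = min(TimeGenerated),
            LastSeen = max(TimeGenerated),
            SuccessfulSignIns = countif(ResultType == "0"),
            TotalSignIns = count()
```

A single row comes back. DistinctUsers of 1 with FirstSeen today is the strong-signal case described above, while dozens of users over months points to shared corporate infrastructure.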

"Impossible travel" is another geographic signal. If the user signed in from London at 14:00 and from Singapore at 14:30, no one can physically cover that distance in 30 minutes. Entra ID Protection detects some impossible travel scenarios, but its thresholds are generous (allowing for VPN use and cached tokens). A hunt query can apply a stricter threshold: any two successful sign-ins from the same user, from different countries more than 3,000 km apart, within 2 hours, where neither IP belongs to a known VPN range.
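The stricter threshold can be sketched with prev() over each user's time-ordered sign-ins. This is a sketch under stated assumptions: vpnRanges is a hypothetical list of corporate egress ranges, coordinates come from LocationDetails.geoCoordinates, and only consecutive sign-ins are compared.

```kql
let vpnRanges = dynamic(["198.51.100.0/24"]);   // hypothetical corporate VPN egress
SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType == "0"
// Drop known VPN egress; non-IPv4 addresses are kept because the check returns null
| where not(coalesce(ipv4_is_in_any_range(IPAddress, vpnRanges), false))
| extend Country = tostring(LocationDetails.countryOrRegion),
         Lat = todouble(LocationDetails.geoCoordinates.latitude),
         Lon = todouble(LocationDetails.geoCoordinates.longitude)
| where isnotempty(Country) and isnotnull(Lat) and isnotnull(Lon)
| project TimeGenerated, UserPrincipalName, IPAddress, Country, Lat, Lon
| sort by UserPrincipalName asc, TimeGenerated asc
| extend PrevUser = prev(UserPrincipalName), PrevTime = prev(TimeGenerated),
         PrevIP = prev(IPAddress), PrevCountry = prev(Country),
         PrevLat = prev(Lat), PrevLon = prev(Lon)
| where PrevUser == UserPrincipalName and PrevCountry != Country
// geo_distance_2points takes longitude first and returns meters
| extend DistanceKm = geo_distance_2points(PrevLon, PrevLat, Lon, Lat) / 1000.0,
         MinutesApart = datetime_diff('minute', TimeGenerated, PrevTime)
| where DistanceKm > 3000 and MinutesApart <= 120
| project UserPrincipalName, PrevTime, PrevCountry, PrevIP,
          TimeGenerated, Country, IPAddress, DistanceKm, MinutesApart
```

Because only adjacent sign-ins are compared, a pair separated by an intermediate sign-in from a third location is evaluated as two hops rather than one.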

Dimension 4: Behavioral context. What did the user do during and after the anomalous session? Normal work activity (reading email, editing documents, attending Teams meetings) is consistent with a legitimate user on a new device. Unusual activity (creating inbox rules, consenting to OAuth applications, downloading files from SharePoint sites the user never normally accesses, sending bulk emails to new external recipients) is consistent with attacker post-compromise behavior. This is the most powerful dimension because it examines what the session was used for rather than how it was established.
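A simple way to review what a session was used for is to build one timeline from mailbox and SharePoint activity plus OAuth consents. The sketch below assumes OfficeActivity and AuditLogs are both available in the workspace. The user, session start time, and 24-hour window are hypothetical inputs.

```kql
let suspectUser = "user.one@example.com";            // hypothetical user
let sessionStart = datetime(2024-01-15T02:47:00Z);   // hypothetical anomalous sign-in time
let window = 24h;
union
    (OfficeActivity
     | where TimeGenerated between (sessionStart .. (sessionStart + window))
     | where UserId =~ suspectUser
     | project TimeGenerated, LogSource = "OfficeActivity",
               Action = Operation, SourceIP = ClientIP),
    (AuditLogs
     | where TimeGenerated between (sessionStart .. (sessionStart + window))
     | where OperationName == "Consent to application"
     | where tostring(InitiatedBy.user.userPrincipalName) =~ suspectUser
     | project TimeGenerated, LogSource = "AuditLogs",
               Action = OperationName,
               SourceIP = tostring(InitiatedBy.user.ipAddress))
| sort by TimeGenerated asc
```

Reading the timeline top to bottom shows whether the session looks like ordinary work or like the rule-creation, consent, and bulk-access sequence described above.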

Dimension 5: Infrastructure context. What is the reputation of the IP, domain, or user agent? Check against Defender Threat Intelligence, community reputation feeds, or simple ASN lookups. An IP belonging to a known residential proxy network (common in AiTM infrastructure) carries more weight than one belonging to a major ISP. A user agent string for a browser version that does not match the user's managed device is a signal, though not conclusive on its own.
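To check whether a browser or operating system is new for the user, profile the client details recorded on their successful sign-ins. This is a minimal sketch; suspectUser is a hypothetical input, and the DeviceDetail fields reflect what Entra ID parsed from the request, which may be empty for some clients.

```kql
let suspectUser = "user.one@example.com";   // hypothetical user
SigninLogs
| where TimeGenerated > ago(30d)
| where UserPrincipalName =~ suspectUser
| where ResultType == "0"
| extend Browser = tostring(DeviceDetail.browser),
         OS = tostring(DeviceDetail.operatingSystem),
         IsManaged = tobool(DeviceDetail.isManaged)
| summarize SignIns = count(),
            FirstSeen = min(TimeGenerated),
            LastSeen = max(TimeGenerated),
            IPs = make_set(IPAddress, 20)
    by Browser, OS, IsManaged
| sort by FirstSeen desc
```

A browser and OS combination that first appears at the time of the anomalous sign-in, on an unmanaged device, supports the infrastructure dimension but is not conclusive on its own.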

Figure: Five enrichment dimensions
User (role, travel, department): Who is this person?
Temporal (time of day, phishing): When did it happen?
Geographic (country, proxy, VPN): Where is the IP?
Behavioral (post-sign-in actions): What did they do?
Infrastructure (IP reputation, ASN, user agent): What is the source?
Three or more dimensions pointing the same direction = justified escalation.

Five enrichment dimensions. Enrich across all five before making a judgment. No single dimension is conclusive. The combination creates the analytical judgment that distinguishes compromise from conference travel.

Behavioral baselining: defining "normal" per entity

Many hunt campaigns depend on comparing current behavior against a baseline of "normal." If the baseline is wrong, the hunt either misses real anomalies (baseline too broad) or flags every legitimate change as a threat (baseline too narrow).

Baselines must be per-entity, not global. A global baseline ("the average user signs in from 2.3 unique IPs per week") obscures the individual patterns that make anomaly detection work. The SOC analyst who uses VPN from three countries while traveling has a different normal than the accountant who signs in from the same office every day. A global threshold set near that average stays silent when the accountant's account shows its first new IP, which is the event that matters, and fires on the traveler every week. A per-entity baseline catches the accountant's first new IP and tolerates the traveler's fifth.

KQL
// Per-user behavioral baseline: IPs, countries, apps, daily sign-in volume
let baselineStart = ago(37d);
let baselineEnd = ago(7d);   // 7-day gap keeps recent activity out of the baseline
let baseline = SigninLogs
| where TimeGenerated between (baselineStart .. baselineEnd)
| where ResultType == "0"    // ResultType is a string column; "0" means success
| summarize
    BaselineIPs = make_set(IPAddress, 30),
    BaselineCountries = make_set(
        tostring(LocationDetails.countryOrRegion), 10),
    BaselineApps = make_set(AppDisplayName, 20),
    AvgDailySignIns = count() / 30.0   // 30-day baseline window
    by UserPrincipalName;
// Each user's normal: which IPs, countries, apps, daily volume
// Deviation in detection window = candidate anomaly
baseline   // return the table; a query made only of let statements does not run

Notice the 7-day gap between baseline end and the present. This prevents contamination. If the attacker compromised an account 5 days ago, and your baseline extends to the present, the attacker's IP appears in the baseline as "normal." The gap ensures that recent attacker activity does not contaminate the historical reference. For campaigns where longer dwell time is expected (APT, insider threat), extend the gap: a 90-day baseline ending 30 days ago provides protection against attackers with up to 30 days of dwell time.
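The comparison against the detection window can be sketched as follows. This is one possible shape rather than the course's query: it rebuilds a trimmed baseline over the same 37-to-7-day window, then lists IPs and countries seen in the last 7 days that the baseline does not contain.

```kql
let baseline = SigninLogs
| where TimeGenerated between (ago(37d) .. ago(7d))
| where ResultType == "0"
| summarize BaselineIPs = make_set(IPAddress, 30),
            BaselineCountries = make_set(
                tostring(LocationDetails.countryOrRegion), 10)
    by UserPrincipalName;
SigninLogs
| where TimeGenerated > ago(7d)          // detection window
| where ResultType == "0"
| extend Country = tostring(LocationDetails.countryOrRegion)
// inner join: users with no baseline rows are dropped here
| join kind=inner baseline on UserPrincipalName
| where not(set_has_element(BaselineIPs, IPAddress))
| summarize NewIPs = make_set(IPAddress, 20),
            NewCountrySignIns = countif(
                not(set_has_element(BaselineCountries, Country))),
            FirstNewIpSeen = min(TimeGenerated)
    by UserPrincipalName
| sort by NewCountrySignIns desc
```

The make_set caps (30 IPs, 10 countries) mean a very mobile user can overflow the baseline, in which case some familiar IPs will appear as new.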

Edge cases that break baselines. New employees have no 30-day history, so their first sign-in is always "anomalous" by definition. Exclude accounts created within the baseline window, or use a shorter baseline with lower confidence. Role changes create similar problems: a user promoted to Global Admin last week suddenly accesses resources they never touched before. Cross-reference with Entra ID audit logs by querying AuditLogs for "Add member to role" operations in the last 14 days. If the user was promoted to SharePoint Admin 3 days ago and is now accessing SharePoint admin endpoints for the first time, the behavior is anomalous against the 30-day baseline but entirely consistent with the role change. Exclude or annotate accordingly.
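The role-change lookup can be sketched like this. It assumes the usual AuditLogs layout in which the first target resource is the user and the role name is carried in a modified property named "Role.DisplayName". Assignments made through PIM may be logged under different operation names, so treat the filter as a starting point.

```kql
AuditLogs
| where TimeGenerated > ago(14d)
| where OperationName == "Add member to role"
| extend Target = TargetResources[0]
| extend TargetUser = tostring(Target.userPrincipalName),
         Actor = tostring(InitiatedBy.user.userPrincipalName)
// Pull the role name out of the modified properties array
| mv-apply Prop = Target.modifiedProperties on (
    where tostring(Prop.displayName) == "Role.DisplayName"
    | project RoleName = trim(@'"', tostring(Prop.newValue))
  )
| project TimeGenerated, TargetUser, RoleName, Actor
| sort by TimeGenerated desc
```

Joining TargetUser against the anomaly list lets you annotate each flagged user with any role granted in the last two weeks.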

Seasonal patterns matter too: end-of-quarter financial reporting, annual compliance reviews, and holiday periods all shift baseline patterns. NE's finance team accesses 3x more SharePoint documents in the final two weeks of each quarter than in the preceding ten weeks. A baseline built exclusively from mid-quarter data would flag every end-of-quarter access surge as anomalous. For roles with known seasonal variation, extend the baseline window to capture at least one full cycle (90 days for quarterly patterns, 365 days for annual patterns).

Service accounts require separate treatment. Their baselines are typically tight (the same IP, the same application, the same time schedule), and any deviation is significant. A service account that has signed in from a single IP for 6 months and suddenly authenticates from a different IP warrants immediate investigation, regardless of what the other four enrichment dimensions show.
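For service accounts that authenticate as service principals, a first-seen IP check can be sketched as below. This assumes the AADServicePrincipalSignInLogs table is enabled in the workspace. The 90-day lookback and 1-day "new" window are illustrative, and service accounts implemented as ordinary user accounts would need the same logic against the user sign-in tables.

```kql
let lookback = 90d;   // assumed retention; use 180d if your workspace keeps it
let recent = 1d;      // how far back an IP still counts as new
AADServicePrincipalSignInLogs
| where TimeGenerated > ago(lookback)
| where ResultType == "0"
| summarize FirstSeen = min(TimeGenerated), SignIns = count()
    by ServicePrincipalName, ServicePrincipalId, IPAddress
| summarize KnownIPs = dcountif(IPAddress, FirstSeen <= ago(recent)),
            NewIPs = make_set_if(IPAddress, FirstSeen > ago(recent))
    by ServicePrincipalName, ServicePrincipalId
// Only principals with an established history and at least one new IP
| where KnownIPs > 0 and array_length(NewIPs) > 0
```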

The confidence scoring model

Five dimensions, each contributing to confidence. Three or more pointing toward compromise justifies escalation. Two or fewer means the anomaly likely has a legitimate explanation.

Analyst Decision

User: j.martinez@northgate-eng.com

User context: Marketing coordinator. No admin roles. No documented travel.

Temporal: Sign-in at 02:47 local time. User has never signed in outside 08:00-18:30 in 90 days.

Geographic: IP 185.220.101.42, Romania. Never signed in from Romania. No VPN egress in that range.

Behavioral: Inbox rule created (moves messages containing "password" or "security" to the RSS Feeds folder). OAuth app consented (Mail.ReadWrite). 340 emails forwarded to an external address within 4 hours.

Infrastructure: IP belongs to residential proxy network (Bright Data ASN). User agent is Chrome 126 on Windows 10; user's managed device is Chrome 125 on Windows 11.

Verdict: 5/5 dimensions indicate compromise. Escalate immediately. Revoke all sessions. Disable account. Preserve evidence.

Every dimension points the same direction for j.martinez. A non-traveling marketing coordinator signed in at 02:47 from a Romanian residential proxy, created an inbox rule to hide security notifications, consented to a mail-access OAuth app, and forwarded 340 emails within 4 hours. No single data point is conclusive. Together, they form an unambiguous finding.

Now compare this to one of the conference attendees: the user context shows documented travel to Munich, the temporal context shows sign-in during German business hours, the geographic context shows a German hotel IP, the behavioral context shows normal email and document access, and the infrastructure context shows a clean German ISP. All five dimensions point toward legitimate activity. The anomaly type is the same (new IP + new MFA), but the analytical conclusion is completely different.
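The three-of-five rule can be recorded alongside the hunt as a small scoring table. The sketch below uses a datatable with one boolean per dimension (true meaning the dimension points toward compromise). The attendee address is a hypothetical stand-in for the conference traveler, and the booleans are entered by the analyst after enrichment, not computed.

```kql
datatable(User:string, UserCtx:bool, Temporal:bool, Geographic:bool,
          Behavioral:bool, Infrastructure:bool)
[
    "j.martinez@northgate-eng.com", true,  true,  true,  true,  true,
    "attendee@example.com",         false, false, false, false, false
]
// Count how many dimensions point toward compromise
| extend Score = iff(UserCtx, 1, 0) + iff(Temporal, 1, 0) + iff(Geographic, 1, 0)
               + iff(Behavioral, 1, 0) + iff(Infrastructure, 1, 0)
| extend Verdict = iff(Score >= 3, "Escalate", "Document reasoning and close")
| project User, Score, Verdict
```

j.martinez scores 5 and is escalated, while the attendee scores 0 and is closed with documented reasoning. The score summarizes the judgment and does not replace the per-dimension notes.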

This is why analysis cannot be automated by a simple rule. The rule "new IP + new MFA = compromise" fires on both j.martinez (a genuine compromise) and the conference attendee (a false positive). The five-dimension enrichment is what separates the two. Automation can surface candidates for analysis, and Sentinel's UEBA risk scoring does exactly that. But the contextual judgment that distinguishes attack from travel requires human analysis, informed by environmental knowledge that no automated system possesses.

Document your reasoning for every user in the analysis population, not just the ones you escalate. When the hunt record shows "User A: 5/5 dimensions indicate compromise, escalated" alongside "User B: 1/5 dimensions suspicious, user context confirms conference travel, closed," the record demonstrates analytical rigor. Six months later, when a different analyst reviews the hunt, they can see exactly what was examined and why each decision was made.

Escalating on a single dimension

The analyst sees a sign-in from a new country and immediately opens an IR ticket. No user context check. No temporal analysis. No behavioral enrichment. The "compromise" turns out to be the CFO at an investor meeting in Singapore. Three false escalations in a month and leadership stops trusting hunt findings. Single-dimension escalation produces noise that undermines the hunting program. The five-dimension enrichment framework exists to prevent this: enrich fully before deciding, and document which dimensions supported your conclusion.

Threat Hunting Principle

Analysis is contextual judgment, not pattern matching. The data that identifies a compromise is often identical to the data produced by a business trip. Five enrichment dimensions (user, temporal, geographic, behavioral, infrastructure) provide the context that separates the two. No single dimension is conclusive. The combination, documented with reasoning at each step, produces findings that survive scrutiny.

Next
Section 1.5 teaches how to conclude a hunt: confirming or refuting the hypothesis, documenting the finding (positive or negative), and deciding whether to escalate to IR or convert the query to a detection rule. The conclusion transforms your analysis into an organizational decision.