
Building Your Sign-In Baseline

4-5 hours · Module 1 · Free

What you already know

Sections 1.8 through 1.10 gave you the KQL patterns for aggregation, time-series analysis, and cross-table correlation. This section applies those patterns to a specific operational purpose: building the baseline of normal sign-in behavior that makes anomaly detection possible. Without a baseline, every sign-in looks the same. With one, deviations become visible — and deviations are where attacks hide.

Scenario

A sign-in from Belgium triggers a review. Is it suspicious? If your company has no employees in Belgium, it's a strong anomaly signal. If your company has an office in Brussels, it's routine. A sign-in at 3 AM triggers a review. Is it suspicious? If the user is a 9-to-5 office worker who has never signed in after midnight, it warrants investigation. If the user is a SOC analyst who regularly works the overnight shift, it's expected. Anomaly detection is only as good as the baseline it measures against. In this section, you learn to build that baseline.

Why baselines matter more than static rules

Static detection rules use fixed thresholds: "alert if more than 10 failed sign-ins in 5 minutes" or "alert if sign-in from a blocked country." These rules catch known patterns but miss subtle anomalies. A sign-in from Belgium isn't from a "blocked country," but it's anomalous for a company with no Belgian operations. A slow password spray at 5 attempts per hour across 200 accounts stays below the "10 in 5 minutes" threshold while still being an active attack.

Baseline-driven detection compares current behavior against established normal patterns. Instead of "alert on sign-in from Russia," the approach is "alert on sign-in from any country this user has never signed in from in the past 30 days." Instead of "alert on more than 10 failures," the approach is "alert when the hourly failure rate exceeds 200% of the historical average." The baseline provides the context that makes the detection specific to your environment. Building it is the prerequisite for the detection rules you'll create in EI13.
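The rate-based comparison can be sketched as a query. This is a minimal sketch, not a tuned rule: the 2.0 multiplier mirrors the 200% example in the text, the tenant-wide hourly average is a simplifying assumption, and hourlyAvgFailures is an illustrative name.

```kql
// Sketch: flag hours in the last day whose failure count exceeds 200% of the
// historical hourly average. Windows and multiplier are illustrative assumptions.
// ResultType is a string column; "0" means success.
let hourlyAvgFailures = toscalar(
    SigninLogs
    | where TimeGenerated between (ago(30d) .. ago(1d))
    | where ResultType != "0"
    | summarize Failures = count() by bin(TimeGenerated, 1h)
    // Hours with zero failures produce no bin, so this average is over hours with at least one failure
    | summarize avg(Failures));
SigninLogs
| where TimeGenerated > ago(1d)
| where ResultType != "0"
| summarize Failures = count(), TargetedUsers = dcount(UserPrincipalName) by bin(TimeGenerated, 1h)
| where Failures > 2.0 * hourlyAvgFailures
| order by TimeGenerated desc
```

A high TargetedUsers value alongside a modest per-user failure count is the shape a slow password spray tends to take, which is why the sketch counts distinct users as well as failures.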

The five baseline dimensions

A complete sign-in baseline covers five dimensions. Each captures a different aspect of normal behavior, and each enables a different class of anomaly detection.

Geographic baseline — which countries do your users sign in from? The geographic baseline uses the 30-day make_set pattern from Section 1.9: collect the set of countries per user, then flag any sign-in from a country not in that set. The country field from LocationDetails.countryOrRegion is the most reliable geographic signal. City-level baselining is too noisy for operational use — IP-to-city accuracy varies too much. Country-level is the right granularity for automated detection.

Temporal baseline — when do your users sign in? The temporal baseline captures active hours per user using hourofday(TimeGenerated) and dayofweek(TimeGenerated). A user whose baseline shows activity between hours 8–18 on weekdays but suddenly produces a 3 AM weekend sign-in is exhibiting behavior that warrants investigation. The temporal dimension catches attackers who operate in a different time zone — their working hours don't align with the legitimate user's established pattern.
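As a rough illustration of the temporal dimension, the sketch below builds a per-user set of active hours and flags recent sign-ins at an hour the user has not used before. Treat it as an assumption-laden starting point: TimeGenerated is stored in UTC, so the hours are UTC hours rather than local time, and temporalBaseline, BaselineHours, and BaselineDays are illustrative names.

```kql
// Sketch: per-user active hours (UTC) and weekdays, then sign-ins outside the known hours
let temporalBaseline = SigninLogs
| where TimeGenerated between (ago(30d) .. ago(1d))
| where ResultType == "0"
| summarize
    BaselineHours = make_set(hourofday(TimeGenerated), 24),
    // dayofweek() returns a timespan since Sunday; dividing by 1d gives 0 (Sunday) to 6 (Saturday)
    BaselineDays = make_set(toint(dayofweek(TimeGenerated) / 1d), 7)
    by UserPrincipalName;
SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType == "0"
| extend HourOfDay = hourofday(TimeGenerated)
| join kind=inner temporalBaseline on UserPrincipalName
| where not(set_has_element(BaselineHours, HourOfDay))
| project TimeGenerated, UserPrincipalName, HourOfDay, BaselineHours, BaselineDays, IPAddress, AppDisplayName
| order by TimeGenerated desc
```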

Device baseline — which operating systems and devices do your users normally use? The device baseline captures the DeviceDetail.operatingSystem and DeviceDetail.deviceId per user. A user who normally signs in from Windows 11 suddenly appearing from Linux is an anomaly. A user who normally has 1–2 registered devices suddenly appearing with 5 distinct device IDs is an anomaly. These deviations don't prove compromise — but combined with other signals, they're strong contributors to a triage decision.
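A device profile along these lines might look like the following sketch. The threshold of 5 device IDs is taken from the example in the text and is not a recommended value; OperatingSystems and DistinctDeviceIds are illustrative names.

```kql
// Sketch: per-user operating system set and distinct device ID count over 30 days
SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == "0"
| extend
    DeviceOS = tostring(DeviceDetail.operatingSystem),
    DeviceId = tostring(DeviceDetail.deviceId)
| summarize
    OperatingSystems = make_set_if(DeviceOS, isnotempty(DeviceOS), 10),
    // deviceId is empty for unregistered devices, so only populated values are counted
    DistinctDeviceIds = dcountif(DeviceId, isnotempty(DeviceId))
    by UserPrincipalName
| where DistinctDeviceIds >= 5   // illustrative threshold from the example above
| order by DistinctDeviceIds desc
```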

Application baseline — which applications does each user access? The application baseline captures the set of AppDisplayName values per user. This dimension catches privilege escalation indicators: a finance user who normally uses Outlook, Teams, and SharePoint suddenly accessing Azure Portal or Microsoft Graph Explorer is exhibiting a pattern consistent with a compromised account being used for reconnaissance. Administrative tools appearing in a non-administrator's baseline is always worth investigating.
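One way to look for this pattern is sketched below: sign-ins to administrative tools by users whose baseline does not include them. The adminApps list is an assumption for illustration; check the AppDisplayName values that actually appear in your tenant before relying on it.

```kql
// Sketch: administrative applications appearing for users who have not used them before
let adminApps = dynamic(["Azure Portal", "Graph Explorer"]);   // assumption: adjust to your tenant's AppDisplayName values
let appBaseline = SigninLogs
| where TimeGenerated between (ago(30d) .. ago(1d))
| where ResultType == "0"
| summarize BaselineApps = make_set(AppDisplayName) by UserPrincipalName;
SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType == "0"
| where AppDisplayName in (adminApps)
| join kind=inner appBaseline on UserPrincipalName
| where not(set_has_element(BaselineApps, AppDisplayName))
| summarize FirstSeen = min(TimeGenerated), SignIns = count() by UserPrincipalName, AppDisplayName, IPAddress
| order by FirstSeen desc
```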

IP baseline — which IP addresses does each user sign in from? The IP baseline is the noisiest dimension because IP addresses change legitimately — home networks, mobile hotspots, coffee shops, VPN exit nodes. It's most useful for identifying users with stable patterns (1–2 IPs, likely office-based) versus mobile users (10+ IPs, frequent travelers). For stable-pattern users, a new IP is a meaningful signal. For mobile users, IP alone isn't diagnostic — combine it with country and device for a composite assessment.
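The stable-versus-mobile split can be approximated with a short classification query. The thresholds come from the ranges mentioned above; the "mixed" bucket for users in between and the IPProfile column name are assumptions added for illustration.

```kql
// Sketch: classify users by how many distinct IPs they used in 30 days
SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == "0"
| summarize IPCount = dcount(IPAddress) by UserPrincipalName
| extend IPProfile = case(
    IPCount <= 2, "stable",    // likely office-based; a new IP is a meaningful signal
    IPCount >= 10, "mobile",   // frequent travelers; IP alone is not diagnostic
    "mixed")                   // assumption: everything in between
| summarize Users = count() by IPProfile
```

Removing the final summarize line returns the per-user classification instead of the tenant-wide distribution.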

The composite baseline query

Combining all five dimensions into a single per-user profile produces the operational baseline you'll maintain and query against throughout the course:

KQL

// Comprehensive user baseline — 30-day reference profile
// Save this query. Re-run monthly to update.
SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == "0"   // successful sign-ins only; ResultType is a string column
| extend
    Country = tostring(LocationDetails.countryOrRegion),
    DeviceOS = tostring(DeviceDetail.operatingSystem),
    HourOfDay = hourofday(TimeGenerated)
| summarize
    // Geographic
    Countries = make_set(Country, 10),
    CountryCount = dcount(Country),
    // Temporal
    ActiveHours = make_set(HourOfDay, 24),
    // Device
    DeviceTypes = make_set(DeviceOS, 5),
    // Application
    Apps = make_set(AppDisplayName, 20),
    AppCount = dcount(AppDisplayName),
    // IP
    IPCount = dcount(IPAddress),
    // Volume and risk
    TotalSignIns = count(),
    AvgDaily = count() / 30.0,
    RiskySignIns = countif(RiskLevelDuringSignIn in ("medium", "high")),
    FirstSeen = min(TimeGenerated),
    LastSeen = max(TimeGenerated)
    by UserPrincipalName
| order by TotalSignIns desc

This query produces one row per user with their 30-day behavioral profile. Save it as a Sentinel saved query. When you need to assess whether a specific sign-in is anomalous, compare the sign-in's properties against the baseline for that user: a country not in Countries, an hour not in ActiveHours, a device type not in DeviceTypes, an application not in Apps. Each deviation adds weight to the anomaly assessment. Multiple simultaneous deviations are the strongest indicator that something has changed, and change is what triage investigates. Note that the make_set limits cap each set (10 countries, 5 device types, 20 applications), so if CountryCount or AppCount exceeds the cap, the corresponding set is incomplete and the limit should be raised.
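The comparison against the per-user profile can be sketched as a deviation count. This is an illustrative approach rather than a scoring model from the course: each dimension contributes 1 when the sign-in falls outside the baseline, and the cutoff of 2 is an arbitrary assumption to tune.

```kql
// Sketch: count how many baseline dimensions each recent sign-in deviates from
let baseline = SigninLogs
| where TimeGenerated between (ago(30d) .. ago(1d))
| where ResultType == "0"
| summarize
    Countries = make_set(tostring(LocationDetails.countryOrRegion)),
    ActiveHours = make_set(hourofday(TimeGenerated)),
    DeviceTypes = make_set(tostring(DeviceDetail.operatingSystem)),
    Apps = make_set(AppDisplayName)
    by UserPrincipalName;
SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType == "0"
| extend
    Country = tostring(LocationDetails.countryOrRegion),
    DeviceOS = tostring(DeviceDetail.operatingSystem),
    HourOfDay = hourofday(TimeGenerated)
| join kind=inner baseline on UserPrincipalName
| extend DeviationCount =
    iff(isnotempty(Country) and not(set_has_element(Countries, Country)), 1, 0)
    + iff(not(set_has_element(ActiveHours, HourOfDay)), 1, 0)
    + iff(isnotempty(DeviceOS) and not(set_has_element(DeviceTypes, DeviceOS)), 1, 0)
    + iff(not(set_has_element(Apps, AppDisplayName)), 1, 0)
| where DeviationCount >= 2   // assumption: tune the cutoff for your environment
| project TimeGenerated, UserPrincipalName, DeviationCount, Country, HourOfDay, DeviceOS, AppDisplayName, IPAddress
| order by DeviationCount desc, TimeGenerated desc
```

Empty country and operating system values are skipped rather than counted as deviations, since a missing value says nothing about whether the sign-in is new.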

Using the baseline for anomaly detection

The baseline enables the most powerful detection pattern in this course: deviation-from-established-behavior. The core query compares current activity against the stored baseline and flags anything new:

KQL

// Detect sign-ins from new countries not in 30-day baseline
let baseline = SigninLogs
| where TimeGenerated between (ago(30d) .. ago(1d))
| where ResultType == "0"   // ResultType is a string column
| extend Country = tostring(LocationDetails.countryOrRegion)
| summarize BaselineCountries = make_set(Country) by UserPrincipalName;
SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType == "0"
| extend Country = tostring(LocationDetails.countryOrRegion)
| where isnotempty(Country)
| join kind=inner baseline on UserPrincipalName
// set_has_element evaluates membership row by row against the dynamic array
| where not(set_has_element(BaselineCountries, Country))
| project
    TimeGenerated, UserPrincipalName, AppDisplayName,
    NewCountry = Country, IPAddress,
    BaselineCountries, RiskLevelDuringSignIn
| order by TimeGenerated desc

Every result is a successful sign-in from a country that does not appear in the user's baseline for the preceding 30 days. Because the query uses an inner join, users with no successful sign-ins in the baseline window (new accounts, for example) are not evaluated. Not all results are malicious: users travel, VPNs exit in new countries, and corporate acquisitions add new geographies. But each represents a change from established behavior that should be reviewed, especially when combined with risk signals from Section 1.6 or authentication method anomalies from Section 1.4. This pattern (establish baseline, compare current, flag deviations) is the template you'll use for every baseline-driven detection rule in EI13.

Azure Portal

Microsoft Sentinel → Logs → run query → Save → Save as query
Save the composite baseline query and the anomaly detection query as named saved queries in your Sentinel workspace. Use descriptive names: "EI-BASELINE-01 User Profile (30d)" and "EI-BASELINE-02 New Country Detection." You'll re-run the baseline monthly to track drift, and you'll run the anomaly detection daily as part of the triage workflow that Section 1.12 organizes.

Figure 1.11 — Baseline-driven detection versus static rules. The baseline approach detects anomalies specific to each user's established behavior, catching deviations that static country-block rules miss.

Organizational baseline versus per-user baseline

Per-user baselines are the most precise but can be impractical to manage at scale. An organizational baseline provides a complementary layer — a set of norms that apply across the entire tenant regardless of individual user history.

The geographic organizational baseline is the list of countries where the organization operates. Any sign-in from outside this list is anomalous at the organizational level, even if the individual user has signed in from that country before. This is faster to implement than per-user baselines and catches the most obvious geographic anomalies: an organization that operates only in the US and UK doesn't expect sign-ins from Nigeria, regardless of individual user history. The organizational baseline query is a simpler version of the unexpected-country query from Section 1.7 — define let orgCountries = dynamic(["US", "GB"]), then filter for sign-ins from countries not in the list.
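A minimal sketch of the organizational country check, using the orgCountries list described above. The "US" and "GB" values are the example from the text; the 24-hour window and the summary columns are assumptions.

```kql
// Sketch: successful sign-ins from countries outside the organization's operating list
let orgCountries = dynamic(["US", "GB"]);   // example values; replace with your operating countries
SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType == "0"
| extend Country = tostring(LocationDetails.countryOrRegion)
| where isnotempty(Country) and Country !in (orgCountries)
| summarize SignIns = count(), Users = dcount(UserPrincipalName), IPs = make_set(IPAddress, 20) by Country
| order by SignIns desc
```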

The temporal organizational baseline is the tenant-wide business hours pattern. While individual users work different hours, the tenant as a whole shows a predictable daily rhythm. A spike in 3 AM sign-ins across many accounts can indicate an attack affecting multiple accounts simultaneously, a signal that individual user baselines might miss if each user's baseline includes occasional off-hours activity.
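The tenant-wide rhythm can be approximated by comparing the number of distinct users per hour against the historical norm for that hour of day. This is a sketch under stated assumptions: the three-standard-deviation cutoff is arbitrary, weekdays and weekends are not separated, and hourlyNorm is an illustrative name.

```kql
// Sketch: hours in the last day with unusually many distinct users for that hour of day (UTC)
let hourlyNorm = SigninLogs
| where TimeGenerated between (ago(30d) .. ago(1d))
| summarize Users = dcount(UserPrincipalName) by Hour = bin(TimeGenerated, 1h)
| extend HourOfDay = hourofday(Hour)
| summarize AvgUsers = avg(Users), StdUsers = stdev(Users) by HourOfDay;
SigninLogs
| where TimeGenerated > ago(24h)
| summarize Users = dcount(UserPrincipalName) by Hour = bin(TimeGenerated, 1h)
| extend HourOfDay = hourofday(Hour)
| join kind=inner hourlyNorm on HourOfDay
| where Users > AvgUsers + 3 * StdUsers   // assumption: tune the multiplier
| project Hour, HourOfDay, Users, AvgUsers, StdUsers
| order by Hour asc
```

Failed sign-ins are deliberately included here, because an attack affecting many accounts often shows up as failures before it shows up as successes.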

The application organizational baseline is the set of applications the organization uses. A new application appearing in sign-in logs that nobody in IT deployed may indicate shadow IT adoption. If the application was granted permissions through the OAuth consent flow — visible in AuditLogs as "Consent to application" — it could be consent phishing. EI9 covers application governance in detail.
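The two checks described above can be sketched as separate queries. The 7-day and 30-day windows are assumptions, knownApps is an illustrative name, and the two queries are meant to be run one at a time.

```kql
// Sketch 1: applications seen in the last 7 days that did not appear in the prior 30 days
let knownApps = SigninLogs
| where TimeGenerated between (ago(37d) .. ago(7d))
| distinct AppDisplayName;
SigninLogs
| where TimeGenerated > ago(7d)
| where isnotempty(AppDisplayName)
| join kind=leftanti knownApps on AppDisplayName
| summarize FirstSeen = min(TimeGenerated), Users = dcount(UserPrincipalName), SignIns = count() by AppDisplayName, AppId
| order by FirstSeen asc

// Sketch 2 (run separately): recent OAuth consent grants recorded in AuditLogs
AuditLogs
| where TimeGenerated > ago(7d)
| where OperationName == "Consent to application"
| project TimeGenerated, OperationName, InitiatedBy, TargetResources, Result
```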

Maintaining and refreshing baselines

A baseline that is never refreshed becomes increasingly inaccurate. Users travel, change roles, adopt new applications, and switch devices. The recommended refresh cadence is monthly. Re-run the composite baseline query at the start of each month using a rolling 30-day window. Compare the new baseline against the previous month to identify drift — users whose geographic footprint expanded, who adopted new applications, or who changed their working hours. Legitimate drift should update the baseline. Unexplained drift should trigger investigation.
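A month-over-month drift comparison might be sketched as follows. It assumes the workspace retains at least 60 days of SigninLogs, and it covers only the country and application dimensions; the column names are illustrative.

```kql
// Sketch: countries and applications that are new this 30-day window versus the previous one
let current = SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == "0"
| summarize
    CurrentCountries = make_set(tostring(LocationDetails.countryOrRegion)),
    CurrentApps = make_set(AppDisplayName)
    by UserPrincipalName;
let previous = SigninLogs
| where TimeGenerated between (ago(60d) .. ago(30d))
| where ResultType == "0"
| summarize
    PreviousCountries = make_set(tostring(LocationDetails.countryOrRegion)),
    PreviousApps = make_set(AppDisplayName)
    by UserPrincipalName;
current
| join kind=inner previous on UserPrincipalName
| extend
    NewCountries = set_difference(CurrentCountries, PreviousCountries),
    NewApps = set_difference(CurrentApps, PreviousApps)
| where array_length(NewCountries) > 0 or array_length(NewApps) > 0
| project UserPrincipalName, NewCountries, NewApps
```

Each returned row is a candidate for the legitimate-versus-unexplained decision described above: either the baseline is updated or the change is investigated.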

Every control you deploy in this course changes the baseline. Blocking legacy authentication in EI3 eliminates legacy protocol sign-ins from the application baseline. Deploying phishing-resistant authentication in EI2 shifts the authentication method distribution. Enforcing device compliance in EI4 changes the device state ratios. Each change should be measurable — if you deploy a control and the baseline doesn't shift, the control isn't working as intended.

What we see in 90% of environments

"Identity Protection builds its own baseline — we don't need to build ours." Identity Protection's baseline is a black box. You cannot query it, inspect it, tune it, or explain it to an auditor. It detects generic anomalies (unfamiliar sign-in properties, atypical travel) but not organization-specific anomalies: a finance user accessing Azure Portal, a UK employee signing in from Belgium, an after-hours sign-in from a user who never works evenings. Your custom baseline captures the patterns specific to your environment. Custom detection rules built on this baseline catch the anomalies that Identity Protection's generic model misses. Use both — Identity Protection as the broad detection layer and custom baselines as the precision layer.

Identity Security Principle

You cannot detect anomalies without defining normal. The baseline is not a nice-to-have supplement to detection rules — it is the foundation that makes detection rules intelligent. A rule that says "alert on sign-in from new country" is meaningless without a per-user country set to compare against. Build the baseline before you build the detections. Refresh it monthly. Every control deployment should move the numbers.

Section 1.12 consolidates the queries from this module into an operational identity query library — organized by function (posture measurement, policy verification, risk triage, anomaly detection, investigation) so you can find and run the right query when you need it.
