TH4.1 Building the Authentication Baseline

6-8 hours · Module 4

What you already know

Module 3 covered ATT&CK coverage analysis and hunt backlog prioritisation. This module covers hunting for identity compromise: credential theft, token replay, impossible travel, and the sign-in patterns that indicate an attacker has obtained valid credentials.

The baseline is the hunt

Hunting identity compromise is not pattern matching against known-bad indicators; it is anomaly detection against known-good behavior. A sign-in from 198.51.100.42 is not inherently suspicious. A sign-in from 198.51.100.42 for a user who has only ever signed in from 203.0.113.0/24 (the corporate VPN range) is suspicious. The IP does not carry the signal. The deviation from baseline carries the signal.
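The distinction can be sketched with the two addresses from the paragraph above. This is a minimal illustration, not a production query: the rows are synthetic (built with datatable so it runs without a workspace), and the user name alice@contoso.com and the names BaselineRange and InBaseline are assumptions introduced for the example.

```kql
// Illustrative only: synthetic rows, not real telemetry
let BaselineRange = "203.0.113.0/24";  // corporate VPN range from the example
datatable(UserPrincipalName: string, IPAddress: string)
[
    "alice@contoso.com", "203.0.113.17",   // inside the user's known range
    "alice@contoso.com", "198.51.100.42"   // outside the user's known range
]
| extend InBaseline = ipv4_is_in_range(IPAddress, BaselineRange)
| where InBaseline == false   // keep only sign-ins that deviate from baseline
```

Only the 198.51.100.42 row remains, and it remains because of what this user's history looks like, not because of anything known about the address itself.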

Anti-Pattern

Hunting identity compromise without a hypothesis

The hunter opens Advanced Hunting and starts writing queries without a clear hypothesis. They find interesting data but cannot determine whether it represents a threat, a misconfiguration, or normal activity. Every hunt starts with a hypothesis: a specific, testable statement about attacker behavior. Without a hypothesis, you are exploring, not hunting. Exploration has value, but it produces findings you cannot action without additional scoping.

This is why the baseline comes first. Every subsequent subsection in this module references the baseline to determine whether a sign-in pattern is anomalous. Building the baseline once, accurately, with known-good exclusions, prevents every downstream query from producing false positives caused by legitimate but uncommon behavior.

Five dimensions of authentication behavior

Identity compromise manifests along five observable dimensions. A compromised account deviates in one or more of them. A baseline that captures all five provides the detection surface for every technique in this module.

IP address. Each user has a set of IPs they normally sign in from: corporate office ranges, VPN egress IPs, home ISP ranges, and mobile carrier ranges. The set is typically 3–8 distinct IPs (or /24 subnets) over 30 days. An attacker using a residential proxy, VPS, or Tor exit node introduces an IP outside this set.

Device. Each user has a set of devices: corporate laptop, mobile phone, occasionally a personal device. The DeviceDetail field in SigninLogs captures the device ID, operating system, and browser. An attacker using a different device, or a headless browser that reports no device details, deviates from the baseline.

Location. Each user signs in from a consistent set of geographic locations: the city where the office is, possibly one or two other cities for travel. The LocationDetails field provides city, state, and country. An attacker signing in from a country that does not appear in the user's history deviates. Note that VPN users may show the VPN server's location rather than their physical location, so the baseline must account for this.

Application. Each user accesses a consistent set of applications: Outlook, Teams, SharePoint, possibly a few line-of-business apps. An attacker who accesses applications the user has never used (PowerShell, Graph Explorer, Azure Portal) deviates. The AppDisplayName field reveals this.

Authentication pattern. Each user has a consistent ratio of interactive to non-interactive sign-ins, a consistent MFA method, and a consistent authentication frequency. An attacker who triggers many non-interactive token refreshes (from session token replay) changes this ratio. An attacker who adds a new MFA method (registering a phone number for SMS OTP) deviates from the established method set.
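The queries later in this section cover the first four dimensions. For the fifth, a rough sketch of the interactive ratio and MFA method set might look like the following. It uses synthetic rows so it runs anywhere; the user names are placeholders, and MfaMethod is a hypothetical flattened column (in real sign-in logs the method has to be extracted from the authentication detail fields first).

```kql
// Illustrative only: synthetic sign-in rows stand in for real telemetry.
// MfaMethod is a hypothetical, already-flattened column.
datatable(UserPrincipalName: string, IsInteractive: bool, MfaMethod: string)
[
    "alice@contoso.com", true,  "Authenticator app",
    "alice@contoso.com", false, "",
    "alice@contoso.com", false, "",
    "alice@contoso.com", true,  "Authenticator app",
    "bob@contoso.com",   true,  "Text message",
    "bob@contoso.com",   false, ""
]
| summarize
    Interactive = countif(IsInteractive),
    NonInteractive = countif(not(IsInteractive)),
    MfaMethods = make_set_if(MfaMethod, isnotempty(MfaMethod))
    by UserPrincipalName
// Share of sign-ins that were interactive, as a percentage
| extend InteractivePct = round(
    100.0 * Interactive / (Interactive + NonInteractive), 1)
```

A later sign-in whose MFA method is missing from MfaMethods, or a sharp drop in InteractivePct, would be the deviation this dimension is meant to surface.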

Building the IP baseline

// Per-user IP baseline — 30 days of normal sign-in IPs
// This is the foundation query for AiTM and impossible travel hunts
let baselineWindow = 30d;
let BaselineIPs = SigninLogs
| where TimeGenerated > ago(baselineWindow)
| where ResultType == 0  // Successful sign-ins only
| where IsInteractive == true  // Interactive only for baseline
// Non-interactive sign-ins are excluded from the baseline because a
//   token refresh can report the IP of the original token issuance,
//   which may be stale if the user has moved. Baseline on where the
//   user actually authenticates, not where their tokens refresh.
| summarize
    IPSet = make_set(IPAddress, 50),
    IPCount = dcount(IPAddress),
    SubnetSet = make_set_if(
        format_ipv4_mask(IPAddress, 24),   // e.g. 203.0.113.0/24
        isnotnull(parse_ipv4(IPAddress)),  // skip IPv6, which has no dotted /24
        30),
    SignInCount = count(),
    FirstSeen = min(TimeGenerated),
    LastSeen = max(TimeGenerated)
    by UserPrincipalName
// Each row = one user's 30-day IP profile
// IPSet = the set of IPs this user signed in from
// SubnetSet = the /24 subnets (more stable than exact IPs for
//   ISPs that rotate addresses within the same /24)
// IPCount = how many distinct IPs — users with 15+ IPs may use
//   mobile hotspots or rotating ISPs (normal but noisy baseline)
// SignInCount = authentication volume — users with < 5 sign-ins
//   in 30 days have an unreliable baseline (too few observations)
;
BaselineIPs
| where SignInCount >= 5  // Minimum 5 sign-ins for reliable baseline
| project UserPrincipalName, IPSet, SubnetSet, IPCount, SignInCount

Building the device baseline

// Per-user device baseline — devices used in last 30 days
let DeviceBaseline = SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == 0
| where IsInteractive == true
| extend DeviceId = tostring(DeviceDetail.deviceId)
| extend DeviceOS = tostring(DeviceDetail.operatingSystem)
| extend Browser = tostring(DeviceDetail.browser)
| extend DeviceFingerprint = strcat(
    DeviceOS, "|", Browser, "|",
    iff(isnotempty(DeviceId), DeviceId, "no-device-id"))
// DeviceFingerprint combines OS + browser + device ID into a
//   single comparable string. Users typically have 2-4 fingerprints:
//   corporate laptop (Windows|Edge|device-id-xxx),
//   mobile (iOS|Safari|device-id-yyy),
//   personal (Windows|Chrome|no-device-id)
| summarize
    Devices = make_set(DeviceFingerprint, 20),
    DeviceCount = dcount(DeviceFingerprint),
    HasManagedDevice = countif(isnotempty(DeviceId)) > 0
    by UserPrincipalName;
DeviceBaseline

Building the location baseline

// Per-user location baseline — geographic sign-in profile
let LocationBaseline = SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == 0
| where IsInteractive == true
| extend City = tostring(LocationDetails.city)
| extend State = tostring(LocationDetails.state)
| extend Country = tostring(LocationDetails.countryOrRegion)
| extend GeoLocation = strcat(Country, "|", State, "|", City)
| summarize
    Locations = make_set(GeoLocation, 20),
    Countries = make_set(Country, 10),
    CountryCount = dcount(Country),
    CityCount = dcount(City)
    by UserPrincipalName;
LocationBaseline
// CountryCount > 3 for a non-traveling user is unusual
// But VPN users may show the VPN server country, not their own
// Exclude known VPN egress IPs from the location baseline by
//   joining with a VPN IP list (if available from your network team)
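The VPN exclusion described in the comments above could be sketched as follows. This is one possible approach under stated assumptions: VpnRanges is a placeholder list (here reusing the 203.0.113.0/24 example range), and IsVpn is a name introduced for the example. Substitute the ranges your network team provides.

```kql
// Assumption: VpnRanges is a placeholder; replace with your real egress ranges
let VpnRanges = dynamic(["203.0.113.0/24"]);
SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == "0"   // success (ResultType is a string column)
// ipv4_is_in_any_range returns null for non-IPv4 input, so coalesce to
//   false to avoid silently dropping IPv6 sign-ins from the baseline
| extend IsVpn = coalesce(ipv4_is_in_any_range(IPAddress, VpnRanges), false)
| where not(IsVpn)
| extend Country = tostring(LocationDetails.countryOrRegion)
| summarize
    Countries = make_set(Country, 10),
    CountryCount = dcount(Country)
    by UserPrincipalName
```

Filtering on the IP before summarizing keeps the VPN server's country out of each user's location set, so the remaining countries reflect where users physically sign in.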

Building the application baseline

// Per-user application baseline — which apps each user accesses
let AppBaseline = SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == 0
| summarize
    Apps = make_set(AppDisplayName, 30),
    AppCount = dcount(AppDisplayName),
    AdminApps = make_set_if(AppDisplayName,
        AppDisplayName in (
            "Azure Portal", "Microsoft Graph Explorer",
            "Microsoft Graph Command Line Tools",
            "Windows Azure Active Directory",
            "Microsoft Azure PowerShell",
            "Azure Active Directory PowerShell"))
    by UserPrincipalName;
AppBaseline
// AdminApps = applications used for administrative access
// Non-admin users with entries in AdminApps are anomalous —
//   why is a standard user signing into Azure PowerShell?
// This is not necessarily malicious — IT training, curiosity —
//   but it warrants investigation in the context of an identity hunt

Combining into a unified profile

// Unified per-user authentication profile
// Reference this in every subsequent hunt subsection
let baselineWindow = 30d;
let UserProfile = SigninLogs
| where TimeGenerated > ago(baselineWindow)
| where ResultType == 0
| where IsInteractive == true
| extend SubnetKey = format_ipv4_mask(IPAddress, 24)
// format_ipv4_mask returns an empty string for non-IPv4 input,
//   so IPv6 sign-ins yield an empty key rather than a malformed one
| extend Country = tostring(LocationDetails.countryOrRegion)
| extend DeviceOS = tostring(DeviceDetail.operatingSystem)
| extend Browser = tostring(DeviceDetail.browser)
| summarize
    BaselineIPs = make_set(IPAddress, 50),
    BaselineSubnets = make_set(SubnetKey, 30),
    BaselineCountries = make_set(Country, 10),
    BaselineApps = make_set(AppDisplayName, 30),
    BaselineDevices = make_set(
        strcat(DeviceOS, "|", Browser), 20),
    TotalSignIns = count(),
    DistinctIPs = dcount(IPAddress),
    DistinctCountries = dcount(Country)
    by UserPrincipalName;
UserProfile
| where TotalSignIns >= 5
// This is your per-user baseline. Save it as a function named
//   UserAuthBaseline (Save > Save as function in the query editor)
// Every hunt in TH4.2–TH4.11 joins against this baseline
//   to determine whether a sign-in deviates from normal

Known-good exclusions

Before using the baseline, add exclusions for infrastructure that produces legitimate but unusual sign-in patterns:

VPN IP ranges. If your corporate VPN uses 2–3 egress IPs that change monthly, all users will show those IPs regardless of their location. Add VPN ranges to a known-good list so they do not trigger false positives in the impossible travel hunt (TH4.7).

Service accounts. Accounts used by automated processes (Azure AD Connect sync, service desk integrations, SCIM provisioning) sign in from fixed infrastructure IPs with no device context. Their authentication pattern is constant, not behavioral. Exclude them from behavioral baselines or build a separate service account baseline with tighter thresholds (any IP deviation is suspicious for a service account).

Break-glass accounts. Emergency access accounts should have zero sign-ins under normal conditions. Any sign-in from a break-glass account is automatically suspicious, so do not baseline them; monitor them with a zero-tolerance rule.

Shared/generic accounts. Accounts used by multiple people (reception@, info@, shared-mailbox@) cannot have meaningful per-user baselines because the "user" is multiple humans. Flag these for separate monitoring with volume-based thresholds rather than behavioral baselines.
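A zero-tolerance rule for break-glass accounts can be very small. The sketch below assumes a hand-maintained list; BreakGlassAccounts and the account name in it are placeholders, not values from your tenant.

```kql
// Assumption: BreakGlassAccounts is a placeholder list you maintain
let BreakGlassAccounts = dynamic(["breakglass1@contoso.com"]);
SigninLogs
| where TimeGenerated > ago(1d)
// No ResultType filter: failed attempts against these accounts matter too
| where UserPrincipalName in~ (BreakGlassAccounts)
| project TimeGenerated, UserPrincipalName, IPAddress,
          AppDisplayName, ResultType
```

Because the expected result is an empty table, any row returned is worth investigating without reference to a baseline.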

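// Identify service accounts and shared accounts for exclusion
// Service accounts: high non-interactive volume, few or no interactive sign-ins
// Log Analytics typically stores non-interactive user sign-ins in
//   AADNonInteractiveUserSignInLogs, so count across both tables.
//   isfuzzy=true lets the query run if that table is not yet ingested.
union isfuzzy=true
    (SigninLogs
        | where TimeGenerated > ago(30d)
        | project UserPrincipalName, IsInteractive),
    (AADNonInteractiveUserSignInLogs
        | where TimeGenerated > ago(30d)
        | project UserPrincipalName, IsInteractive = false)
| summarize
    Interactive = countif(IsInteractive == true),
    NonInteractive = countif(IsInteractive == false),
    Total = count()
    by UserPrincipalName
| extend InteractiveRatio = round(
    100.0 * Interactive / Total, 1)
| where InteractiveRatio < 5 and Total > 100
// Users with < 5% interactive sign-ins and high volume
//   are likely service accounts or automated processes
// Verify manually and add to exclusion list
| sort by Total desc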
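Once the candidates are verified, the exclusion list can be applied before the baseline is built. The following is a sketch under assumptions: ExcludedAccounts and the account names in it are placeholders, and in practice the list might live in a Sentinel watchlist rather than inline.

```kql
// Assumption: ExcludedAccounts is a placeholder for your verified list
let ExcludedAccounts = dynamic(["svc-sync@contoso.com",
                                "reception@contoso.com"]);
SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType == "0"   // success (ResultType is a string column)
| where UserPrincipalName !in~ (ExcludedAccounts)
| summarize SignInCount = count() by UserPrincipalName
| where SignInCount >= 5    // same minimum-observation rule as the baseline
| count                     // size of the huntable population
```

The final count is the number of users who have a usable baseline after exclusions, which is the figure recorded in the checkpoint at the end of this subsection.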

Figure TH4.1. Five-dimension authentication baseline. Each hunt compares current sign-in events against this per-user profile. Deviation in multiple dimensions simultaneously is the strongest compromise signal.

Run the unified profile query against your workspace. How many users have baselines with 5+ sign-ins? That is your huntable population.

Run the service account identification query. How many accounts have < 5% interactive sign-ins? Verify these are genuine service accounts and add them to your exclusion list.

Examine 3 random users' profiles. Do the IP counts, device fingerprints, and application lists look reasonable? This manual validation catches data quality issues before they produce false positives in the hunt queries.

Save the unified profile query as a Sentinel saved function named UserAuthBaseline (in the Logs query editor, Save > Save as function). You will reference it in every subsequent subsection.

Compliance Context

Identity Protection uses Microsoft's global model trained on billions of sign-ins. It is excellent at detecting common anomalies (unfamiliar IP from a foreign country, anonymous proxy) but it is not tuned to your organization's specific patterns. Your corporate VPN range, your approved remote access IPs, your organization's travel patterns, and your service account infrastructure are opaque to the global model. A custom baseline built from your 30-day history produces organization-specific anomaly detection that catches deviations the global model classifies as normal, and avoids false positives that the global model triggers on your legitimate infrastructure. The two baselines complement each other: Identity Protection catches globally suspicious patterns, while your custom baseline catches locally suspicious ones.

Extend this baseline

For organizations with Defender for Identity deployed, extend the baseline to include on-premises authentication patterns from IdentityLogonEvents. The hybrid baseline captures both cloud sign-ins (SigninLogs) and on-premises domain controller authentications (IdentityLogonEvents), providing visibility into attackers who compromise cloud credentials and pivot to on-premises resources (or vice versa). The TH10 (Lateral Movement) campaign uses this hybrid baseline extensively.

Checkpoint

Baseline window: 30 days

Baseline source: SigninLogs (interactive only, successful only)

Minimum observations per user: 5 sign-ins

Users with valid baselines: ___

Users excluded (service accounts): ___

Users excluded (shared accounts): ___

VPN IP ranges added to known-good: ___

Break-glass accounts flagged for zero-tolerance: ___

Saved function name: UserAuthBaseline

References Used in This Subsection

Microsoft. "Sign-in logs in Azure Active Directory." Microsoft Learn. https://learn.microsoft.com/en-us/entra/identity/monitoring-health/concept-sign-ins
Course cross-references: TH1.10 (behavioral baselining methodology), TH2.2 (percentile-based deviation), TH2.3 (make-series for time-series baselines)
