In this section

Data Types in Depth

3-4 hours · Module 1 · Free

What you already know

You know that every column in a KQL table has a fixed data type, and you have seen getschema output showing string, datetime, and dynamic columns. Now you need to understand every data type you encounter in security log tables, why certain columns are stored as types you would not expect, and how type mismatches cause queries to fail silently, returning zero rows instead of an error.

Scenario

Tom writes a detection rule to alert on sign-in failures with error code 50126 (invalid username or password). His query: SigninLogs | where TimeGenerated > ago(1h) | where ResultType == 50126 | summarize FailCount = count() by UserPrincipalName. The rule deploys without errors. It runs every five minutes. It never fires. The SOC assumes the attack category does not occur in the environment. Three weeks later, during an incident review, Priya discovers that ResultType is a string column. Tom's query compared a string to an integer. The comparison silently evaluated to false for every row. The detection was blind from the day it deployed.

The types that security data uses

Nine data types appear in Sentinel and Defender XDR tables. Three of them (string, datetime, and dynamic) account for the vast majority of columns you will interact with during investigations. The remaining six appear in specific contexts.

Figure 1.3: Nine KQL data types grouped by frequency of use in security queries. The silent failure zone at the bottom shows the cross-type comparisons that produce wrong results without errors.

String is the most common type and the most deceptive. UserPrincipalName, IPAddress, AppDisplayName, and ResultType are all strings. The deception comes from fields like ResultType that store numeric codes as text. The value in the column is the string "50126", not the integer 50126. KQL does not implicitly convert between string and integer during comparison. where ResultType == 50126 evaluates the comparison as string-to-integer, which always returns false. No error. No warning. Zero rows.

String comparison in KQL is case-sensitive by default. where AppDisplayName == "Azure Portal" matches exactly that casing. where AppDisplayName == "azure portal" matches nothing. For case-insensitive comparison, use =~ instead of ==. For case-insensitive inequality, use !~. In investigation queries where you are searching for a known entity, =~ is the safer default because data sources normalize casing inconsistently.

Datetime stores a point in time in UTC. Every security event has a TimeGenerated column of type datetime, representing when the event was ingested into the workspace. KQL provides rich datetime operations: ago() for relative time calculation, between() for range filtering, datetime_diff() for computing intervals between events, bin() for grouping events into time buckets, and format_datetime() for display formatting. All datetime values in KQL are stored in UTC. If your analysts work in a different time zone, the conversion happens at the display layer, not in the query.

KQL

// All equivalent ways to filter the last 24 hours
| where TimeGenerated > ago(24h)
| where TimeGenerated > ago(1d)
| where TimeGenerated > datetime(2026-05-20T00:00:00Z)
| where TimeGenerated between (ago(24h) .. now())

Datetime arithmetic is native. Subtracting two datetimes produces a timespan. Adding a timespan to a datetime produces a new datetime. This is how you calculate session durations, time between events, and investigation windows. For investigations, datetime_diff() is particularly valuable: datetime_diff("minute", Event2Time, Event1Time) returns the number of minutes between two events. This powers time-delta analysis that detects automated credential attacks (where login attempts arrive at unnaturally consistent intervals) versus human typing (where intervals are irregular).

The bin() function groups datetime values into fixed-width time buckets. summarize count() by bin(TimeGenerated, 1h) produces hourly event counts. This is the foundation of time-series analysis in Module 7, but even at this stage, binning by hour or day helps you visualize attack timelines and activity patterns during investigations.

Dynamic stores JSON objects and arrays. This is where the richest investigation data lives. LocationDetails in SigninLogs contains a nested object with geographic information:

JSON

{
  "city": "London",
  "state": "England",
  "countryOrRegion": "GB",
  "geoCoordinates": {
    "latitude": 51.5074,
    "longitude": -0.1278
  }
}

DeviceDetail contains operatingSystem, browser, trustType, and isCompliant. AuthenticationDetails is an array where each element describes one step of the authentication flow (primary credential, MFA challenge, conditional access evaluation). ConditionalAccessPolicies is an array of every CA policy that was evaluated during the sign-in, with the policy name and result.

Accessing nested fields in dynamic columns uses dot notation for objects (LocationDetails.countryOrRegion) and bracket notation for arrays (AuthenticationDetails[0]). The result of a dynamic field access is also dynamic, which means you must cast it to the target type before using it in comparisons or aggregations:

KQL

SigninLogs
| where TimeGenerated > ago(1h)
| extend City = tostring(LocationDetails.city)
| extend Lat = toreal(LocationDetails.geoCoordinates.latitude)
| extend AuthMethod = tostring(AuthenticationDetails[0].authenticationMethod)
| extend OS = tostring(DeviceDetail.operatingSystem)

Without the tostring() or toreal() cast, the extracted value remains dynamic, and string comparisons like where City == "London" may not behave as expected. The cast is not optional.

The remaining types

Bool stores true or false. KQL allows a shorthand: where IsInteractive is equivalent to where IsInteractive == true. For false, use where not(IsInteractive) or where IsInteractive == false. Bool columns in security tables include IsInteractive in SigninLogs (which distinguishes human logins from token refreshes and background SSO), IsCompliant and IsManaged in DeviceDetail sub-fields, and IsBillable in Usage tables. The distinction between interactive and non-interactive sign-ins is critical for investigation: AiTM token replay attacks appear in AADNonInteractiveUserSignInLogs where IsInteractive is false, and investigating only interactive sign-ins misses them entirely.

Int and long store integers. int is 32-bit (up to approximately 2.1 billion). long is 64-bit (up to approximately 9.2 quintillion). In security log tables, most numeric columns are long. Event counts, byte totals, and process IDs use long. For counting operations in your own queries, the aggregation functions count() and dcount() return long values. Always use long for identifier comparison because real (floating-point) loses precision on very large integers.

Real (also called double) stores 64-bit floating-point numbers with approximately 15 digits of precision. Use it for percentages, ratios, averages, and any calculation where fractional values matter. The round() function controls decimal places: round(100.0 * FailCount / TotalCount, 1) produces one decimal place. Latitude and longitude values from geo-IP lookup are stored as real inside the LocationDetails.geoCoordinates dynamic sub-object. The decimal type provides 128-bit precision but is rarely needed in security analysis.

Timespan represents a duration. It is the result of subtracting two datetimes or specifying a literal like 1d, 12h, 30m, or 1d12h30m. Dividing a timespan by a unit timespan converts it to a number: Duration / 1h produces the number of hours as a real. This conversion is essential for threshold calculations: "alert if the time between first failed sign-in and successful sign-in is less than 5 minutes" requires computing the timespan between two events and comparing it to a threshold.

Guid stores 128-bit unique identifiers displayed as hyphenated hex strings. CorrelationId links related sign-in events that share the same authentication flow. TenantId identifies the Entra ID tenant. Compare guids with == (exact match) or in (list membership). Do not use has or contains on guid columns because those operators are designed for tokenized string search and will not match partial GUIDs reliably.

When dynamic fields are stored as strings

Some columns that appear to contain JSON are actually stored as strings. Before you can use dot notation for field access, you must parse the string into a dynamic object. The signal is when dot notation returns empty results on a column that clearly contains JSON when you view it in the results pane. The fix is parse_json():

KQL

// If AdditionalDetails is a string containing JSON:
| extend Parsed = parse_json(AdditionalDetails)
| extend Value = tostring(Parsed.someField)

Section 1.4 covers parse_json() and all other conversion functions in depth. For now, remember: if dot notation returns empty on a column that visually contains JSON, the column is a string, not dynamic. Parse it first.

Inspecting types at runtime

When a query produces unexpected results, verify your assumptions about column types before anything else. Two tools handle this:

KQL

// Schema-level check: what type is the column defined as?
SigninLogs | getschema | where ColumnName == "ResultType"
// Runtime check: what type does the actual value have?
SigninLogs | take 1 | project ResultType, TypeCheck = gettype(ResultType)

getschema returns the declared type from the table schema. gettype() returns the runtime type of an actual value. Both should agree, but in cases where a column is declared dynamic and contains a simple string, gettype() might return "string" while getschema shows "dynamic". The runtime type is what governs comparison behavior.

Dynamic arrays and the mv-expand operator

Dynamic arrays require special handling because a single cell can contain multiple values. ConditionalAccessPolicies in SigninLogs is an array of objects, where each object represents one CA policy evaluation. A single sign-in event might evaluate 15 different policies. To analyze individual policies, you need to unpack the array into separate rows.

KQL

SigninLogs
| where TimeGenerated > ago(1h)
| where ResultType == "0"
| mv-expand CAPolicy = ConditionalAccessPolicies
| extend PolicyName = tostring(CAPolicy.displayName)
| extend PolicyResult = tostring(CAPolicy.result)
| summarize count() by PolicyName, PolicyResult
| sort by count_ desc

mv-expand takes each element of the array and creates a separate row for it. If a sign-in event evaluated 15 CA policies, mv-expand produces 15 rows from that single event. The remaining columns in each row are duplicated from the original. After expanding, you can filter, aggregate, and analyze individual policy results. This is how you find policies that evaluate thousands of times with result: "notApplied" (possible misconfiguration) or policies that block access with result: "failure" (verify the blocks are intentional).

For object exploration, bag_keys() returns the list of keys in a dynamic object, and array_length() returns the number of elements in a dynamic array:

CLI Output

SigninLogs | take 1 | project bag_keys(LocationDetails)
→ ["city","state","countryOrRegion","geoCoordinates"]
SigninLogs | take 1 | project array_length(ConditionalAccessPolicies)
→ 12

These exploration functions are the equivalent of getschema for dynamic columns. They tell you what keys and array lengths to expect before you write extraction queries.

Understanding dynamic is not optional for security investigation. The most important evidence in sign-in investigations lives inside dynamic columns: where the user authenticated from (LocationDetails), what device they used (DeviceDetail), how they authenticated (AuthenticationDetails), and which security policies evaluated their session (ConditionalAccessPolicies). If you cannot extract from dynamic columns, you cannot investigate beyond the surface-level fields that the portal UI displays. Every advanced investigation query in this course uses dynamic field extraction, starting with the tostring() and toreal() casts you saw above and progressing to parse_json(), mv-expand, and bag_unpack in later modules.

Analyst Decision

Type verification workflow:

String column with numeric values (ResultType, ErrorCode): Always quote the comparison value. where ResultType == "50126", never where ResultType == 50126.

Dynamic column with nested fields: Cast to the target type before comparison. tostring(LocationDetails.city) == "London", not LocationDetails.city == "London".

Unknown column type: Run TableName | getschema | where ColumnName == "FieldName" before writing any filter. Ten seconds of verification prevents hours of debugging silent mismatches.

The implicit type hierarchy

When two different numeric types appear in an arithmetic expression, KQL automatically promotes the narrower type to the wider type: int → long → real → decimal. This means int + long produces long, and long + real produces real. The promotion is lossless going upward, but be aware that converting a very large long value to real can lose the last few digits of precision because real has approximately 15 significant digits. For security metric calculations (percentages, averages, rates) this precision loss is never meaningful. For exact value comparisons on large identifiers, keep the value as long or string.

This hierarchy matters when you build detection rules with thresholds. A rule that computes FailCount / TotalCount where both values are long performs integer division (truncating the decimal). The result is 0 for any ratio less than 1.0, which means your threshold check where FailRate > 0.5 never fires. The fix is to introduce a real literal: 100.0 * FailCount / TotalCount promotes the entire expression to real, preserving the decimal portion. This is one of the most common arithmetic bugs in detection engineering.

The unquoted numeric string

The analyst writes where ResultType == 0 in a detection rule. The rule deploys, runs on schedule, and returns zero rows every execution cycle. The SOC dashboard shows the rule as "healthy" because it does not error. Three weeks pass before someone notices the rule has never generated an alert. The fix takes 30 seconds: change 0 to "0". The wasted three weeks cannot be recovered. Every detection rule that filters on a string column with a numeric-looking value must use quoted comparison. No exceptions.

KQL Principle

KQL does not warn you about type mismatches. It runs the query and silently returns wrong results. The only defense is checking the column type before writing the comparison. Run getschema on every unfamiliar table. Run gettype() on any value that behaves unexpectedly. Make type verification a habit, not a debugging step.

Section 1.4 teaches the conversion functions that transform data between types. You will learn when KQL performs implicit conversion, when it fails silently, and how to write explicit conversions that make queries type-safe across every table.

Unlock the Full Course See Full Course Agenda

Get weekly detection and investigation techniques

KQL queries, detection rules, and investigation methods — the same depth as this course, delivered every Tuesday.

No spam. Unsubscribe anytime. ~2,000 security practitioners.

← Previous Next →