
Detection-as-Code: Sigma as the Vendor-Neutral Source Format

Module 0

Scenario

Your team runs Sentinel. A sister company runs Splunk. Both need to detect Kerberoasting. You write the detection once in KQL. When the sister company asks for the rule, you rewrite it from scratch in SPL because there is no way to translate one to the other without rebuilding the logic, the field mapping, and the scoping by hand. Two rules, two maintenance burdens, no guarantee they detect the same thing.

Sigma solves that problem. You write the detection once in a vendor-neutral YAML format, and sigma-cli converts it to KQL, SPL, Elastic query, or any other supported backend through a conversion pipeline. One source, multiple targets. The maintenance burden is centralised: when the detection logic changes, you update one Sigma rule, and the conversion produces the updated query for every backend.

This is not just a multi-SIEM convenience. Even a team that runs a single SIEM benefits from Sigma because the format is portable. When the team migrates from Splunk to Sentinel, or adds an Elastic deployment alongside its existing SIEM, the detection library moves largely through a pipeline configuration change rather than a rule-by-rule rewrite. The rules become an asset that survives a platform change, which is rare in detection engineering.

What Sigma is

Sigma is a structured, YAML-based format for describing detection rules independently of any SIEM. It defines what to detect (the condition), where to detect it (the logsource), and what it means (the metadata). The conversion to a specific backend is handled by a separate tool, not by the rule author. The relationship is analogous to a programming language and its compiler: the author writes the source, the compiler produces the target, and the author does not need to know the target's instruction set.

A Sigma rule has three structural parts. The logsource tells the converter which log category and product the rule applies to. The detection defines the conditions that must match. The metadata carries the title, description, ATT&CK mapping, severity, false-positive notes, and author. Here is what a minimal rule looks like:

detections/execution/encoded-powershell.yml Sigma
title: Suspicious Encoded PowerShell Command
id: f4b89820-2a14-4f93-b9e7-12a4c8d9e1f0
status: test
description: |
    Detects PowerShell execution with a base64-encoded command,
    a common technique for obfuscating malicious payloads.
logsource:
    category: process_creation
    product: windows
detection:
    selection:
        Image|endswith: '\powershell.exe'
        CommandLine|contains:
            - '-enc '
            - '-EncodedCommand '
            - '-ec '
    condition: selection
level: medium
tags:
    - attack.execution
    - attack.t1059.001
falsepositives:
    - Legitimate administrative scripts using encoded commands
    - Configuration management tools

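The rule reads top to bottom: it targets Windows process-creation events where the image is PowerShell and the command line contains an encoded-command flag. Keys within one selection are combined with AND, and the values in a list are combined with OR, so any one of the three flags is enough once the image matches. The |endswith and |contains are Sigma modifiers that express matching logic without committing to a query language. The tags map the rule to the ATT&CK Execution tactic (attack.execution) and to technique T1059.001, Command and Scripting Interpreter: PowerShell. The false-positive notes document the known legitimate cases.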

Every Sigma rule decomposes into three functional blocks. Recognising them turns the YAML from a wall of keys into a structure you can read at a glance:

The three functional blocks of a Sigma rule:

METADATA: title, id (UUID), status, description, level, tags, falsepositives. Manages the rule's lifecycle and triage. It does not affect detection logic; the tags drive coverage.

LOGSOURCE: category: process_creation, product: windows. Names the kind of event, not a table. The pipeline maps it to DeviceProcessEvents, Sysmon, and so on.

DETECTION: selection: { Image|endswith, CommandLine|contains }, condition: selection. The matching logic: named selections combined by a condition expression. This is what fires.

Metadata manages the rule. Logsource says what kind of event to look at. Detection says what to match. Every rule you write in this course has these three blocks, and the CI lint step checks that each is present and well-formed.

Walk through the key fields. The id is a UUID intended to identify the rule uniquely across all Sigma repositories; it is how the community references and deduplicates rules. The status indicates the rule's maturity. experimental means the rule is new and has seen little testing outside a lab. test means it is close to stable but may still need tuning. stable means it is considered production-ready. The specification also defines deprecated and unsupported for rules that should no longer run. The level (informational, low, medium, high, critical) tells the SOC how urgently to triage the alert. None of these fields affects the detection logic, but they are how the pipeline manages the rule's lifecycle. For example, a CI step can enforce that only stable rules are deployed to production.

The logsource block deserves particular attention. It uses category: process_creation and product: windows but does not name a table. It does not say DeviceProcessEvents or sysmon or Endpoint.Processes. The table is determined by the pySigma backend pipeline during conversion. This is the abstraction that makes Sigma vendor-neutral: the rule describes what kind of event it targets, and the pipeline decides where that event lives in the target SIEM. If your organization switches from Sysmon to Microsoft Defender for Endpoint as the telemetry source, you update the pipeline configuration, not every rule in the repository.

The detection block uses a named selection (selection) and a condition that references it (condition: selection). Rules with multiple selections can combine them with Boolean logic: condition: selection1 and not exclusion1 or condition: 1 of selection*. The condition syntax is compact but expressive enough to handle the majority of detection patterns. Module 1 covers the full condition language, including all of, 1 of, and nested selections.
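The sketch below combines several named selections. It assumes the same process_creation logsource as the encoded-PowerShell rule. The Office-parent scenario and the selection names are illustrative choices for this example, not a SigmaHQ rule.

```yaml
# Detection block only; metadata and logsource as in the encoded-PowerShell rule.
detection:
    selection_parent:
        ParentImage|endswith:
            - '\winword.exe'
            - '\excel.exe'
    selection_child:
        Image|endswith:
            - '\powershell.exe'
            - '\pwsh.exe'
    flag_encoded:
        CommandLine|contains:
            - '-enc '
            - '-EncodedCommand '
    flag_download:
        CommandLine|contains: 'DownloadString'
    # all of selection_*: both selection_parent and selection_child must match.
    # 1 of flag_*: at least one of flag_encoded or flag_download must match.
    condition: all of selection_* and 1 of flag_*
```

Grouping selections under shared name prefixes keeps the wildcard forms of the condition readable as a rule grows.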

One rule, multiple backends: the Sigma rule (YAML) is the vendor-neutral source. pySigma converts it to Sentinel KQL (microsoft365defender backend), Splunk SPL (splunk backend), and Elastic EQL (elasticsearch backend).

The Sigma rule is the source. sigma-cli and pySigma handle the conversion, with backend-specific pipelines that know how each SIEM names its fields and structures its queries.

Why Sigma, specifically

Sigma is not the only vendor-neutral detection format, but it is the one with the broadest ecosystem. The SigmaHQ repository holds thousands of community-contributed rules covering ATT&CK techniques across Windows, Linux, macOS, AWS, Azure, GCP, and network telemetry. The pySigma conversion framework supports backends for most major SIEMs. The specification is open, maintained by the Sigma project, and versioned so you know which features a given converter supports. When you write a Sigma rule, you write into an ecosystem, not into a proprietary format.

The SigmaHQ repository is worth understanding as a resource. It is organized by platform and logsource category (for example, rules/windows/process_creation). Each rule records its ATT&CK mapping in its tags and follows the same structural conventions the course teaches. When a new vulnerability is disclosed or a new technique is documented, community rules often appear within days. That speed is a practical advantage. Instead of starting from scratch, you can pull a community rule, evaluate it against your environment's field mappings, tune the false-positive conditions for your data, and commit it to your pipeline. The rule is a starting point, not a finished product. You still review the condition logic, verify the logsource maps to your telemetry, and add test fixtures. But the community provides the initial detection logic, and your pipeline provides the quality controls.
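The sketch below shows that tuning step. The encoded-PowerShell rule from earlier stands in for an upstream community rule. The local copy records its origin with the specification's related field (type derived) and adds an environment-specific exclusion. The config-agent.exe parent is a hypothetical placeholder for a local management tool.

```yaml
title: Suspicious Encoded PowerShell Command (locally tuned)
# A derived rule needs its own id: generate a fresh UUID instead of reusing the upstream one.
related:
    - id: f4b89820-2a14-4f93-b9e7-12a4c8d9e1f0  # upstream rule this copy derives from
      type: derived
status: test
logsource:
    category: process_creation
    product: windows
detection:
    selection:
        Image|endswith: '\powershell.exe'
        CommandLine|contains:
            - '-enc '
            - '-EncodedCommand '
            - '-ec '
    filter_local_agent:
        # Hypothetical management agent that legitimately launches encoded PowerShell here
        ParentImage|endswith: '\config-agent.exe'
    condition: selection and not filter_local_agent
level: medium
tags:
    - attack.execution
    - attack.t1059.001
falsepositives:
    - Administrative scripts not launched by the local agent
```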

The tooling ecosystem extends past sigma-cli itself. Projects like sigmac (the legacy converter, now superseded by pySigma) and SigConverter (a web-based conversion tool) build on the same format. DeTT&CT, which scores data-source visibility and detection coverage and exports the result as ATT&CK Navigator layers, complements it. Atomic Red Team is particularly relevant for this course. Atomic provides test procedures for ATT&CK techniques, and each procedure produces telemetry that can be captured as a test fixture for the corresponding Sigma rule. The pairing of Sigma rules with Atomic test output is the foundation of Module 5's testing approach.

The conversion is not string substitution. A pySigma backend pipeline knows how to translate Sigma's abstract logsource (category: process_creation, product: windows) into the specific table and field names each SIEM uses. In Sentinel with Microsoft Defender for Endpoint data, process creation lives in DeviceProcessEvents with a field called ProcessCommandLine. In Splunk, it may live in a sysmon sourcetype with a field called CommandLine. The pipeline handles that mapping so the rule author does not have to. When a vendor renames a table or a column, the fix is a change to the pipeline configuration, not a change to every rule that uses the affected field.
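For illustration, a custom pySigma processing pipeline can carry a field mapping like the one just described. This is a minimal sketch. It assumes pySigma's YAML pipeline format, with a field_name_mapping transformation scoped by a logsource rule condition. The pipeline name, id, and file location are hypothetical. The built-in Defender backend pipeline already provides comparable mappings, and table selection is handled separately by each backend, so it is not shown here.

```yaml
# Hypothetical file: pipelines/windows-process-fields.yml
name: Example Windows process-creation field mapping
priority: 100
transformations:
    - id: process_creation_field_mapping
      type: field_name_mapping
      # Sigma taxonomy field -> Defender for Endpoint / Sentinel column
      mapping:
          Image: FolderPath
          CommandLine: ProcessCommandLine
          ParentImage: InitiatingProcessFolderPath
      # Apply only to Windows process-creation rules
      rule_conditions:
          - type: logsource
            category: process_creation
            product: windows
```

With the mapping held in a file like this, switching telemetry sources means editing the pipeline rather than every rule that uses Image or CommandLine.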

The modifier system is where much of Sigma's expressiveness lives. Modifiers like |endswith, |contains, |startswith, |base64offset, |re, |all, and |windash let you express matching logic that would otherwise require backend-specific functions. The |endswith modifier in the encoded-PowerShell example becomes KQL's endswith operator for Sentinel and a leading-wildcard match (*\powershell.exe) for Splunk. The rule author writes the intent; the converter produces the implementation.
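The sketch below shows several modifiers working together. The selection names and matched strings are illustrative assumptions. The wide modifier is not in the list above. It encodes the value as UTF-16LE before base64offset runs, because PowerShell's -EncodedCommand expects base64 of UTF-16LE text.

```yaml
detection:
    selection_img:
        Image|endswith: '\powershell.exe'
    selection_flag:
        # windash also matches the flag written with '/' or with Unicode dash variants
        CommandLine|contains|windash:
            - '-enc '
            - '-EncodedCommand '
    selection_payload:
        # Matches 'IEX ' after UTF-16LE + base64 encoding, at each of the three base64 offsets
        CommandLine|wide|base64offset|contains: 'IEX '
    selection_download:
        # all: every listed value must appear, rather than any one of them
        CommandLine|contains|all:
            - 'Net.WebClient'
            - 'DownloadString'
    condition: selection_img and (selection_flag or selection_payload or selection_download)
```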

The logsource abstraction matters just as much at deployment time. The rule names a kind of event rather than a table, and the pySigma pipeline picks the table from the target backend and the data-source configuration. As a result, the same rule can target Sysmon process-creation events in one environment and Microsoft Defender for Endpoint process events in another, without modification. The abstraction is what makes the "write once, deploy many" promise real.

Anti-Pattern

"Sigma is just YAML with fancy field names"

Sigma's value is not in the YAML syntax. It is in the logsource abstraction (the rule says "process creation on Windows" and the pipeline decides which table and fields that means), the modifier system (|endswith, |contains, |base64offset, |re let you express matching logic without backend-specific functions), and the ecosystem (community rules, maintained backends, tooling). Treating Sigma as "just YAML" leads to rules that hardcode backend field names and break the conversion model.
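The contrast sketch below trims the detection to two fields. The first YAML document hardcodes Microsoft Defender for Endpoint column names, so a Splunk or Elastic conversion would search for columns those platforms do not have. The second uses Sigma's taxonomy field names and leaves the mapping to the pipeline.

```yaml
# Anti-pattern: backend column names baked into the rule
detection:
    selection:
        FolderPath|endswith: '\powershell.exe'   # Defender for Endpoint column
        ProcessCommandLine|contains: '-enc '     # Defender for Endpoint column
    condition: selection
---
# Portable: Sigma taxonomy field names; each backend pipeline maps them
detection:
    selection:
        Image|endswith: '\powershell.exe'
        CommandLine|contains: '-enc '
    condition: selection
```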

Where Sigma stops

This course is honest about Sigma's limits because pretending the format covers every detection would make it a marketing course, not a practitioner course. Module 9 is dedicated to the expressiveness ceiling: the point where a detection legitimately cannot be expressed in Sigma and must stay as native KQL or SPL.

The limits are real. Sigma's condition language handles field matching well, but temporal correlation (two events within five minutes of each other), statistical baselines (a count deviating from a rolling average), and multi-event joins (combining data from two different logsources) push past what the specification supports or what the converters can translate reliably. A detection that requires a KQL join with a bin time window, or a Splunk stats with a streamstats baseline, may not survive the conversion with its logic intact. These are not theoretical edge cases; they represent a significant fraction of the detection patterns a mature team needs, and the course addresses them directly rather than pretending the limitation does not exist.

Consider a practical case. You want to detect a brute-force attack: more than 10 failed logon attempts from a single source within five minutes, followed by a successful logon. That detection requires counting events over a time window, grouping by source, and correlating the count with a separate event type. Sigma's correlation rules are evolving to handle some of these patterns, but converter support varies by backend, and the result may not match what a hand-tuned native query produces. A detection engineer who forces this into Sigma and gets a broken conversion has wasted time and shipped false confidence.
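The sketch below expresses that brute-force pattern as Sigma correlation rules. It assumes a backend whose converter supports event_count and temporal_ordered correlations; as noted above, support varies. The rule names are illustrative. The base rules use the Windows Security logon event IDs 4625 (failed) and 4624 (successful).

```yaml
title: Failed logon
name: failed_logon
status: test
logsource:
    product: windows
    service: security
detection:
    selection:
        EventID: 4625
    condition: selection
---
title: Successful logon
name: successful_logon
status: test
logsource:
    product: windows
    service: security
detection:
    selection:
        EventID: 4624
    condition: selection
---
# More than 10 failed logons from one source address within five minutes
title: Many failed logons from one source
name: many_failed_logons
status: test
correlation:
    type: event_count
    rules:
        - failed_logon
    group-by:
        - IpAddress
    timespan: 5m
    condition:
        gt: 10
---
# The failure burst followed by a successful logon from the same address
title: Brute force followed by successful logon
status: test
correlation:
    type: temporal_ordered
    rules:
        - many_failed_logons
        - successful_logon
    group-by:
        - IpAddress
    timespan: 5m
level: high
```

Whether this converts into something equivalent to a hand-written KQL or SPL query is exactly the check Module 9 asks you to make before committing it.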

The right response is not to abandon Sigma for these cases. It is to recognise the boundary and keep the discipline. Write the detection as native KQL or SPL, put it in the same repository, version it through the same pipeline, test it against fixtures with the same approach, and review it through the same PR workflow. The source format changes; the engineering practice does not. Sigma is the default unit, not the only one. Module 9 teaches you to recognise the boundary, make the decision, and manage native rules through the same pipeline without creating a parallel, undisciplined workflow.

What Module 1 covers in depth

This orientation gives you the shape of Sigma: what a rule looks like, why the format exists, and where it reaches its limits. Module 1 goes deep. You will learn the full logsource taxonomy, the modifier system, the field naming conventions, correlation rules, and how to write a Sigma rule that converts cleanly to multiple backends. You will see what a clean conversion looks like and what a problematic one looks like, and you will learn to tell the difference before the CI pipeline tells you. The rule you write in Module 1 is the first file you commit to your detection repository.

Section 0.4 shows how all the pieces connect: the end-to-end pipeline from a Sigma rule through testing, CI, deployment, and coverage measurement.