Catch C2 Beaconing by Its Cadence in Sentinel and Splunk

A beacon's connection gaps cluster around its sleep interval, so the coefficient of variation stays low even with jitter. Human and application traffic scatters across the window. Scoring the cadence finds the beacon with no indicator of compromise in hand.

Most C2 hunts start from a list. A feed of known-bad IPs, a set of newly registered domains, a pack of TLS fingerprints. The list works right up to the moment the operator rotates infrastructure, which for any competent intrusion is measured in days. Block the domain and the implant fails over to the next one. Match the IP and tomorrow it is a different IP. The indicators expire because they are built to.

What does not expire is the behavior underneath them. An implant that calls home does it on a schedule, because that is how command and control works: sleep, check in, take orders, sleep again. The destination changes, the rhythm stays. Score the rhythm and you find the beacon without ever knowing the indicator, and you find it the same way whether the traffic lands in Microsoft Defender or in Splunk.

This post shows how to score connection cadence, the thresholds that make it actionable, and the same hunt written for both platforms with a portable Sigma first pass in front of it.

What regularity looks like in the data

Take every connection from one host to one destination and line them up in time. Subtract the previous connection time from each one and you have a list of gaps. A human browsing, or an application syncing on demand, produces gaps that land all over the place: a few seconds here, twenty minutes there, an hour while the laptop is asleep. A beacon produces gaps that pile up around a single number, the sleep interval, give or take whatever jitter the operator configured.

You can eyeball that on one host. It does not scale to a fleet. The metric that does is the coefficient of variation: the standard deviation of the gaps divided by their mean. Expressing the spread as a fraction of the average is what makes it comparable across beacons that sleep for thirty seconds and beacons that sleep for six hours. A 60 second beacon with a 6 second spread and a 6 hour beacon with a 36 minute spread both score a coefficient of variation of 0.1. Standard deviation alone would rank them wildly apart and bury the slow one, which is the one you are least likely to have noticed by hand.
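The scale-independence claim is easy to check by hand. The following is a minimal sketch in Python, not part of either platform's query; the gap lists are made-up illustrative data and the function names are hypothetical.

```python
from statistics import mean, stdev

def gaps_from_times(times):
    """Differences between consecutive connection times, in seconds."""
    ordered = sorted(times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]

def coefficient_of_variation(gaps):
    """Sample standard deviation of the gaps divided by their mean."""
    return stdev(gaps) / mean(gaps)

# Illustrative data: the same relative spread at two very different sleeps.
fast_beacon = [54, 66] * 10                    # gaps around 60 seconds
slow_beacon = [g * 360 for g in fast_beacon]   # gaps around 6 hours
# Illustrative human-driven traffic: gaps scattered across the window.
human = [3, 1200, 45, 3600, 8, 600, 15, 2400]

for name, gaps in [("fast", fast_beacon), ("slow", slow_beacon), ("human", human)]:
    print(name, round(stdev(gaps), 1), round(coefficient_of_variation(gaps), 3))
```

The two beacons print standard deviations that differ by a factor of 360 and the same coefficient of variation, while the scattered list scores above 1. That is why the ratio, not the raw spread, is what you rank on.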

Low coefficient of variation over enough connections is the signal. Everything after that is tuning.

Scoring the cadence in Sentinel

DeviceNetworkEvents is a Microsoft Defender table, queryable from Defender advanced hunting and from Sentinel when the Defender XDR connector streams it into the workspace. It carries successful outbound connections with the device, the remote address, and the timestamp, which is everything the calculation needs. Order by host, destination, and time, walk the rows computing the gap from the previous connection in the same group, then collapse the gaps into a count, a mean, and a standard deviation per pair.

// Hypothesis: an implant on a fixed sleep interval produces connection gaps
// that cluster around one value, so the coefficient of variation (stdev / mean
// of the gaps) stays low across many connections. Bursty human and app traffic
// scatters and scores high. We rank by that regularity, not by any indicator.
DeviceNetworkEvents
| where Timestamp > ago(7d)
| where ActionType == "ConnectionSuccess"
| where RemoteIPType == "Public"               // drop internal east-west traffic
| project Timestamp, DeviceId, DeviceName, RemoteIP, RemoteUrl
| order by DeviceId asc, RemoteIP asc, Timestamp asc
| serialize
| extend prevTime = iff(DeviceId == prev(DeviceId) and RemoteIP == prev(RemoteIP),
                        prev(Timestamp), datetime(null))
| where isnotnull(prevTime)
| extend gapSeconds = datetime_diff('second', Timestamp, prevTime)
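| summarize connections = count(),              // counts gaps, one fewer than raw connections
            meanGap  = avg(gapSeconds),
            stdevGap = stdev(gapSeconds),
            firstSeen = min(Timestamp),
            lastSeen  = max(Timestamp),
            remoteUrls = make_set(RemoteUrl, 5)  // collect URLs; grouping on them would split the pair
        by DeviceId, DeviceName, RemoteIP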
| extend cv = round(stdevGap / meanGap, 3)
| where connections >= 20 and meanGap >= 30 and cv < 0.1   // sample floor, ignore chatty sessions, regular only
| sort by cv asc

serialize is what lets prev() reach back to the row above, and the iff guard stops a gap being measured across the boundary between one destination and the next. The three numbers at the bottom are the whole detection. connections is your sample size, meanGap is the implant's sleep interval in seconds, and cv is the regularity. The filter keeps pairs with at least 20 connections, a mean gap of 30 seconds or more so you are not flagging a single chatty TCP session, and a coefficient of variation under 0.1.
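For readers who want to check the logic outside either platform, the same steps can be sketched in Python. This is an illustrative reference, not the query itself: score_pairs, its parameters, and the sample events are hypothetical names and made-up data. It uses the sample standard deviation, which to my understanding is what stdev() returns in both KQL and SPL.

```python
from collections import defaultdict
from statistics import mean, stdev

def score_pairs(events, min_gaps=20, min_mean_gap=30, max_cv=0.1):
    """events: iterable of (host, dest, epoch_seconds) tuples.
    Returns pairs passing the three thresholds, most regular first."""
    times = defaultdict(list)
    for host, dest, ts in events:
        times[(host, dest)].append(ts)          # grouping replaces the prev() guard

    results = []
    for (host, dest), stamps in times.items():
        stamps.sort()
        gaps = [b - a for a, b in zip(stamps, stamps[1:])]
        if len(gaps) < max(min_gaps, 2):        # sample floor (stdev needs 2+ gaps)
            continue
        mean_gap = mean(gaps)
        if mean_gap < min_mean_gap:             # ignore chatty sessions
            continue
        cv = stdev(gaps) / mean_gap
        if cv < max_cv:                         # regular talkers only
            results.append({"host": host, "dest": dest, "gaps": len(gaps),
                            "mean_gap": mean_gap, "cv": round(cv, 3)})
    return sorted(results, key=lambda r: r["cv"])

# Made-up data: one host beaconing every ~60 s, and the same host browsing.
offsets = [-2, 1, 0, 2, -1]
beacon = [("ws01", "203.0.113.10", 60 * i + offsets[i % 5]) for i in range(30)]
noise, t = [], 0
for gap in [5, 900, 40, 3000, 12, 700, 20, 1500] * 3:
    t += gap
    noise.append(("ws01", "198.51.100.7", t))

hits = score_pairs(beacon + noise)
for hit in hits:
    print(hit)
```

Only the regular pair survives the filter. The scattered pair has enough connections and a long enough mean gap, and is dropped on the coefficient of variation alone.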

Twenty connections is the floor, not a comfortable sample. Below roughly 20 to 30 data points the coefficient of variation is too easily set by chance, so treat anything near the floor as a lead to confirm rather than a finding. Keep the window at seven days at minimum, because a beacon sleeping for an hour gives you only 24 connections a day, and a long-haul implant on an 8 hour sleep needs most of a week to clear the floor at all. A 12 hour sleep produces only 14 connections in seven days and never clears it, so widen the window to ten days or more when you hunt for beacons that slow.
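The relationship between sleep interval, window, and sample floor is plain arithmetic. The following is a small sketch in Python for checking it before you pick a window. The function names are hypothetical, and it assumes a steady beacon with no jitter and no missed check-ins.

```python
import math

SECONDS_PER_DAY = 86400

def gaps_in_window(sleep_seconds, window_days=7):
    """Upper bound on the gaps a steady beacon can produce inside the window."""
    return math.floor(window_days * SECONDS_PER_DAY / sleep_seconds)

def days_to_clear_floor(sleep_seconds, floor=20):
    """Days of data needed before a steady beacon yields `floor` gaps."""
    return floor * sleep_seconds / SECONDS_PER_DAY

for hours in (1, 6):
    sleep = hours * 3600
    print(hours, "h sleep:", gaps_in_window(sleep), "gaps in 7 days,",
          round(days_to_clear_floor(sleep), 2), "days to reach 20 gaps")
```

An hourly beacon reaches the floor in under a day, while a 6 hour beacon needs five full days. Run the numbers for the slowest sleep you care about before shortening the window to save query cost.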

The same hunt in Splunk

Splunk gets there with streamstats, which carries a running value across events in order. Point it at firewall or proxy data that uses the Network Traffic data model field names (src, dest, action), sort by source, destination, and time, and have it hand each event the timestamp of the previous one for the same pair. The rest is identical arithmetic.

index=firewall sourcetype=pan:traffic action=allowed
| where NOT (cidrmatch("10.0.0.0/8", dest) OR cidrmatch("172.16.0.0/12", dest) OR cidrmatch("192.168.0.0/16", dest))
| sort 0 src dest _time
| streamstats current=f window=1 last(_time) as prev_time by src dest
| eval gap = _time - prev_time
| where isnotnull(gap)
| stats count as connections avg(gap) as mean_gap stdev(gap) as stdev_gap by src dest
| eval cv = round(stdev_gap / mean_gap, 3)
| where connections >= 20 AND mean_gap >= 30 AND cv < 0.1
| sort cv

current=f tells streamstats to exclude the current event so last() returns the prior connection rather than this one, and window=1 limits it to the single preceding event. _time is epoch seconds, so the subtraction gives the gap directly. The cidrmatch clauses drop internal traffic before the grouping, which sharpens the result and saves the cluster a pile of work. The thresholds carry over unchanged, because the coefficient of variation is unitless: 0.1 means the same thing in Splunk as it does in Defender, and a beacon scored in both should land on the same value within rounding. If you run Enterprise Security, the same logic belongs in a search-driven correlation that writes a notable, with the mean gap and the coefficient of variation carried into the event so the analyst reads the cadence without rerunning the math.

A portable first pass with Sigma

Sigma is the natural place to ask for this rule in a backend-agnostic form, with one honest limit: classic Sigma matches fields on single events and cannot compute a variance across a series. What it can do, through correlation rules, is count events in a window, which gives you the coarse precondition a beacon has to satisfy before any cadence test is worth running.

# Sigma cannot compute the variance itself, so this correlation rule is the
# portable coarse pass: many connections from one host to one external
# destination inside a window. It nominates candidates; the coefficient-of-
# variation query above confirms the cadence. Tune timespan and count to your
# shortest expected sleep interval.
title: Repeated Outbound Connections to a Single Destination
name: outbound_conn_single_dest
status: experimental
logsource:
    category: firewall
detection:
    selection:
        action: allowed
    filter_internal:
        dest|cidr:
            - '10.0.0.0/8'
            - '172.16.0.0/12'
            - '192.168.0.0/16'
    condition: selection and not filter_internal
---
title: Possible C2 Beaconing by Connection Volume
status: experimental
correlation:
    type: event_count
    rules:
        - outbound_conn_single_dest
    group-by:
        - src
        - dest
    timespan: 1h
    condition:
        gte: 30

This nominates any host talking to a single external destination at least 30 times an hour and hands that candidate list to the query that does the real scoring. Thirty events an hour only reaches beacons that sleep about two minutes or less, so widen the timespan or lower the count to cover slower implants. On its own it is deliberately loud. Used as the front of a two-stage hunt, where Sigma narrows the field and the KQL or SPL coefficient of variation confirms the rhythm, it ports the precondition across every backend Sigma compiles to without pretending the statistics live somewhere they do not.

Where this goes wrong, and how to keep it honest

The first time you run this you will not find Cobalt Strike. You will find your patch agent, your EDR sensor checking in, NTP, certificate revocation lookups, telemetry from half the software on the endpoint, and whatever RMM tool your IT team uses. All of them beacon in the literal sense: a fixed destination on a fixed schedule with a near-perfect coefficient of variation. Cadence alone cannot separate them from an implant, and any writeup that claims otherwise has not run this in production.

Tightening the coefficient of variation until the noise stops is the wrong move, because it throws away the slow and jittered beacons you most want to catch. Instead, treat low coefficient of variation as the thing that narrows a fleet's worth of traffic down to a few hundred regular talkers, then confirm with the context the cadence cannot give you. Is the destination new to this environment in the last 30 days? Is the initiating process one that has any business making external connections? Does the domain resolve to something with no history, or sit on a young certificate? The cadence finds the candidates; reputation and first-seen return the verdict. Build the allowlist from the legitimate beacons you confirm, keyed on initiating process and destination, rather than by suppressing the signal itself.
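The confirmation step can be sketched as a small triage function. This is illustrative Python under stated assumptions: the candidate rows have already been enriched with an initiating process, you keep a first-seen timestamp per destination somewhere, and every name here (triage, the dict keys, the sample values) is hypothetical.

```python
from datetime import datetime, timedelta

def triage(candidate, allowlist, first_seen, now, new_within_days=30):
    """Classify one low-CV pair as 'allowlisted', 'hunt', or 'review'.

    candidate:  dict with 'process' and 'dest' keys (assumed enrichment).
    allowlist:  set of (process, dest) tuples confirmed as legitimate beacons.
    first_seen: dict mapping dest to the datetime it first appeared here.
    """
    key = (candidate["process"], candidate["dest"])
    if key in allowlist:
        return "allowlisted"
    seen = first_seen.get(candidate["dest"])
    # An unknown destination counts as new: there is no history to lean on.
    is_new = seen is None or now - seen <= timedelta(days=new_within_days)
    return "hunt" if is_new else "review"

now = datetime(2024, 6, 1)
allowlist = {("patchagent.exe", "198.51.100.20")}
first_seen = {
    "198.51.100.20": datetime(2023, 1, 10),
    "203.0.113.10": datetime(2024, 5, 25),
    "192.0.2.44": datetime(2023, 3, 2),
}
print(triage({"process": "patchagent.exe", "dest": "198.51.100.20"}, allowlist, first_seen, now))
print(triage({"process": "rundll32.exe", "dest": "203.0.113.10"}, allowlist, first_seen, now))
print(triage({"process": "updater.exe", "dest": "192.0.2.44"}, allowlist, first_seen, now))
```

Keying the allowlist on the process and destination together is the point of the design: the same destination reached by an unexpected process does not inherit the exemption.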

Then there is jitter. Operators randomize the sleep to break exactly this analysis, and against a single fixed threshold it works. Cobalt Strike applies jitter as a percentage of the sleep, spread roughly evenly. Modeled as a uniform spread of plus or minus that percentage, the coefficient of variation is the jitter fraction divided by the square root of three, so a 30 percent setting lands near 0.17 and a 50 percent setting near 0.29. A threshold pinned at 0.1 still sees the zero-jitter beacons, which remain common, and misses the rest. Run two bands instead: a high-confidence sweep under 0.1 that needs little enrichment before it reaches an analyst, and a wider hunt out to about 0.3 that you always enrich first. Past a coefficient of variation of roughly 0.3 the period is too buried for this method, and you move to frequency-domain analysis or a purpose-built tool like RITA that scores beacons on more than the spread of their gaps.
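The 0.17 and 0.29 figures follow from the standard deviation of a uniform distribution. The sketch below, in Python, works under two stated models: symmetric jitter, which reproduces the numbers above, and a shorten-only variant, which is an assumption worth checking against whatever tooling you are modeling. The function names and the band labels are hypothetical.

```python
import math

def cv_symmetric_jitter(jitter):
    """CV when each sleep is uniform on [s*(1-j), s*(1+j)]: j / sqrt(3)."""
    return jitter / math.sqrt(3)

def cv_shorten_only_jitter(jitter):
    """CV when each sleep is uniform on [s*(1-j), s] (assumed alternative model).
    The width is s*j, so stdev is s*j/sqrt(12) and the mean is s*(1 - j/2)."""
    return jitter / (math.sqrt(12) * (1 - jitter / 2))

def band(cv, high_confidence=0.1, wide=0.3):
    """Map a score to the two hunting bands, or out of range past the wide one."""
    if cv < high_confidence:
        return "high-confidence"
    if cv < wide:
        return "wide, enrich first"
    return "out of range for this method"

for jitter in (0.0, 0.3, 0.5):
    sym = cv_symmetric_jitter(jitter)
    short = cv_shorten_only_jitter(jitter)
    print(jitter, round(sym, 2), band(sym), "|", round(short, 2), band(short))
```

Under either model a 30 or 50 percent jitter falls outside the 0.1 sweep and inside the 0.3 band, which is the case for running both.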

What to do this week

Run the Sentinel query across seven days against your own network events, sort by coefficient of variation ascending, and read the top of the list. You will recognize most of it. The job is to recognize all of it.
Build the allowlist from what you recognize, keyed on initiating process and destination, and write it down. The pairs you cannot explain are your hunt.
Add the second band. Re-run with the coefficient of variation opened to 0.3 and enrich the new rows with destination age and initiating process before you judge them.
If you run Splunk, port the search and confirm the numbers agree. The coefficient of variation is unitless, so the same beacon should score the same in both tools.
Stand up the Sigma correlation as a cheap precondition feed if you want the hunt to survive a SIEM migration, and keep the scoring query as the confirmation stage behind it.

References

MITRE ATT&CK, Application Layer Protocol (T1071)
Active Countermeasures, RITA (open-source beacon analysis)
SigmaHQ, correlation rules specification
Red Canary, Atomic Red Team (T1071.001 Web Protocols)
Microsoft, DeviceNetworkEvents schema
Splunk, streamstats command reference
