
Why a SIEM Centralizes the Whole Estate for Detection and IR

Module 0

A user account at Northgate Engineering signed in from an address in a country the firm does not operate in. That is the alert on your screen at the start of a shift. The obvious next question is simple to ask and, without a SIEM, miserable to answer: after that sign-in, did the account do anything? Did it reach an endpoint, touch a file share, open a connection out to the internet, or query an internal name it has no business resolving?

To answer that the hard way, you open the identity portal to confirm the sign-in, then the endpoint tool to look for process activity on whatever machine the account uses, then the firewall console to check egress, then the web proxy, then the DNS logs.

Five systems, five logins, five different ways of writing down a timestamp, five retention windows that may or may not still hold the hour you care about. By the time you have stitched a rough story together by hand, the shift is half gone and you are not confident you saw everything.

A SIEM exists to delete that entire afternoon of work, and understanding why is the first thing worth your time in this course.

Before the SIEM, the estate was a set of islands

Every system in an organization keeps its own record of what it did. The VPN concentrator logs who connected and from where. The identity provider logs sign-ins and the result of each one. The endpoint agent logs processes as they start.

The firewall logs sessions in and out. The web proxy logs requests. The DNS resolver logs lookups. Each of these is genuinely useful on its own, and each one is an island.

The trouble is that an island only tells you about itself. The VPN log knows a connection happened; it has no idea what the account did once it was inside. The endpoint agent knows a process ran; it cannot tell you the account that started it had signed in from a strange country ninety seconds earlier.

Each system writes its own timestamps, often in its own timezone, with its own idea of what a username looks like. One calls the person tom.ashworth, another TASHWORTH, another an email address, another a long opaque identifier. The same human being appears under four names across four logs, and nothing automatically connects them.

So an investigation on islands is mostly manual reconciliation. You pull an export from one system, an export from another, line them up in a spreadsheet, guess at how the clocks relate, and try to decide whether two events are the same person before you can even ask whether they form an attack. The evidence exists, but it is scattered, inconsistent, and slow to assemble, and the slowness matters because an attacker is not waiting for you to finish.

Make that concrete. Tom Ashworth in the SOC picks up the strange sign-in. The identity portal shows the account as a user principal name, the endpoint console lists its activity under a flat Windows account name, and the VPN log records the connection under a third identifier tied to the network login.

Confirming that those three records describe one person is a small piece of research on its own, and it has to be repeated for every account the case touches.

Add that the firewall reports in one timezone and the proxy in another, and that one of the systems keeps only a week of history, and the investigation is bottlenecked by the cost of assembling evidence that already exists rather than by any shortage of it. That cost is exactly what a SIEM pays once, at collection time.
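The identity half of that reconciliation can be sketched in a few lines of Python. Treat it as a minimal illustration rather than anything Splunk or Northgate is known to run: the alias table, the northgate.example domain, and the opaque identifier are all invented for the example, and a real table would have to be built from the directory. What it shows is that TASHWORTH and tom.ashworth cannot be matched by string cleanup alone; something has to hold the mapping.

```python
from typing import Optional

# Hypothetical alias table. Every identifier here is invented for
# illustration; in practice it would be built from a directory export.
ALIASES = {
    "tom.ashworth": "tom.ashworth",
    "tashworth": "tom.ashworth",
    "tom.ashworth@northgate.example": "tom.ashworth",
    "00u8f3k2xq": "tom.ashworth",
}


def canonical_user(raw: str) -> Optional[str]:
    """Map one source-specific identifier to a canonical account name.

    Returns None when the identifier is not in the alias table.
    """
    return ALIASES.get(raw.strip().lower())


# The same person as three different sources might write them down.
records = [
    ("identity", "tom.ashworth@northgate.example"),
    ("endpoint", "TASHWORTH"),
    ("vpn", "00u8f3k2xq"),
]

resolved = {source: canonical_user(name) for source, name in records}
same_person = len(set(resolved.values())) == 1 and None not in resolved.values()
print(resolved)
print("same person:", same_person)
```

On islands, a lookup like this is something the analyst rebuilds for every case. The argument of this section is that the work is better done once, when the records are collected.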

Everything arrives in one searchable place

A SIEM, which stands for Security Information and Event Management, changes the shape of the problem by collecting all of those island logs into a single system as they are produced. Splunk is the SIEM this course uses. The defining move is collection: the VPN, the identity provider, the endpoint agent, the firewall, the proxy, and the resolver all forward their records to Splunk continuously, and Splunk keeps them in one searchable record of the whole estate.

Three things follow from that, and each one removes a chunk of the manual work. The records now live in one place, so you ask one system instead of logging into six. They are reachable through one query language, so the question you ask of sign-ins is the same shape as the question you ask of firewall sessions, and you do not relearn a new console for every source.

And they are aligned to one clock, because Splunk normalizes each event's timestamp into a single consistent time field, _time, as it indexes the event, so events from different systems can finally be ordered against each other without you reconciling timezones by hand.

That last point is quieter than the other two and matters more than either. An attack is a sequence in time. If you cannot trust that an endpoint event and an identity event are ordered correctly, you cannot tell whether the process ran because of the sign-in or before it, and that ordering is often the whole case. A SIEM gives you a timeline you can trust, which is the precondition for every investigation you will run in this course.
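A small Python sketch shows how easily ordering goes wrong when two sources keep different clocks. It is an illustration of the idea only, not a description of Splunk's internals: the fixed offsets, the date, and the two event records are assumptions made up for the example.

```python
from datetime import datetime, timedelta, timezone

# Assumed clocks for two hypothetical sources.
SOURCE_TZ = {
    "identity": timezone(timedelta(hours=1)),  # assumed to log local summer time
    "endpoint": timezone.utc,                  # assumed to log UTC
}


def to_utc(source: str, local_text: str) -> datetime:
    """Parse a source's local timestamp and convert it to UTC."""
    naive = datetime.strptime(local_text, "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=SOURCE_TZ[source]).astimezone(timezone.utc)


# (source, timestamp exactly as the source wrote it, description)
raw_events = [
    ("identity", "2024-06-03 09:14:02", "sign-in succeeded"),
    ("endpoint", "2024-06-03 08:17:41", "powershell.exe started"),
]

# Ordering by the text as written puts the process before the sign-in.
naive_order = [src for src, _, _ in sorted(raw_events, key=lambda e: e[1])]

# Ordering after conversion to one clock puts the sign-in first.
utc_order = [
    src for src, _, _ in sorted(raw_events, key=lambda e: to_utc(e[0], e[1]))
]

print("as written:", naive_order)   # ['endpoint', 'identity']
print("normalized:", utc_order)     # ['identity', 'endpoint']
```

Read as written, the process appears to start almost an hour before the sign-in. On one clock it starts a little under four minutes after it, which is a very different case.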

Two further properties of collection are worth naming, because you will lean on them later. Collection is continuous and close to real time, so you are reading what happened minutes ago rather than waiting on an export someone has to generate by hand. And once a record has been forwarded into Splunk, it sits in a system the source host does not control.

An attacker who fully owns an endpoint can delete that machine's local logs, but the events the endpoint already sent onward have left their reach. The copy that decides the case is the one that got away from them, which is part of why a centralized record is far harder to quietly tamper with than any single host's own log.

[Figure: Six islands, each in its own format and clock, collected into one searchable record. The VPN concentrator, identity provider, endpoint agent, firewall, web proxy, and DNS resolver, each with its own login, format, and clock, all forward into Splunk: one record, one language, one clock. One question reaches every source at once.]

The evidence stops being scattered. The skill shifts from gathering it to asking the right question of it.

Collection is the defining move. Once every island forwards its records into one system, the afternoon of manual reconciliation collapses into a single query against a record that is already assembled and time-ordered.

Anti-Pattern

Working the consoles one at a time.

Faced with the opening alert, the habit trained by years of separate tools is to open the identity portal, then the endpoint console, then the firewall, and reconcile the three by hand. In an estate that already has a SIEM, that is wasted effort, and it quietly throws away the one advantage the SIEM exists to give you: a single timeline that orders events from different systems against each other. Ask the whole record one question rather than asking six systems the same question six times.

What Northgate actually feeds in

The estate you investigate in this course belongs to Northgate Engineering, an eight-hundred-person engineering firm with offices in Manchester and Birmingham and a hybrid setup that is typical of a mid-sized business. Identities live in Entra ID and sync from on-premises Active Directory. People reach internal systems over a VPN. Endpoints run an agent that reports process activity.

A firewall sits at the edge, a proxy handles web traffic, and an internal resolver answers DNS. Some workloads have moved to the cloud and some have not.

None of that is unusual, and that is the point. The sources Splunk collects map directly onto the layers of a real organization: identity, where people prove who they are; endpoints, where work actually happens; the network, which carries everything between them; the web and DNS, which record how systems reach out; and the cloud, where more of the estate moves every year.

Each layer answers a different question about an account or a host, and the SIEM is the one place where all of those answers sit side by side.

It is worth naming what each layer answers, because the questions differ and an investigation usually needs several of them at once. Identity tells you whether an authentication happened, whether it succeeded, from where, and with what strength of proof. The endpoint tells you what actually executed on a machine and what spawned it. The network tells you what connected to what, and how much data moved.

The web and DNS layers record how a host reached outward, which is often where the first sign of remote control or of data leaving the building appears. No single layer is enough to call an incident on its own, which is precisely why holding them together is the SIEM's reason to exist.

You do not need to memorize this map yet. The detail of which source lands where, and how to read each one, is the work of Module 1 and the modules after it. What matters right now is the shape: the firm is a collection of layers, every layer produces evidence, and Splunk holds all of it together.

When you investigate the strange sign-in from the opening, you are not choosing which island to visit. You are asking one record that already contains the identity event, the endpoint activity, and the network connection, and waiting for it to tell you whether they line up.

Example search (illustration only, not run here)
(index=azuread OR index=endpoint OR index=network) user=p.sharma
| sort _time
| table _time index action process_name dest_ip

One account, every source it touched, gathered onto one timeline. This is the question the opening alert really asks, and the SIEM is what turns it into a single search instead of a tour of six consoles.

Example output (illustration only, 3 results)
_time      index      action    process_name     dest_ip
08:14:02   azuread    success
08:17:41   endpoint             powershell.exe
08:19:55   network    allowed                    91.214.66.18

Why this is the defender's advantage

There is a structural reason this arrangement favors the defender, and it is worth making explicit because it shapes how you will think for the rest of the course. An attacker who compromises an account has to operate across those same layers to do anything useful. They sign in, so they touch identity. They run something, so they touch an endpoint.

They reach out for instructions or move data, so they touch the network. Each step they take is a step into a layer the SIEM is already watching.

Return to the opening alert to see the asymmetry in motion. The strange sign-in is one event in the identity layer. If the account then runs a tool on its usual workstation, that is an event in the endpoint layer. If it opens a connection out to an address it has never contacted before, that is an event in the network layer.

To the attacker these are three separate successes in three separate systems, and they have little way of knowing which were recorded or where. To you they are three rows that share an account and fall inside the same few minutes, placed on one timeline in the order they happened. The attacker took three steps. You read one sequence.
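The test "same account, several layers, a few minutes" can be written down as a short Python sketch. It reuses the three illustrative rows from the example output earlier in this section; the date, the second account j.okafor, and the function name are assumptions added for the example, and looking only at the first event per layer is a deliberate simplification.

```python
from datetime import datetime, timedelta

# Rows modeled on the illustrative output; the date and j.okafor are assumed.
rows = [
    {"time": datetime(2024, 6, 3, 8, 14, 2), "layer": "identity", "user": "p.sharma"},
    {"time": datetime(2024, 6, 3, 8, 17, 41), "layer": "endpoint", "user": "p.sharma"},
    {"time": datetime(2024, 6, 3, 8, 19, 55), "layer": "network", "user": "p.sharma"},
    {"time": datetime(2024, 6, 3, 8, 16, 0), "layer": "identity", "user": "j.okafor"},
]

LAYERS = ("identity", "endpoint", "network")


def crosses_layers(rows, user, window, required=LAYERS):
    """Return True if `user` shows up in every required layer, in that
    order, with the first event per layer falling inside `window`."""
    first_seen = {}
    for row in sorted(rows, key=lambda r: r["time"]):
        if row["user"] == user:
            # Keep only the earliest event seen in each layer.
            first_seen.setdefault(row["layer"], row["time"])
    if any(layer not in first_seen for layer in required):
        return False
    times = [first_seen[layer] for layer in required]
    in_order = times == sorted(times)
    return in_order and (times[-1] - times[0]) <= window


print(crosses_layers(rows, "p.sharma", timedelta(minutes=10)))  # True
print(crosses_layers(rows, "j.okafor", timedelta(minutes=10)))  # False
```

The check is only possible because the rows already share one account name and one clock. Without those, the comparison inside the function has nothing reliable to compare.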

The attacker sees their own actions one at a time, from inside a single layer, and they cannot easily tell what evidence each step is leaving behind in systems they do not control. You see the assembled trail. The defender who can query the whole estate at once holds a view of the incident that the attacker, working blind across separate systems, does not have. That asymmetry is the foundation of detection work, and the next sub is entirely about how to use it.

[Figure: The attacker works blind across layers; the defender reads the assembled trail. Attacker's view: sees one action at a time, cannot see what each step logs in systems they do not control, and works blind, layer by layer. Defender's view: identity, endpoint, and network, where one query orders all three against a single clock and reads the whole sequence at once.]

The asymmetry is the point. The attacker has to cross layers to act, and every crossing lands in a system the defender can query. The whole-estate view turns scattered actions into a single readable sequence.

The cost, and where the real work moves

This view is not free, and being honest about the cost tells you where the rest of the course spends its effort. Collecting every source into one system produces an enormous volume of events, the large majority of which are routine and benign. A SIEM does not hand you incidents. It hands you a haystack that happens to contain them, already gathered and time-ordered, and leaves the finding to you.

The scale is worth sitting with. A firm the size of Northgate generates millions of events a day across its sources, and on an ordinary day every one of them is benign: sign-ins succeed, processes start and exit, connections open and close. The malicious activity you are hunting, when it is present at all, is a tiny fraction of that total, buried in routine that resembles it closely.

Volume, then, is the defender's real adversary inside the SIEM, and the two subs that follow deal with exactly that problem from two directions: how to follow a single trail through the noise, and how to draw a line that separates the rare signal from everything sitting around it.
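A little arithmetic makes the point about volume sharper. The Python sketch below uses figures that are entirely assumed for illustration (two million events in a day, twenty of them malicious, a rule that wrongly flags one benign event in a thousand); none of them come from Northgate or from any measurement.

```python
def alert_breakdown(total_events, malicious_events,
                    false_positive_rate, detection_rate):
    """Return (true_alerts, false_alerts, precision) for a single rule.

    precision is the share of alerts that point at real malicious activity.
    """
    benign_events = total_events - malicious_events
    false_alerts = benign_events * false_positive_rate
    true_alerts = malicious_events * detection_rate
    total_alerts = true_alerts + false_alerts
    precision = true_alerts / total_alerts if total_alerts else 0.0
    return true_alerts, false_alerts, precision


# All four inputs are assumptions chosen only to show the shape of the problem.
true_alerts, false_alerts, precision = alert_breakdown(
    total_events=2_000_000,
    malicious_events=20,
    false_positive_rate=0.001,  # 1 benign event in 1,000 wrongly flagged
    detection_rate=1.0,         # the rule catches every malicious event
)

print(f"true alerts:  {true_alerts:.0f}")
print(f"false alerts: {false_alerts:.0f}")
print(f"precision:    {precision:.2%}")
```

Under those assumptions, a rule that never misses and is wrong only once in a thousand benign events still buries its twenty real alerts under roughly two thousand false ones. That is what it means for volume to be the adversary.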

So the bottleneck moves. On islands, the hard part was getting the evidence at all: the logins, the exports, the reconciliation. With a SIEM, the evidence is already there, and the hard part becomes asking it the right question.

That is why this course spends most of its length on how to query well and how to tell a real signal from the noise around it, rather than on how to collect logs. The collection is assumed. The judgment is the skill, and judgment is what turns a record of the whole estate into an answer about a single account that signed in from the wrong place.

Holding the whole estate in one place is the premise everything else rests on. The next sub puts it to work by following a real intrusion as it crosses those layers, because an attack almost never stays in one of them, and seeing the trail it leaves is how you learn to read it. SPL0.4 takes that on.