The Northgate Engineering Estate: Sources, Evidence, and Reading It

Module 0

You cannot investigate an environment you do not understand, and you cannot tell normal from abnormal in a place you have never seen behave normally. Every detection you write in this course is, underneath, a statement about what is usual for one specific organization and what is not. So before any of the searching begins, you need to know that organization: who works there, what it runs, and where its activity is recorded.

That organization is Northgate Engineering, and it stays fixed for the whole course. The same firm, the same people, and the same attacker thread through every module, so the knowledge you build about it in one investigation pays off in the next. This subsection introduces the estate as a place that holds evidence, along with the one habit that separates a reliable investigator from a hopeful one: looking at the raw evidence before writing the search that finds it.

The organization you will investigate

Northgate Engineering is a mid-sized engineering firm of roughly eight hundred and ten staff, headquartered in Manchester with a second site in Birmingham. It runs Microsoft 365 E5 with Entra ID for identity, syncing from an on-premises Active Directory, so a single person often has both a cloud identity and a domain account.

It manages around eight hundred and sixty-five endpoints and a dozen servers: six Red Hat application and database hosts, two Ubuntu web servers, and the supporting infrastructure a firm this size accumulates. Its perimeter is a Palo Alto firewall. Remote workers come in through a Cisco VPN concentrator, outbound web traffic passes through a Squid proxy, and a slice of the business has moved into AWS.

None of that inventory is exotic, and that is deliberate. Northgate looks like thousands of real mid-sized organizations, which is what makes the skills you build against it transfer to the place you actually defend. You are not learning to investigate a pristine reference build designed to make detections easy. You are learning to investigate a working business with the messiness a working business has.

A handful of people recur across the investigations, and it is worth meeting them now, because an alert means more when you know whose account is in play. Rachel Okafor is the CISO. Phil Greaves runs IT. Marcus Webb is the security architect. Tom Ashworth and Priya Sharma sit on the first line of the SOC, and they are the analysts whose shoes you are often standing in. Elena Petrova handles governance and risk.

When a sign-in for p.sharma lands at the top of a suspicious result, you will recognize it as Priya's account rather than a string of characters, and that recognition is part of how real analysts reason about an incident.

[Figure: The Northgate estate, with every layer forwarding its record into Splunk, the record of the estate. Layers: Entra ID + AD identity; ~865 endpoints; 12 servers (RHEL, Ubuntu); firewall, VPN, proxy; AWS account. The cast you follow: Rachel Okafor, CISO; Phil Greaves, IT director; Marcus Webb, architect; Tom Ashworth, SOC L1; Priya Sharma, SOC L1; Elena Petrova, GRC. The same firm, start to finish.]

A realistic mid-sized estate, not a reference lab. Identity, endpoints, servers, the network edge, and a cloud account all forward their records into Splunk, where the people and the attacker stay consistent from the first investigation to the capstone.

Northgate also has the weaknesses a real firm has, and the attacker in this course exploits exactly those. A VPN exposed to the internet. A Conditional Access exclusion someone added for a contractor and forgot. A service account with more reach than anyone now remembers granting. None of these is a configuration error invented to make the course work. They are the ordinary debt that accumulates in any estate that has been running for years, and they are where intrusions begin.

One feature of Northgate's setup is worth flagging now, because it shapes many investigations. Identity is hybrid, so a single person frequently has two representations: a cloud identity in Entra and a domain account on-premises, and the two do not always carry the same name.

An investigation that follows only the cloud identity can lose the trail the moment the attacker acts through the on-premises account, and the reverse is just as true. Holding both halves of a person's identity in mind, and knowing they belong to one human, is a recurring demand of working a hybrid estate. Northgate is hybrid precisely so that you practice it.
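A minimal sketch of holding both halves together: a lookup that resolves either account name to the same person. Everything here is illustrative rather than Northgate's real directory data. The UPN domain `northgate.example`, the NetBIOS domain `NGE`, and the on-premises account name `psharma` are assumptions made up for the example; only `p.sharma` appears in the course material.

```python
# Hypothetical identity map. Only "p.sharma" comes from the course text;
# the domain names and the on-premises account name are invented.
PEOPLE = {
    "Priya Sharma": {
        "cloud": "p.sharma@northgate.example",  # Entra ID sign-in name (assumed form)
        "onprem": r"NGE\psharma",               # AD domain account (assumed form)
    },
}


def build_index(people):
    """Map every known account name, lowercased, back to one person."""
    index = {}
    for person, accounts in people.items():
        for name in accounts.values():
            index[name.lower()] = person
    return index


def resolve(account, index):
    """Return the person behind an account name, or None if it is unknown."""
    return index.get(account.strip().lower())


INDEX = build_index(PEOPLE)
print(resolve("P.Sharma@northgate.example", INDEX))  # Priya Sharma
print(resolve(r"NGE\PSharma", INDEX))                # Priya Sharma
```

A lookup is used here rather than string manipulation because the two names are assumed to differ outright, so no amount of trimming would turn one into the other.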

What each source can tell you, and what it cannot

Each layer of the estate records a different kind of activity, answers a different question, and stays silent about things outside its view. Knowing the shape of each source before you query it is what stops you from asking a log for an answer it never held.

Identity: Entra ID sign-in & audit logs
Records: who authenticated, from where, and whether it succeeded.
Answers: was this sign-in genuine, and was MFA satisfied?
Blind to: anything the account did once it was through the door.

Endpoint: host agent (process, file & registry events)
Records: processes starting, what spawned them, and the files and keys they touched.
Answers: what actually executed on this machine?
Blind to: anything the agent was not configured to capture.

Network: firewall, VPN & proxy
Records: what connected to what, and how much data crossed.
Answers: where did command-and-control or exfiltration surface?
Blind to: what happened inside an encrypted session.

Web & DNS: proxy requests & name lookups
Records: the sites a host requested and the names it resolved.
Answers: how, and to where, did a host reach outward?
Blind to: traffic that never passed the proxy or resolver.

Cloud: AWS control-plane activity
Records: administrative and access actions across the cloud estate.
Answers: who changed or reached what in AWS?
Blind to: on-premises activity, which lands in the other layers.

The point of seeing them as a set is that no single source closes a case. Identity can prove a strange sign-in happened and prove nothing about its consequences. The endpoint can show a suspicious process and know nothing of the sign-in that preceded it. The investigation lives in carrying a finding from one of these sources into the next, which is the cross-source habit of SPL0.4 made concrete against a real estate.

What makes that carrying possible is that the sources share identifiers, even while they record different things. An account name appears in both the identity and the endpoint records. A host appears in both the endpoint and the network records. An address appears across the network and the web records.

These shared fields are the seams along which you stitch the sources back together, and much of learning an estate is learning which identifier reliably links which pair of sources, since the names are not always written the same way in each.
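As a rough illustration of stitching along a shared identifier, the sketch below reduces differently written account names to one key and groups events by it. The three written forms it handles, the `NGE` and `northgate.example` domains, the `t.ashworth` account, and the event dictionaries are all assumptions invented for the example, not confirmed Northgate formats.

```python
def account_key(raw):
    """Reduce an account name to a bare lowercase key.

    Handles three assumed written forms: DOMAIN\\user, user@domain, and user.
    """
    name = raw.strip().lower()
    if "\\" in name:
        name = name.split("\\", 1)[1]  # drop a leading DOMAIN\ prefix
    if "@" in name:
        name = name.split("@", 1)[0]   # drop a trailing @domain suffix
    return name


def group_by_account(events):
    """Group source names by normalized account, preserving event order."""
    grouped = {}
    for event in events:
        grouped.setdefault(account_key(event["user"]), []).append(event["source"])
    return grouped


# Illustrative events only; source and field names are hypothetical.
events = [
    {"source": "identity", "user": "P.Sharma@northgate.example"},
    {"source": "endpoint", "user": "NGE\\p.sharma"},
    {"source": "vpn", "user": "p.sharma"},
    {"source": "endpoint", "user": "NGE\\t.ashworth"},
]

print(group_by_account(events))
# {'p.sharma': ['identity', 'endpoint', 'vpn'], 't.ashworth': ['endpoint']}
```

Normalization like this only works when the same name is written in different forms. Where two sources use different names for one person, a lookup table is needed instead.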

Splunk holds a copy, not the original

There is a property of all of this that decides how far you can trust your own evidence, and it is worth stating plainly because new analysts often assume the opposite. Splunk does not watch the estate directly. Each source produces its own record and forwards a copy of that record into Splunk, which keeps it faithfully. Splunk is an excellent librarian of what it was sent. It is not a witness to what it was not.

The consequence is that the quality of your evidence is decided at the source, before Splunk ever sees it. If a host was configured to log process creation, you can investigate what ran on it in detail. If that logging was switched off or never enabled, the events simply do not exist, and no query, however clever, can recover them, because there is nothing in the record to find.

This is the same gap you would meet on-premises, only centralized: a blind spot at a source becomes a blind spot in the SIEM. Part of reading evidence well is knowing what each source should have produced, so that an absence registers as a finding in its own right rather than as a clean result.

A host that suddenly stops logging in the middle of an incident is telling you something, and you only hear it if you knew what its normal volume looked like.
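One way to make "knowing the normal volume" concrete is to compare each host's recent event count with its own earlier baseline. This is a sketch under stated assumptions: the hostnames, the hourly counts, the three-hour window, and the 10 percent floor are all invented for illustration and would need tuning against real data.

```python
from statistics import median


def quiet_hosts(hourly_counts, recent_hours=3, floor_ratio=0.1):
    """Return hosts whose recent volume fell far below their own baseline.

    hourly_counts maps host -> list of events per hour, oldest first.
    """
    flagged = []
    for host, counts in hourly_counts.items():
        baseline, recent = counts[:-recent_hours], counts[-recent_hours:]
        if not baseline:
            continue  # not enough history to know what normal looks like
        usual = median(baseline)
        # Flag only when every recent hour is below the floor.
        if usual > 0 and max(recent) < usual * floor_ratio:
            flagged.append(host)
    return sorted(flagged)


# Hypothetical hosts and volumes.
counts = {
    "ws-0417": [410, 395, 402, 388, 420, 0, 0, 0],
    "ws-0102": [120, 131, 118, 125, 122, 119, 127, 124],
}
print(quiet_hosts(counts))  # ['ws-0417']
```

The baseline is per host rather than global, since a quiet server and a busy workstation have very different ideas of normal.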

A concrete version makes the stakes clear. Suppose an attacker runs a tool on a workstation whose process logging was quietly switched off weeks earlier by a misapplied policy. In the identity records you see the suspicious sign-in. In the network records you see the connection out.

Between them, where the endpoint evidence of what actually executed should sit, there is silence, and that silence is not proof the attacker did nothing on the host. It is the absence of a source, and reading it as reassurance is one of the easier ways to close a real incident by mistake.

[Figure: The evidence is set at the source, before Splunk sees it. The source decides what it records and what it never logs. Splunk keeps a faithful copy of what it was sent. The blind spot: what the source never logged is gone for good. A query can only find what reached Splunk. Knowing what a source should have produced turns an absence into a finding.]

Splunk is a faithful librarian, not a witness. Because the record is only as complete as what each source chose to log, an investigator has to know each source well enough to notice when something that should be there is missing.

Anti-Pattern

Reading an empty result as an all-clear.

A search that returns nothing can mean the activity never happened, or it can mean the source that would have recorded it was not logging. Those are opposite conclusions. Before you treat a clean result as reassurance, confirm the source should have produced the evidence at all: a host that went quiet, an agent that was disabled, a channel that was never enabled. An absence you can explain is a finding. An absence you assume is a blind spot you walked straight past.
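The two opposite readings of an empty result can be written down as a small decision, sketched here. The function name and the verdict strings are hypothetical, and the check is deliberately coarse: it assumes you can count all events the same source produced for the same host and window.

```python
def interpret(matches, source_events):
    """Classify a search result given how much the source logged at all.

    matches: events that matched the search in the time window.
    source_events: all events from the same source and host in that window.
    """
    if matches > 0:
        return "activity found"
    if source_events == 0:
        # Nothing matched because nothing was recorded.
        return "source silent: absence proves nothing"
    # The source was producing events, so the empty result carries weight.
    # This still does not prove the specific channel was enabled.
    return "source was logging: absence is evidence"


print(interpret(0, 0))     # source silent: absence proves nothing
print(interpret(0, 1840))  # source was logging: absence is evidence
```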

Read the evidence before the query

The single habit this course drills from the first module to the capstone is to look at the raw evidence before writing the search that finds it. It is tempting to start from a clever query, but a query you write before you have read the underlying events is a guess about a format you have not confirmed, and guesses are how detections quietly match the wrong thing.

There is a second reason the habit pays. The same word can mean different things in different sources: one log's notion of a user is an account name, another's is an email address, a third's is a numeric identifier, and a field that carries the same label in two sources can hold two different concepts.

Reading the raw event tells you what a field actually contains here, in this source, rather than what you assumed it contained, and that small confirmation heads off a whole class of searches that run cleanly and return the wrong rows.

Here is one raw line from Northgate's VPN, one of many during the password-spray incident you will work in the detection modules.

Cisco ASA VPN raw log line
%ASA-6-113005: AAA user authentication Rejected : reason = AAA failure : server = 10.0.1.20 : user = p.sharma : user IP = 193.32.162.89

Read what it says before reaching for anything else. An authentication was rejected. The account was p.sharma. The attempt came from 193.32.162.89, an external address. It was aimed at the VPN. That is the entire content of the event, and on its own it is noise: people fail their own passwords all day, and a single rejection proves nothing.
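The same reading can be done mechanically. The sketch below pulls the fields out of that one raw line with a regular expression written against this exact format, so it is an illustration rather than a general ASA parser. The group names (`reason`, `server`, `user`, `src_ip`) are choices made for the example.

```python
import ipaddress
import re

# Pattern written against the single 113005 line shown above.
ASA_113005 = re.compile(
    r"%ASA-(?P<severity>\d)-(?P<message_id>113005): "
    r"AAA user authentication Rejected : "
    r"reason = (?P<reason>.+?) : "
    r"server = (?P<server>\S+) : "
    r"user = (?P<user>\S+) : "
    r"user IP = (?P<src_ip>\S+)"
)


def parse_rejection(line):
    """Return the fields of a 113005 rejection as a dict, or None if no match."""
    match = ASA_113005.search(line)
    return match.groupdict() if match else None


line = (
    "%ASA-6-113005: AAA user authentication Rejected : reason = AAA failure"
    " : server = 10.0.1.20 : user = p.sharma : user IP = 193.32.162.89"
)
fields = parse_rejection(line)
print(fields["user"], fields["src_ip"])  # p.sharma 193.32.162.89

# "External" here means simply "not a private address".
print(not ipaddress.ip_address(fields["src_ip"]).is_private)  # True
```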

The signal is not in this line. It is in the relationship between many lines like it. One external address rejected against one user is a typo. One external address rejected against fifty different users in a short window is a spray, because a real person fails their own password, not fifty other people's.

That shift, from the single event to the aggregate pattern, is the whole of the detection, and notice that you reasoned your way to it by reading the evidence rather than by knowing a command. The search that counts distinct users per source address and flags the ones far above what any real user produces is short, and you will write and tune it for real in the detection modules.

Example search (illustration only, not run here)
index=network sourcetype=cisco:asa Cisco_ASA_message_id=113005
| stats dc(user) AS users_targeted, count AS attempts BY src_ip
| where users_targeted > 20
| sort - users_targeted

One source address rejected against many distinct users is the shape of a spray. Reading the evidence told you what to count; the search only expresses it. You build and tune this version for real in the detection modules.

Example output (illustration only, 2 results)
src_ip          users_targeted   attempts
193.32.162.89   47               312
45.135.232.18   24               96

What matters here, before the syntax, is the move: read the raw event to learn what is in it, decide what pattern separates the attack from the noise, and only then express that pattern as a query. The events come first. The SPL is how you ask the question they taught you to ask.
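To show that the pattern is independent of the tool, here is the same aggregate sketched outside Splunk: count distinct users and total attempts per source address, then keep the addresses above a threshold. The input data is synthetic, the usernames are invented, `203.0.113.7` is a documentation address, and the threshold of 20 simply mirrors the illustrative search.

```python
from collections import defaultdict


def spray_candidates(rejections, min_users=20):
    """Summarize (src_ip, user) rejections per source address.

    Keeps only addresses rejected against more than min_users distinct
    users, sorted with the most widely targeting address first.
    """
    users = defaultdict(set)
    attempts = defaultdict(int)
    for src_ip, user in rejections:
        users[src_ip].add(user)
        attempts[src_ip] += 1
    rows = [
        {"src_ip": ip, "users_targeted": len(names), "attempts": attempts[ip]}
        for ip, names in users.items()
        if len(names) > min_users
    ]
    return sorted(rows, key=lambda row: row["users_targeted"], reverse=True)


# Synthetic data: one address tries 25 invented users twice each,
# while another address fails three times against a single account.
rejections = [("193.32.162.89", f"user{n:02d}") for n in range(25)] * 2
rejections += [("203.0.113.7", "p.sharma")] * 3

print(spray_candidates(rejections))
# [{'src_ip': '193.32.162.89', 'users_targeted': 25, 'attempts': 50}]
```

The single user with three failures drops out, which matches the reading above: repeated failure against one account is a person mistyping, not a spray.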

The chains live in this estate

The seven attack chains introduced in SPL0.4 are not described to you from outside. They are embedded in Northgate's evidence, scattered across the very sources just described. The password spray sits in the VPN and identity records. The token replay runs through identity and the proxy. The endpoint compromise is in process and service activity, with its command-and-control in the network and DNS records.

The ransomware staging, the hybrid pivot, the edge intrusion, and the full-span capstone each leave their trail across this same estate. You meet each one where the course needs it, as a real sequence to reconstruct from the sources you now know.

That completes the orientation on the substance side. You know why the SIEM holds the whole estate, why attacks cross its sources, and why detection is the work of separating signal from noise, and you now know the specific estate you will work in and the evidence it holds. SPL0.7 closes the module on the practical side: the search surface on the page, how a live block behaves, and how to confirm you are ready before Module 1.