
How to Get the Most from the AWS Detection and IR Course

Module 0

Two paths, one skill

There are two ways to work through this course, and both reach the same skill. The difference is how much you set up before you start.

The first path uses nothing but this site. Every investigation runs against a prepared dataset, and the real query output is on the page. You read the evidence, run the query, and see the result without an AWS account, a credit card, or a single resource of your own. This path exists because the skill being taught is reading and querying evidence, and you can practice that against recorded evidence as well as live evidence. Most learners take this path, and it teaches the full method.

The second path stands the environment up in your own AWS account. The lab pack builds the Northgate Organization with infrastructure-as-code, runs the attacks, and produces the logs for you to investigate against your own CloudTrail. It costs a little real money and takes setup time, and it is the stronger path for anyone who will do this work for a living, because there is no substitute for watching your own actions appear in the log a few minutes later. Module 1 covers the setup; you do not need it before then.

What the lab pack actually contains is worth knowing up front so the path feels concrete. It is a set of templates that create the accounts and resources, a baseline generator that produces normal day-to-day activity so the logs are not suspiciously empty, and attack scripts that perform the same intrusions you investigate on the prepared path. You deploy it, let it generate a few days of activity, run an attack, and then query the result in your own account.

The running cost is small, on the order of a few dollars for the logging and storage while the lab is up, and the pack includes a teardown step so you can destroy everything when you are done and stop the meter. The point of the path is not the infrastructure. It is the moment you run an attack and then find it yourself in CloudTrail, which is the closest this gets to the real job.

Both paths converge on the same capability. The prepared-evidence path is self-contained; the own-account path adds the experience of generating the evidence yourself.

Pick one and commit to it for a module at a time. The worst approach is to skim without doing either, treating the queries as something to read rather than run. The method only sticks if you do the work. You can still switch paths between modules, or investigate a module on the prepared data first and re-run it in your own account once the concepts click. That turns the second pass into confirmation rather than first contact.

The query surface

From Module 1 onward you investigate by writing queries. In a real account, CloudTrail delivers its log files to an S3 bucket, and you would normally query them with Amazon Athena, which speaks SQL. This course gives you the same experience without the setup: a query surface built into the page that runs real SQL against the prepared evidence and returns a real result table.

You will see queries in two forms. Some are runnable: an editable block with a Run button that executes against the dataset and shows you the rows. Others are worked examples shown alongside their output, where the point is to read the query and the result together. The AWS Query Lab gives you a free area to write your own queries against the full dataset whenever you want to test an idea that the lesson did not cover.

The SQL you learn here is the SQL you would write in Athena against a real account, with the same field names and the same nested structure, so the skill transfers directly to the job.

You do not need to know SQL before you start. The early query modules build it from the first SELECT, and the course only ever uses the parts of SQL that earn their place in an investigation. If you already write SQL, you will spend your attention on the AWS-specific part: the shape of a CloudTrail record and how to reach the fields buried inside it.

It helps to know what the real version looks like, because the on-page surface mirrors it. In a live account, CloudTrail writes its events as compressed JSON files into an S3 bucket. Amazon Athena is a serverless query engine that runs SQL directly over those files: you define a table that maps to the CloudTrail schema, and then you query years of activity without standing up a database or moving the data anywhere.

You pay per query for the data scanned, and nothing runs when you are not querying. This is the standard way teams investigate CloudTrail at scale, and it is the model this course teaches.

The part that catches people out is not the SQL keywords. It is that a CloudTrail record is deeply nested. The identity behind a call is not a column called user; it is userIdentity, an object with the type, the ARN, the account, and, for an assumed role, a whole sessionContext describing which role was assumed and who assumed it.

CloudTrail Identity, nested

{
  "eventName": "RunInstances",
  "eventSource": "ec2.amazonaws.com",
  "userIdentity": {
    "type": "AssumedRole",
    "arn": "arn:aws:sts::333333333333:assumed-role/app-server-role/i-0a1b2c3d",
    "accountId": "333333333333",
    "sessionContext": {
      "sessionIssuer": {
        "type": "Role",
        "userName": "app-server-role",
        "arn": "arn:aws:iam::333333333333:role/app-server-role"
      }
    }
  }
}

Pulling the attacker's identity out of a record means reaching into that structure with dotted paths like `useridentity.arn` and `useridentity.sessioncontext.sessionissuer.username`. The paths are lowercase because Athena table definitions for CloudTrail use lowercase names, while the raw JSON keeps its camelCase keys (userIdentity, sessionContext). A large part of querying AWS evidence well is knowing where in the record each fact lives. The course builds that map field by field, so by the time you need the session issuer you already know why it is there.
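To make the nesting concrete outside SQL, here is a minimal Python sketch that walks the sample record above with the same kind of dotted path. It is an illustration, not part of the course tooling: `get_path` is a hypothetical helper, and it reads the raw JSON, so the keys use the camelCase spelling from the record rather than the lowercase form used in queries.

```python
import json

# The sample record from this page, exactly as shown above (raw JSON, camelCase keys).
RECORD = json.loads("""
{
  "eventName": "RunInstances",
  "eventSource": "ec2.amazonaws.com",
  "userIdentity": {
    "type": "AssumedRole",
    "arn": "arn:aws:sts::333333333333:assumed-role/app-server-role/i-0a1b2c3d",
    "accountId": "333333333333",
    "sessionContext": {
      "sessionIssuer": {
        "type": "Role",
        "userName": "app-server-role",
        "arn": "arn:aws:iam::333333333333:role/app-server-role"
      }
    }
  }
}
""")


def get_path(record, dotted_path):
    """Follow a dotted path through nested dicts; return None if any step is missing.

    Hypothetical helper for illustration only.
    """
    current = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Who made the call (the assumed-role session).
caller_arn = get_path(RECORD, "userIdentity.arn")
# Which role that session was issued from.
issuer_name = get_path(RECORD, "userIdentity.sessionContext.sessionIssuer.userName")
# A path this sample record does not contain comes back as None, not an error.
absent = get_path(RECORD, "userIdentity.sessionContext.attributes.mfaAuthenticated")

print(caller_arn)
print(issuer_name)
print(absent)
```

Returning None for a missing step, rather than raising, mirrors what you will often see in evidence: not every record carries every nested field, and the absence is something to notice rather than a crash.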

Predict before you reveal

Most query results in the course are hidden until you ask for them. A result block shows the query and a button, not the answer. This is deliberate, and using it properly is the single habit that separates learners who finish able to investigate from learners who finish able to recognize an investigation.

Before you reveal a result, predict it. Say what you expect the query to return: how many rows, roughly, and what the interesting one will look like. Then reveal, and compare. When your prediction is right, you have understood the evidence. When it is wrong, you have found the exact gap in your understanding, and that gap is worth more than the correct answer, because it tells you which assumption to fix. An investigator who runs a query without a prediction is fishing. An investigator who predicts first is testing a hypothesis, which is the actual job.

Here is what that looks like in practice. Suppose a query counts a single identity's calls over a day, grouped by eventName and ordered by count. Before revealing it, form the expectation: a developer's busiest calls are probably reads, things like GetObject, DescribeInstances, AssumeRole, with the counts trailing off into a long tail of occasional writes. Hold that picture, then reveal. If the result matches, you have confirmed what normal looks like for that identity, which is exactly the baseline you need before you can call anything abnormal.

If instead the top of the list is ListBuckets, ListUsers, GetAccountAuthorizationDetails, a wall of enumeration this person never normally runs, the mismatch between your prediction and the result is the finding. You did not stumble onto it. You produced it, by knowing what you expected and noticing where reality diverged. That is the move the whole course trains, and the reveal button exists to make you practice it on every result.
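The predict-then-compare move can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: the event names and counts are invented for the example, not taken from the course dataset, and `count_by_event_name` and `unexpected_leaders` are hypothetical helpers standing in for a grouped, ordered query and for your own comparison step.

```python
from collections import Counter

# Invented event names for one identity over one day (illustration only).
EVENT_NAMES = (
    ["ListBuckets"] * 14
    + ["ListUsers"] * 9
    + ["GetAccountAuthorizationDetails"] * 3
    + ["DescribeInstances"] * 2
)

# The prediction, written down BEFORE looking at the result.
PREDICTED_TOP = {"GetObject", "DescribeInstances", "AssumeRole"}


def count_by_event_name(event_names):
    """Count calls per event name, highest count first (like GROUP BY + ORDER BY)."""
    return Counter(event_names).most_common()


def unexpected_leaders(counts, predicted, top_n=3):
    """Return the top-N event names that the prediction did not include."""
    return [name for name, _ in counts[:top_n] if name not in predicted]


counts = count_by_event_name(EVENT_NAMES)
surprises = unexpected_leaders(counts, PREDICTED_TOP)

for name, n in counts:
    print(f"{n:>3}  {name}")
print("Not predicted:", surprises)
```

The useful part is that `PREDICTED_TOP` exists before `counts` does. Without it there is nothing to compare against, and the enumeration calls at the top of the list are just rows.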

Anti-Pattern

Revealing first, reading second.

Clicking Reveal before forming an expectation, then nodding along with the answer. It feels like learning because the result makes sense once you see it. It is not learning, because making sense of an answer you are shown is a different skill from producing it from evidence, and only the second one helps at 02:14 when there is no answer key.

Build the timeline as you go

Treat every investigation in the course the way you would treat a real one, by writing down what you find as you find it. The deliverable of an incident investigation is not a feeling that you understand what happened; it is a timeline, a sequence of timestamped, evidence-backed statements that another person can follow and trust. Start building that timeline from the first query, not at the end. Each time a query confirms something, add a line: the time, the identity, the action, the source, and the record that proves it.

This habit does more than produce a report. It keeps you honest about the difference between what the evidence shows and what you suspect. When you write "09:14, m.webb opened TCP 443, CloudTrail event ID such-and-such," you are stating a fact you can defend. When you catch yourself wanting to write "the attacker then exfiltrated the data" without a record to point at, you have found the boundary of what you actually know, and that is the next thing to go and prove or disprove.

Investigations go wrong when suspicion gets promoted to fact without anyone noticing. A timeline built from evidence as you go is the cheapest defense against that, and the course is structured so you are always assembling one.
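One way to hold yourself to that standard is to make the evidence field mandatory. The sketch below is a hypothetical structure, not something the course provides: `TimelineEntry` and `Timeline` are illustrative names, the fields follow the list above (time, identity, action, source, proving record), and every value in the usage is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimelineEntry:
    time: str      # event time, assumed here to be an ISO 8601 UTC string
    identity: str  # who made the call
    action: str    # what they did
    source: str    # where the call came from, for example a source IP
    evidence: str  # the record that proves it, for example a CloudTrail event ID


class Timeline:
    """Evidence-backed statements only; a claim with no record is rejected."""

    def __init__(self):
        self._entries = []

    def add(self, entry):
        if not entry.evidence.strip():
            raise ValueError("no record to point at: this is a suspicion, not a fact")
        self._entries.append(entry)

    def ordered(self):
        # ISO 8601 strings in the same format and zone sort chronologically as text.
        return sorted(self._entries, key=lambda e: e.time)


timeline = Timeline()
# Placeholder values throughout; entries are added in the order you find them.
timeline.add(TimelineEntry("2024-01-01T09:20:00Z", "m.webb", "second confirmed action",
                           "198.51.100.7", "EXAMPLE-EVENT-ID-2"))
timeline.add(TimelineEntry("2024-01-01T09:14:00Z", "m.webb", "opened TCP 443",
                           "198.51.100.7", "EXAMPLE-EVENT-ID-1"))

for entry in timeline.ordered():
    print(entry.time, entry.identity, entry.action, entry.source, entry.evidence)
```

Trying to add "the attacker then exfiltrated the data" with an empty evidence field raises an error, which is the point: the structure marks the boundary of what you actually know.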

A related habit is reading a result of zero rows as a finding rather than a dead end. When you expect a query to return something and it comes back empty, that emptiness is information. A sign-in with no matching multi-factor event, a key with no usage before the day it appeared from a strange IP, a trail with no events during the window an attacker was clearly active: each of these is an empty result that tells you something specific.

Beginners treat zero rows as a failed query and move on. A working investigator asks what the absence means, because in incident response what is missing is often the point. The course puts you in front of empty results deliberately so the instinct to interpret them, rather than discard them, becomes automatic.
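The "trail with no events during the window" case can be made explicit by asking for the empty hours directly instead of hoping to notice them. This is a small sketch under assumptions: `empty_hours` is a hypothetical helper, and the timestamps are invented for illustration.

```python
from datetime import datetime, timedelta


def empty_hours(event_times, start, end):
    """Return the start of each hour in [start, end) that contains no events."""
    seen = {t.replace(minute=0, second=0, microsecond=0) for t in event_times}
    gaps = []
    current = start.replace(minute=0, second=0, microsecond=0)
    while current < end:
        if current not in seen:
            gaps.append(current)
        current += timedelta(hours=1)
    return gaps


# Invented event times: activity at 09:xx and 12:xx, nothing in between.
EVENT_TIMES = [
    datetime(2024, 1, 1, 9, 5),
    datetime(2024, 1, 1, 9, 40),
    datetime(2024, 1, 1, 12, 10),
]

gaps = empty_hours(EVENT_TIMES, datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 13, 0))
for hour in gaps:
    print("no events in the hour starting", hour.isoformat())
```

The result here is a list of absences, not rows of activity. Whether a quiet hour is ordinary or a sign that logging was interrupted is the question the empty result hands you, and it is one the evidence has to answer.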

How to work through the modules

Work the modules in order. The course is built as one continuous investigation that deepens, not a set of independent topics, and later modules assume the evidence and the environment you met earlier. Module 1 sets up the landscape and your first queries. The phases after it move through the attack the way an attacker does: identity first, then persistence and data, then compute and evasion, then the full response.

The capstone in Module 11 hands you the whole chain at once and expects you to run it without scaffolding, which is the point at which you find out the skill is yours.

Use the two block forms differently. When a block is runnable, run it, and change it: alter a value, add a condition, see what happens. The query surface is not fragile, and breaking a query and fixing it teaches more than reading a correct one. When a block is a worked example shown with its output, slow down and read both together, asking why the query is shaped the way it is before you accept the result.

And when a lesson sparks a question it did not answer, take it to the AWS Query Lab, which is a free-run space against the full dataset for exactly that. The learners who get the most from the course are the ones who treat every query as something to poke at rather than something to pass.

Each content section ends with a knowledge check built from the decisions in that module, framed as situations rather than definitions. Treat a wrong answer the same way you treat a wrong prediction: as the precise thing to go back and re-read. The pattern holds for every query you run. One you predicted teaches you something whether it confirms your expectation or surprises you, while one you only revealed teaches you nothing either way. Form the expectation first, every time.

AWS0.3 starts the real work of the orientation: why the control plane, the record of every API call, is the crime scene you investigate in AWS.
