Memory Forensics Confidence Tiers and Reporting Language

Module 0

A report where everything is certain has already failed

A memory forensics report that states every finding with the same confidence has lost before it is read. Real findings are not equally strong. Some are supported by several independent discovery methods and verified against raw memory; those deserve flat assertions. Some rest on a single discovery method with the obvious alternatives considered and excluded; those deserve hedged language. Some are suggestive but uncorroborated; those deserve careful framing that claims only what the evidence supports.

The reason this matters is adversarial, not academic. An analyst who skips confidence assessment produces a report where the weakest finding tarnishes the strongest: opposing counsel finds the one overclaim, demonstrates it, and uses it to question the entire methodology. The defense against that is to state confidence explicitly, finding by finding, so that no claim exceeds its evidence and the strong findings are insulated from the weak. This sub gives you the three-tier hierarchy every module applies, the modifiers that move a finding between tiers, and the reporting language each tier earns.

The tier is not a label you slap on at the end; it is the output of the modifiers. A finding earns High by acquiring independent support and verification, and loses a tier the moment an alternative explanation goes unaddressed or an anti-forensic threat is in play.

High confidence: multiple methods, verified

A High-confidence finding is one supported by more than one independent discovery method and verified against raw memory. The process that appears in pslist, psscan, and a thread-based enumeration, with its EPROCESS confirmed by reading the structure directly, is High: the kernel maintained it, several independent paths reach it, and you have seen the bytes yourself. High-confidence findings earn direct assertions in the report, with no hedging: "at acquisition the process was running, owned by this user, with these open handles and these network connections." You can make that claim flatly because the evidence is redundant and verified, and redundancy is what survives the question "how do you know the tool didn't get it wrong?"

The volatile-structure category from MF0.2 lives mostly here, precisely because it offers multiple discovery methods and the kernel cannot suppress all of them at once. High confidence is the target tier for any finding that will carry the weight of the conclusion, and the work of phases 3 and 4 in the workflow, reconciliation and raw verification, is exactly the work that earns it.

It helps to see what "multiple methods" actually buys you. A single plugin reporting a process is one assertion from one piece of code reading one structure. The same process confirmed by pslist, psscan, and a thread-based enumeration is the same fact reached by three independent paths through different structures. For the finding to be wrong, all three would have to fail in the same direction at once, which is far less likely than one parser having a bad day. Raw verification adds a fourth, human, path: you read the bytes yourself and confirm they are what the tool claimed. High confidence is not a feeling of certainty; it is this specific, statable redundancy, and stating it is what lets you assert the finding flatly and defend the assertion when asked how you know.

What "multiple independent methods" looks like for one process

# PID 4872 reached by three independent paths:
windows.pslist   -> 4872 powershell.exe   (active list)
windows.psscan   -> 4872 powershell.exe   (pool scan)
windows.threads  -> owner 4872            (thread back-ref)
# + EPROCESS bytes read by hand -> HIGH: assert it flatly
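The reconciliation shown above can be sketched in a few lines of Python. This is only an illustration: it assumes the PIDs each plugin reported have already been transcribed into sets by hand, and the `reconcile` helper, the sample PIDs other than 4872, and the method labels are hypothetical names, not part of any tool.

```python
from collections import defaultdict


def reconcile(results: dict[str, set[int]]) -> dict[int, list[str]]:
    """Map each PID to the sorted list of methods that reported it."""
    seen = defaultdict(list)
    for method, pids in results.items():
        for pid in pids:
            seen[pid].append(method)
    return {pid: sorted(methods) for pid, methods in sorted(seen.items())}


# Hypothetical PID sets, transcribed from each plugin's output.
results = {
    "pslist": {4, 4872},
    "psscan": {4, 4872, 6120},
    "threads": {4, 4872},
}

by_pid = reconcile(results)
for pid, methods in by_pid.items():
    missing = sorted(set(results) - set(methods))
    if missing:
        # A disagreement is a lead to explain, not a conclusion.
        print(f"{pid}: {methods} (not reported by {missing}, needs explaining)")
    else:
        print(f"{pid}: reached by all {len(methods)} methods")
```

A PID that only the pool scan reports, like the made-up 6120 here, is not automatically malicious: a terminated process and an unlinked one can both look this way, which is why the disagreement is recorded and explained before any tier is assigned.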

Medium confidence: single method, alternatives ruled out

A Medium-confidence finding rests on a single discovery method, but the obvious alternative explanations have been considered and excluded. A credential recovered from LSASS is often Medium: there is one extraction path, and its meaning depends on configuration you have to reason about. Was WDigest enabled? Was this credential cached by a real logon or left over from an earlier session? When you have ruled out the alternatives you can identify but cannot reach the redundancy of High, the finding is Medium, and it earns hedged language: "consistent with," "likely," "supports the conclusion that." That phrasing is not weakness; it is calibration. It tells the reader exactly how far the evidence reaches, and it holds up under cross-examination because you have already stated the limit yourself.

The discipline that makes a finding legitimately Medium rather than secretly Low is ruling out alternatives in writing. "This is consistent with credential theft" is defensible only if you have recorded which other explanations you considered and why the evidence does not fit them. An un-reasoned "consistent with" is a Low finding wearing Medium language, and an opposing expert will expose the gap.

The discipline of ruling out alternatives is also where most honest analytical work happens, so it deserves a concrete shape. For the LSASS credential, the alternatives worth excluding might be: the credential was cached by a legitimate interactive logon rather than harvested, the artifact is a stale remnant from a session that has since ended, or the value is an artifact of how the tool parses the structure rather than a real cached secret. Each of those can be argued for or against using other evidence (the logon events, the session timing, a raw read of the structure), and a Medium finding is one where you have done that arguing and recorded it. The point is that Medium is not "less sure so I will hedge"; it is "one discovery path, alternatives considered and excluded in writing," which is a precise and defensible position rather than a vague one.
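One way to make "excluded in writing" checkable is to store each alternative next to the reasoning that excludes it. The sketch below is a minimal illustration, not a prescribed format: the `Alternative` and `Finding` classes, the `medium_is_earned` check, and the reasoning strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Alternative:
    explanation: str
    excluded: bool = False
    reasoning: str = ""  # why the evidence does not fit this explanation


@dataclass
class Finding:
    claim: str
    alternatives: list[Alternative] = field(default_factory=list)

    def medium_is_earned(self) -> bool:
        """True only if every listed alternative is excluded with written reasoning."""
        return bool(self.alternatives) and all(
            a.excluded and a.reasoning.strip() for a in self.alternatives
        )


# Illustrative record for the LSASS credential; reasoning text is made up.
lsass = Finding(
    claim="Credential in LSASS is consistent with credential theft",
    alternatives=[
        Alternative("Cached by a legitimate interactive logon", True,
                    "No matching interactive logon event for the account"),
        Alternative("Stale remnant of an ended session", True,
                    "Session timing overlaps the acquisition window"),
        Alternative("Parsing artifact rather than a real secret"),
    ],
)

before = lsass.medium_is_earned()  # third alternative still unaddressed

lsass.alternatives[2].excluded = True
lsass.alternatives[2].reasoning = "Raw read of the structure matches the parsed value"
after = lsass.medium_is_earned()

print(before, after)
```

The check is deliberately strict: a finding with no alternatives listed at all fails it, because an empty list usually means the alternatives were never considered rather than that none exist.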

Low confidence: suggestive, not conclusive

A Low-confidence finding is suggestive but uncorroborated: a single fragile indicator, an artifact with several equally plausible explanations, a pattern that might be attack or might be benign. Low findings are not worthless; they are leads, and they belong in a report framed as exactly that: "may indicate," "warrants further investigation," "is one possible explanation among several." What destroys credibility is dressing a Low finding in High language, asserting as fact something the evidence only hints at. State it as the lead it is, and it costs you nothing; state it as a conclusion, and it becomes the overreach that opposing counsel uses to question everything else.

Low findings also have real investigative value precisely as leads, which is why you record them rather than discard them. The suspicious heap string that is Low on its own may become the thread that, pulled in a later phase, connects to a High finding; a finding's tier is not fixed at first sight. A Low indicator that subsequently gains a second discovery method, survives raw verification, or correlates with an independent source is promoted, with the promotion and its reason recorded. That movement is the normal arc of an investigation: leads accumulate support and rise, or fail to and stay flagged as leads. The discipline is to let the evidence drive the tier in both directions, never to let the conviction you have formed drive it upward on its own.
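Recording a promotion together with its reason can be as simple as an append-only history on the finding. The following is a sketch under stated assumptions: `TrackedFinding`, `change_tier`, and the example evidence text are hypothetical, and the only rule enforced is that a tier change must cite evidence.

```python
from dataclasses import dataclass, field

TIERS = ("Low", "Medium", "High")


@dataclass
class TrackedFinding:
    description: str
    tier: str = "Low"
    # Each entry: (old tier, new tier, evidence that justified the change)
    history: list[tuple[str, str, str]] = field(default_factory=list)

    def change_tier(self, new_tier: str, evidence: str) -> None:
        if new_tier not in TIERS:
            raise ValueError(f"unknown tier: {new_tier}")
        if not evidence.strip():
            # Conviction alone is not a reason to move a finding.
            raise ValueError("a tier change must cite new evidence")
        self.history.append((self.tier, new_tier, evidence))
        self.tier = new_tier


lead = TrackedFinding("Suspicious string in process heap")
lead.change_tier("Medium", "Same string located by a second discovery method")
print(lead.tier, lead.history)
```

Because the same method handles movement in both directions, a demotion is logged the same way as a promotion, with the evidence that forced it.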

The language is not a courtesy; it maps each claim onto exactly what the evidence supports. An overclaim, High language on a Medium finding, is the gap an opposing expert pushes on, which is why the tier and the wording are decided together.

The modifiers move a finding between tiers

The tier is the result of a small set of modifiers that act in both directions, and knowing them lets you both assign confidence honestly and see how to raise it. A finding moves up when you add a second independent discovery method, when you verify it against raw memory rather than trusting a parse, and when you correlate it with an independent source such as a log or disk artifact. It moves down when a plausible alternative explanation has not been addressed, when the evidence sits in a structure that an active anti-forensic technique is known to manipulate, and when it rests on a single fragile source. This is why the workflow is built the way it is: phase 3 reconciliation adds discovery methods, phase 4 adds raw verification, and phase 5 adds correlation, each one a modifier that pushes findings toward High.
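The modifiers can be written down as a small function, which makes the assignment repeatable. This is one possible encoding, not a rule the course defines: the `Evidence` fields, the choice to count a correlating source as one more independent path, and the one-tier penalty per downward modifier are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Evidence:
    independent_methods: int     # discovery methods that reached the finding
    raw_verified: bool           # confirmed by reading raw memory
    correlated_source: bool      # matched by a log or disk artifact
    open_alternatives: int       # plausible explanations not yet addressed
    anti_forensic_risk: bool     # structure a known technique manipulates
    fragile_single_source: bool  # one fragile indicator with no context


def assign_tier(e: Evidence) -> Tier:
    # Assumption: an independent correlating source counts as one more path.
    support = e.independent_methods + (1 if e.correlated_source else 0)
    if support >= 2 and e.raw_verified:
        tier = Tier.HIGH
    elif support >= 1 and not e.fragile_single_source:
        tier = Tier.MEDIUM
    else:
        tier = Tier.LOW
    # Assumption: each downward modifier costs one tier, floored at LOW.
    penalties = int(e.open_alternatives > 0) + int(e.anti_forensic_risk)
    return Tier(max(int(Tier.LOW), int(tier) - penalties))


corroborated = Evidence(1, True, True, 0, False, False)
single_path = Evidence(1, False, False, 0, False, False)
heap_string = Evidence(1, False, False, 2, False, True)
unaddressed = Evidence(3, True, False, 1, False, False)

for name, ev in [("corroborated", corroborated), ("single_path", single_path),
                 ("heap_string", heap_string), ("unaddressed", unaddressed)]:
    print(name, assign_tier(ev).name)
```

The sketch has no intermediate grade such as Medium-to-High, so it is coarser than an analyst's judgment; its use is as a floor check that no finding is labeled higher than its recorded modifiers allow.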

Same case, three tiers, three grades of language. The conclusion rests on the High row; the Low row is flagged as a lead so that conceding it under challenge costs nothing. Tiering decides in advance which findings you defend to the last and which you give up freely.

Consider three findings from the running NE-FIN-014 example. The C2 connection from windows.netscan, confirmed by the firewall log, is High: two independent sources, a kernel-maintained structure, verified. The decoded PowerShell command from windows.cmdline is Medium-to-High: decisive and verified in memory, but memory-only, so the report notes that disk holds the encoded form and memory is the source of the decoded form. A single suspicious string found in a process's heap with no other context is Low: one fragile indicator with many innocent explanations, reported as a lead for further work. Same investigation, three honest tiers; the High finding carries the conclusion while the Low one is clearly flagged as not load-bearing.

That structure is what protects the report as a whole. Because the conclusion rests on the High finding and the Low finding is labeled a lead, an opposing expert who attacks the weak string-in-heap finding gains nothing: you have already said it is suggestive and not load-bearing, so conceding it costs you nothing and the conclusion stands on the corroborated C2 evidence regardless. Compare the alternative, where all three are stated with equal confidence: now the attackable Low finding sits next to the conclusion as if it supported it, and discrediting it appears to discredit the whole. Tiering is, in the end, a way of deciding in advance which findings you are willing to defend to the last and which you will calmly give up, so that the giving-up never threatens the core.

Where analysts get it wrong

The mistake is letting confidence drift upward as the investigation proceeds, so that a lead that entered as "may indicate" is being asserted as fact by the conclusion, with no new evidence in between. Confidence is a property of the evidence, not of how attached you have become to the theory. The tell is language inflation across the report: the same finding described as "possible C2" on page three and "the attacker's command channel" on page nine. If the evidence did not change, the language must not either. Assign the tier when you make the finding, write it down, and make the report's language match the tier you recorded, not the conviction you have developed since.
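Language inflation can be caught mechanically, at least in its crude forms, by comparing each sentence's wording against the tier recorded for the finding. The sketch below is a rough heuristic and nothing more: the phrase lists reuse the hedges quoted in this section plus "possible", and the `language_grade` and `overclaims` helpers are hypothetical names. A sentence with no recognized hedge is treated as a flat, High-grade assertion.

```python
import re

RANK = {"Low": 0, "Medium": 1, "High": 2}

# Checked in order: the weakest hedge present decides the grade.
HEDGES = [
    ("Low", ["may indicate", "warrants further investigation", "possible"]),
    ("Medium", ["consistent with", "likely", "supports the conclusion that"]),
]


def language_grade(sentence: str) -> str:
    """Return the confidence grade the sentence's wording implies."""
    text = sentence.lower()
    for grade, phrases in HEDGES:
        if any(re.search(rf"\b{re.escape(p)}\b", text) for p in phrases):
            return grade
    return "High"  # no hedge found: the sentence reads as a flat assertion


def overclaims(recorded_tier: str, sentence: str) -> bool:
    """True if the wording claims more than the recorded tier supports."""
    return RANK[language_grade(sentence)] > RANK[recorded_tier]


page_three = "Traffic to the external host is possible C2."
page_nine = "The attacker's command channel used the external host."

print(overclaims("Low", page_three))
print(overclaims("Low", page_nine))
```

A phrase list cannot judge meaning, so a check like this only flags candidates for a human to reread; it does not replace reading the report against the recorded tiers.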

Why this is foundational

Every subsequent module produces findings, and every one of those findings will carry an explicit confidence tier and the reasoning behind it. That is not extra paperwork layered on top of the analysis; it is the analysis stated honestly. A finding you cannot assign a tier to is a finding you do not yet understand well enough to report, and the act of choosing the tier forces the questions (how many methods reached this, did I verify it, what alternatives exist) that separate a forensic conclusion from a hunch. In that sense the tier is less a label you append than a checklist you are forced to complete: you cannot honestly write "High" without having reconciled methods and verified raw memory, and the discipline of assigning the tier is therefore the discipline of doing the work that earns it. The framework makes shortcuts visible to you before it makes them visible to anyone reviewing the report.

The confidence hierarchy and the reporting language it dictates are also the bridge to the next sub. The reason calibrated language matters so much is that memory evidence frequently ends up in front of people whose job is to attack it: opposing counsel, an insurer's expert, a tribunal. The next sub takes that head-on: the legal context for memory evidence, and the procedural and reporting discipline that keeps an investigation defensible from the moment acquisition begins.

Next section

0.8: Legal context for memory evidence

The evidentiary standards memory evidence faces, the chain-of-custody and best-evidence considerations specific to a volatile image, and the expert-reporting conventions that survive cross-examination.
