$DATA Attribute — Resident and Non-Resident

14 hours · Module 1 · Free

What you already know

Sections 1.2 and 1.3 covered the timestamp attributes ($SI and $FN). This section covers $DATA — where the file's actual content lives. Whether data is stored inside the MFT record (resident) or on disk clusters (non-resident) determines what you can recover from a deleted file and how you recover it.

Scenario

A deleted executable was found via its MFT record on an SSD with TRIM enabled. The data run list shows it occupied clusters 50,000-50,127 (512KB). Before attempting recovery, you check the volume bitmap: the clusters are marked as free. You read the raw cluster content and find all zeros. TRIM has erased the data. The file content is unrecoverable from disk. But the MFT record still contains the filename, timestamps, file size, and parent path. And if the file was under ~700 bytes, the content might be resident in the MFT record itself, untouched by TRIM.

Resident $DATA — content inside the MFT record

When a file's content is small enough to fit within the 1,024-byte MFT record alongside the header, $SI, $FN, and other attributes, NTFS stores the data directly as resident content. The threshold is approximately 700 bytes, varying with how much space other attributes consume. A file with a short filename (single $FN with namespace 0x03) has more room than one with a long filename (two $FN attributes consuming more bytes).
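As a rough illustration of why the threshold lands near 700 bytes, the sketch below subtracts typical attribute sizes from the 1,024-byte record. The sizes used (a 0x38-byte record header, a 0x60-byte $SI, an 8-byte end marker) are assumed typical values, not measurements from a specific volume, and the function name is hypothetical. Any additional attribute in the record lowers the result.

```python
def align8(n: int) -> int:
    """Round up to the next multiple of 8 (attributes are 8-byte aligned)."""
    return (n + 7) & ~7


def estimate_resident_capacity(name_lengths, record_size=1024):
    """Rough upper bound on resident $DATA content, in bytes.

    name_lengths: one entry per $FN attribute, in UTF-16 characters.
    Every size below is an assumed typical value, not read from a real record.
    """
    record_header = 0x38        # FILE record header plus fixup array (assumed)
    std_info = 0x18 + 0x48      # $SI: resident header + 72-byte body (assumed)
    # $FN: resident header + 0x42 fixed fields + 2 bytes per name character
    file_names = sum(align8(0x18 + 0x42 + 2 * n) for n in name_lengths)
    data_header = 0x18          # resident header of an unnamed $DATA attribute
    end_marker = 8              # 0xFFFFFFFF terminator plus padding (assumed)
    used = record_header + std_info + file_names + data_header + end_marker
    return record_size - used


short_name = estimate_resident_capacity([8])       # single $FN, 8-character name
long_name = estimate_resident_capacity([30, 12])   # long name plus 8.3 alias
print(short_name, long_name)                       # 728 568
```

The second call models a record carrying two $FN attributes, which is why its estimate is noticeably lower.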

The resident $DATA attribute header has the standard fields (type 0x80, length, non-resident flag = 0), followed by the content size and content offset specific to resident attributes. The content offset (typically 0x18 from the attribute start) points to where the actual file data begins. The content size tells you exactly how many bytes are present.

This matters forensically for two reasons. First, resident content is physically part of the MFT record. When the file is deleted, the MFT entry is marked as free but the data remains in the record until the entry is reallocated for a new file. On a volume with many free MFT entries, deleted resident files can persist for weeks or months. Second, resident content is captured by any tool that extracts the MFT — KAPE's $MFT target, FTK Imager's MFT extraction, or even a raw copy. You don't need the full disk image to recover resident data; you only need the MFT.

To extract resident data from a raw MFT record: locate the $DATA attribute (type 0x80) in the attribute chain, confirm the non-resident flag is 0, read the content offset and content size from the attribute header, and copy that many bytes starting at the content offset. The extracted bytes are the literal file content: no decompression, no decoding, no cluster mapping required. For a text file, the bytes are the ASCII or UTF-8 text. For a batch script, the bytes are the script commands. MFTECmd's standard CSV output does not include resident content. Recent MFTECmd releases can dump resident files separately with the --dr option; where that option is unavailable, or when you need to validate tool output, recovery requires manual hex analysis, which is why the binary analysis skills from Section 1.1 are essential.
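The steps above can be sketched in Python as follows. This is a minimal illustration, not a replacement for a validated tool: it assumes the record's fixup values have already been applied, it takes the first-attribute offset as a parameter (in a real record that offset is stored at 0x14 in the record header), and it runs against a synthetic attribute built by a hypothetical helper instead of a real MFT.

```python
import struct
from typing import Optional


def build_resident_data_attr(content: bytes) -> bytes:
    """Hypothetical helper: build a synthetic unnamed resident $DATA attribute."""
    total = (0x18 + len(content) + 7) & ~7          # attribute length, 8-byte aligned
    header = struct.pack(
        "<IIBBHHHIHBB",
        0x80, total,      # type, total attribute length
        0, 0, 0x18,       # non-resident flag, name length, name offset
        0, 0,             # flags, attribute ID
        len(content),     # content size   (attribute offset +0x10)
        0x18,             # content offset (attribute offset +0x14)
        0, 0,             # indexed flag, padding
    )
    return (header + content).ljust(total, b"\x00")


def extract_resident_data(record: bytes, first_attr_offset: int) -> Optional[bytes]:
    """Walk the attribute chain and return the unnamed resident $DATA content."""
    pos = first_attr_offset
    while pos + 8 <= len(record):
        attr_type, attr_len = struct.unpack_from("<II", record, pos)
        if attr_type == 0xFFFFFFFF or attr_len == 0:    # end marker or corrupt length
            break
        non_resident = record[pos + 8]
        name_len = record[pos + 9]
        if attr_type == 0x80 and non_resident == 0 and name_len == 0:
            size, offset = struct.unpack_from("<IH", record, pos + 0x10)
            return record[pos + offset:pos + offset + size]
        pos += attr_len
    return None


script = b'@echo off\r\nset "C2=http://192.168.1.100:8080"'
chain = build_resident_data_attr(script) + struct.pack("<II", 0xFFFFFFFF, 0)
recovered = extract_resident_data(chain, 0)
print(recovered.decode("ascii"))
```

Returning None for records without an unnamed resident $DATA attribute keeps the non-resident case explicit, so the caller knows to fall back to data run analysis.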

Small files that are commonly resident include batch scripts, configuration files, PowerShell scripts, registry exports, CSV snippets, small text notes, and short email drafts. In investigations, these are often the most interesting files: a malicious batch script, a C2 implant configuration, a text file listing targeted data. If deleted, their content may be fully recoverable from the MFT alone.

Resident $DATA — Hex Dump

── $DATA attribute at offset 0x160 in MFT record ──
0x0160   80 00 00 00 48 00 00 00  00 00 00 00 00 00 00 00   Type=$DATA Len=72B NonResident=0 (resident)
0x0170   2D 00 00 00 18 00 00 00                            ContentSize=45B ContentOffset=0x18
── Actual file content starts at 0x160 + 0x18 = 0x178 ──
0x0178   40 65 63 68 6F 20 6F 66  66 0D 0A 73 65 74 20 22   @echo off..set "  ◄ Batch script content
0x0188   43 32 3D 68 74 74 70 3A  2F 2F 31 39 32 2E 31 36   C2=http://192.16  ◄ C2 URL in deleted file
0x0198   38 2E 31 2E 31 30 30 3A  38 30 38 30 22 00 00 00   8.1.100:8080"...  ◄ 3 padding bytes to the 8-byte boundary

This is a deleted batch script (45 bytes) stored entirely within the MFT record. The non-resident flag at offset +0x08 is 0, confirming resident storage. The content size (0x2D) and content offset (0x18) sit at +0x10 and +0x14, and the attribute length (0x48) is the 0x18-byte header plus 45 content bytes, rounded up to a multiple of 8. The content at 0x178 is the literal file data: a script setting a C2 URL variable. This content survived deletion because TRIM is issued for freed data clusters, not for MFT records. The record sits in clusters that remain allocated to the $MFT file even after the entry itself is marked free.

Anti-Pattern

Assuming deleted file content is unrecoverable on SSDs

TRIM discards non-resident data in freed clusters, which then typically read back as zeros. But resident data (files under ~700 bytes) is stored in the MFT record, and the clusters holding that record stay allocated to $MFT, so they are not trimmed when the file is deleted. Batch scripts, config files, small PowerShell scripts, text notes, and CSV snippets are exactly the files most likely to contain investigation-relevant content, and they are often small enough to be resident. Even on SSDs, always check the MFT for resident data in deleted files.

Non-resident $DATA — data run encoding

When a file exceeds the resident threshold, NTFS allocates clusters on disk and stores a data run list in the $DATA attribute mapping the file's logical extent to physical clusters.

Data Run List — Decoded

Header  Length  Offset     Decoded
31      08      56 34 12   Run 1: 8 clusters starting at cluster 0x123456         ◄ Low nibble=1B len, high=3B offset
31      0A      00 01 00   Run 2: 10 clusters starting at 0x123456 + 0x100 = 0x123556  ◄ Offset is RELATIVE
21      04      80 02      Run 3: 4 clusters starting at 0x123556 + 0x280 = 0x1237D6
00                          End of run list
Total: 22 clusters (90,112 bytes at 4K cluster size) across 3 fragments

The header byte's low nibble specifies bytes for the length field, the high nibble specifies bytes for the offset. 0x31 means 1 byte for length, 3 bytes for offset. The offset in subsequent runs is signed and relative to the previous run's start. A 0x00 header terminates the list. For fragmented files, multiple runs describe non-contiguous cluster ranges.
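A minimal decoder for this encoding might look like the following. The function name is hypothetical, and the sample bytes are the three runs from the decoded table above. Runs whose header has a high nibble of 0 carry no offset field and are reported as sparse (start of None).

```python
def decode_data_runs(raw: bytes):
    """Decode an NTFS data run list into (start_cluster, length) tuples.

    start_cluster is None for sparse runs, which have no offset field.
    """
    runs = []
    pos = 0
    prev_start = 0                        # offsets are relative to the previous run
    while pos < len(raw) and raw[pos] != 0x00:
        header = raw[pos]
        len_size = header & 0x0F          # low nibble: bytes in the length field
        off_size = header >> 4            # high nibble: bytes in the offset field
        pos += 1
        length = int.from_bytes(raw[pos:pos + len_size], "little")
        pos += len_size
        if off_size == 0:
            runs.append((None, length))   # sparse run: nothing allocated on disk
            continue
        delta = int.from_bytes(raw[pos:pos + off_size], "little", signed=True)
        pos += off_size
        prev_start += delta
        runs.append((prev_start, length))
    return runs


sample = bytes.fromhex("3108563412" "310A000100" "21048002" "00")
runs = decode_data_runs(sample)
for start, length in runs:
    print(f"{length} clusters starting at cluster {start:#x}")
total_clusters = sum(length for _, length in runs)
print(total_clusters, "clusters =", total_clusters * 4096, "bytes at 4K cluster size")
```

Reading the offset with signed=True matters for fragmented files: a later run can sit at a lower cluster number than the one before it, which produces a negative delta.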

The non-resident $DATA header contains several size fields critical for forensic analysis. The allocated size is the total disk space reserved for the file, rounded up to a cluster boundary. The real size is the actual content length in bytes. The initialized size tells you how much of the real size has actually been written with valid data. The difference between allocated and real size is file slack, normally less than one cluster unless space was preallocated. The difference between real and initialized can reveal partially written files: if writing was interrupted by a crash or termination, the uninitialized portion on disk contains whatever was previously in those clusters.
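To make the three size fields concrete, the sketch below reads them from a non-resident attribute header, where they sit at offsets 0x28, 0x30, and 0x38 from the attribute start. The function name is hypothetical and the sample header is synthetic, with size values invented purely for illustration.

```python
import struct


def parse_nonresident_sizes(attr: bytes) -> dict:
    """Read allocated, real, and initialized sizes from a non-resident attribute."""
    if attr[8] != 1:
        raise ValueError("attribute is resident; it has no allocated/initialized fields")
    allocated, real, initialized = struct.unpack_from("<QQQ", attr, 0x28)
    return {
        "allocated": allocated,
        "real": real,
        "initialized": initialized,
        "slack": allocated - real,            # unused bytes at the end of the allocation
        "uninitialized": real - initialized,  # declared but never written
    }


# Synthetic 0x40-byte header; the three size values are made up for the demo.
sample_attr = struct.pack(
    "<IIBBHHHQQHHIQQQ",
    0x80, 0x48,       # type, attribute length
    1, 0, 0x40,       # non-resident flag, name length, name offset
    0, 1,             # flags, attribute ID
    0, 21,            # starting VCN, last VCN (22 clusters)
    0x40, 0, 0,       # data run offset, compression unit, padding
    90112,            # allocated size (22 clusters of 4,096 bytes)
    88000,            # real size
    80000,            # initialized size
)
sizes = parse_nonresident_sizes(sample_attr)
print(sizes)
```

A non-zero "uninitialized" value is the cue to inspect the tail of the file's clusters in the image, since that region may hold data from a previous file.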

MFTECmd reports all three sizes. When the space actually allocated on disk is smaller than the real size (the opposite of normal), the file is NTFS-compressed or sparse. The data run list for a compressed file contains sparse runs: entries whose header byte has a high nibble of 0, so a length follows but no offset. Within each 16-cluster compression unit, the sparse run represents the clusters saved by compression.

To recover file content from a non-resident deleted file: read the data run list from the $DATA attribute, calculate the physical cluster offsets, navigate to those clusters in the disk image, and extract the bytes. For unfragmented files (single run), this is a single contiguous read. For fragmented files, you must concatenate data from multiple non-contiguous cluster ranges. MFTECmd's detailed output includes the data run list, and tools like icat (from The Sleuth Kit) can extract file content directly from a forensic image using the MFT entry number.
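The manual procedure can be sketched as a short function that seeks to each run in the image and concatenates the results. This is an illustration under simplifying assumptions: the image is any seekable file object, runs are (start_cluster, length) tuples with None marking a sparse run, compression is not handled, and the names and the tiny 16-byte cluster size in the demo are hypothetical.

```python
import io


def read_runs(image, runs, cluster_size, real_size, partition_offset=0):
    """Concatenate the clusters described by runs and trim to the real size."""
    out = bytearray()
    for start, length in runs:
        nbytes = length * cluster_size
        if start is None:                     # sparse run: nothing stored on disk
            out += b"\x00" * nbytes
            continue
        image.seek(partition_offset + start * cluster_size)
        out += image.read(nbytes)
    return bytes(out[:real_size])             # drop slack after the last real byte


# Demo with a fake 10-cluster image and 16-byte clusters.
cluster_size = 16
disk = bytearray(cluster_size * 10)
disk[2 * cluster_size:4 * cluster_size] = b"A" * 32   # fragment 1: clusters 2-3
disk[7 * cluster_size:8 * cluster_size] = b"B" * 16   # fragment 2: cluster 7
content = read_runs(io.BytesIO(bytes(disk)), [(2, 2), (7, 1)], cluster_size, real_size=40)
print(len(content), content)
```

Trimming to the real size keeps file slack out of the recovered file; examine the slack separately if it is relevant to the case.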

Alternate Data Streams

NTFS supports multiple $DATA attributes per MFT record. Named $DATA attributes are Alternate Data Streams (ADS). An ADS is identified by a $DATA attribute (type 0x80) with a non-zero name length in the attribute header. The name follows the header in UTF-16LE.

To access an ADS from the command line: more < filename:streamname, notepad filename:streamname, or PowerShell's Get-Content filename -Stream streamname. The type command does not accept stream syntax. To list ADS: dir /r or PowerShell's Get-Item filename -Stream *. MFTECmd reports ADS as entries with the same MFT entry number but different $DATA names.

The most forensically significant ADS is Zone.Identifier: when a browser downloads a file, Windows writes a Zone.Identifier ADS containing the download URL and security zone (typically Zone 3 = Internet). This is the mechanism behind the "downloaded from the Internet" warning dialog. The Zone.Identifier persists when the file is copied, moved, or renamed within NTFS volumes. It contains fields including ZoneId (0=Local, 1=Intranet, 2=Trusted, 3=Internet, 4=Untrusted), ReferrerUrl (the page that initiated the download), and HostUrl (the direct download URL). This metadata is forensic evidence that a file was downloaded from a specific URL. Many users and even some investigators don't realize this metadata exists because it is invisible in normal file operations.
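Once a Zone.Identifier stream has been extracted, its content is a short INI-style text block that is simple to parse. The sketch below is a minimal illustration: the function name is hypothetical, the sample URLs are placeholders, and the stream is assumed to decode cleanly as UTF-8 text.

```python
ZONE_NAMES = {0: "Local", 1: "Intranet", 2: "Trusted", 3: "Internet", 4: "Untrusted"}


def parse_zone_identifier(raw: bytes) -> dict:
    """Parse the key=value lines of a Zone.Identifier stream into a dict."""
    text = raw.decode("utf-8", errors="replace")
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("[") or "=" not in line:
            continue                          # skip blanks and the [ZoneTransfer] header
        key, value = line.split("=", 1)       # split once: URLs may contain '='
        fields[key.strip()] = value.strip()
    return fields


sample_stream = (
    b"[ZoneTransfer]\r\n"
    b"ZoneId=3\r\n"
    b"ReferrerUrl=https://example.com/downloads/\r\n"
    b"HostUrl=https://example.com/downloads/tool.zip?id=1\r\n"
)
fields = parse_zone_identifier(sample_stream)
zone = ZONE_NAMES.get(int(fields["ZoneId"]), "Unknown")
print(zone, fields.get("HostUrl"))
```

Splitting on the first "=" only keeps query strings in HostUrl and ReferrerUrl intact.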

In the insider threat scenario, Zone.Identifier streams on files copied to USB reveal that files were originally downloaded from cloud storage before being staged locally. The HostUrl field may show the SharePoint or OneDrive URL the file was downloaded from, even though the file was later moved to a local directory and then to USB. Some malware strains delete the Zone.Identifier from their own executable after deployment to remove forensic evidence of their download origin and to evade SmartScreen protection.

Malware uses ADS to hide payloads. A file readme.txt:payload.exe has a full executable in a named stream. Explorer shows only the size of the primary (unnamed) $DATA stream, so the file appears to be a small text file. ADS scanning has improved in modern antivirus, but some tools still miss ADS payloads during routine scans. In the MFT, ADS appear as additional $DATA attributes on the same record. Walking the complete attribute chain catches them.
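Walking the attribute chain for named $DATA attributes can be sketched as follows. The name length is the byte at +0x09 in the attribute header and the name offset is the 16-bit value at +0x0A. The function and helper names are hypothetical, the record is synthetic, and fixups are assumed to have been applied already.

```python
import struct


def make_data_attr(name: str, content: bytes) -> bytes:
    """Hypothetical helper: build a synthetic resident $DATA attribute."""
    name_raw = name.encode("utf-16-le")
    content_off = (0x18 + len(name_raw) + 7) & ~7     # content follows the name
    total = (content_off + len(content) + 7) & ~7
    header = struct.pack(
        "<IIBBHHHIHBB",
        0x80, total, 0, len(name_raw) // 2, 0x18, 0, 0,
        len(content), content_off, 0, 0,
    )
    body = (header + name_raw).ljust(content_off, b"\x00") + content
    return body.ljust(total, b"\x00")


def list_data_streams(record: bytes, first_attr_offset: int):
    """Return the name of every $DATA attribute ('' is the primary stream)."""
    names = []
    pos = first_attr_offset
    while pos + 8 <= len(record):
        attr_type, attr_len = struct.unpack_from("<II", record, pos)
        if attr_type == 0xFFFFFFFF or attr_len == 0:
            break
        if attr_type == 0x80:
            name_len = record[pos + 9]                          # in UTF-16 characters
            name_off = struct.unpack_from("<H", record, pos + 0x0A)[0]
            raw = record[pos + name_off:pos + name_off + 2 * name_len]
            names.append(raw.decode("utf-16-le"))
        pos += attr_len
    return names


chain = (
    make_data_attr("", b"hello")
    + make_data_attr("payload.exe", b"MZ\x90\x00")
    + struct.pack("<II", 0xFFFFFFFF, 0)
)
streams = list_data_streams(chain, 0)
ads = [name for name in streams if name]      # named streams only
print(streams, ads)
```

Any non-empty name in the result is an ADS worth examining, whether or not the hosting file looks benign.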

Recovery assessment by storage type

Investigation Principle

Recovery probability depends on three factors: residency (resident data survives in MFT records), storage type (HDD retains deleted cluster data until overwritten, SSD erases via TRIM), and time since deletion (more activity = more overwriting). For deleted files on SSDs, check the MFT for resident data first, then verify cluster content before assuming recoverability. Document the storage type and TRIM status in every recovery assessment.

Resident + deleted: High recovery probability. Content is in the MFT record until the entry is reallocated.
Non-resident + deleted + HDD: Moderate. Clusters retain data until overwritten. Check the volume bitmap.
Non-resident + deleted + SSD: Low. TRIM may have zeroed clusters. Verify actual cluster content.
Fragmented + deleted: Partial recovery possible. Each cluster range must be checked independently.
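The bitmap and cluster-content checks can be applied to each run independently, as in the sketch below. It assumes the $Bitmap content has already been extracted (one bit per cluster, least significant bit first) and that the image is a seekable file object. The function names and the three status labels are hypothetical.

```python
import io


def cluster_is_allocated(bitmap: bytes, cluster: int) -> bool:
    """Test one cluster's bit in $Bitmap (bit 0 of byte 0 is cluster 0)."""
    return bool((bitmap[cluster // 8] >> (cluster % 8)) & 1)


def check_run(image, bitmap, start, length, cluster_size):
    """Classify one deleted file's run as 'reallocated', 'zeroed', or 'candidate'."""
    if any(cluster_is_allocated(bitmap, c) for c in range(start, start + length)):
        return "reallocated"      # clusters now belong to another file
    image.seek(start * cluster_size)
    data = image.read(length * cluster_size)
    if not any(data):
        return "zeroed"           # consistent with TRIM or wiping
    return "candidate"            # free and non-zero: worth extracting


# Demo: 8 clusters of 16 bytes; cluster 3 is marked allocated in the bitmap.
cluster_size = 16
disk = bytearray(cluster_size * 8)
disk[1 * cluster_size:2 * cluster_size] = b"X" * 16
disk[3 * cluster_size:4 * cluster_size] = b"Y" * 16
bitmap = bytes([0b00001000])
image = io.BytesIO(bytes(disk))
results = [check_run(image, bitmap, c, 1, cluster_size) for c in (1, 2, 3)]
print(results)
```

A "candidate" result only means the clusters are free and non-zero; the content still has to be validated against the expected file type and size.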

Section 1.5 examines $INDEX_ROOT and $INDEX_ALLOCATION — the directory B-tree structure. The $I30 index entries and their slack space provide deleted filename evidence independent of the file's own MFT record.