$DATA, $INDEX, and Directory Analysis

14 hours · Module 1 · Free
What you already know
Section 1.2 covered the timestamp attributes ($SI and $FN). This section covers how NTFS stores file content ($DATA) and directory contents ($INDEX). Whether data is resident or non-resident determines recovery probability. Whether directory slack retains deleted entries determines whether you can prove a file existed in a directory.

Scenario

A deleted executable was found via its MFT record on an SSD with TRIM enabled. The data run list shows it occupied clusters 50,000-50,127 (128 clusters, 512KB at a 4K cluster size). Before attempting recovery, you check the volume bitmap: the clusters are marked as free. You read the raw cluster content and find all zeros. TRIM has erased the data. The file content is unrecoverable from disk. But the MFT record still contains the filename, timestamps, file size, and parent path. Had the file been smaller than roughly 700 bytes, its content could have been resident in the MFT record itself, untouched by TRIM.

Resident $DATA — content inside the MFT record

When a file's content is small enough to fit within the 1,024-byte MFT record alongside the header, $SI, $FN, and other attributes, NTFS stores the data directly as resident content. The threshold is approximately 700 bytes, varying with how much space other attributes consume. A file with a short filename (single $FN with namespace 0x03) has more room than one with a long filename (two $FN attributes consuming more bytes).

The resident $DATA attribute header has the standard fields (type 0x80, length, non-resident flag = 0), followed by the content size and content offset specific to resident attributes. The content offset (typically 0x18 from the attribute start) points to where the actual file data begins. The content size tells you exactly how many bytes are present.

This matters forensically for two reasons. First, resident content is physically part of the MFT record. When the file is deleted, the MFT entry is marked as free but the data remains in the record until the entry is reallocated for a new file. On a volume with many free MFT entries, deleted resident files can persist for weeks or months. Second, resident content is captured by any tool that extracts the MFT — KAPE's $MFT target, FTK Imager's MFT extraction, or even a raw copy. You don't need the full disk image to recover resident data; you only need the MFT.

To extract resident data from a raw MFT record: locate the $DATA attribute (type 0x80) in the attribute chain, confirm the non-resident flag is 0, read the content offset and content size from the attribute header, and copy the specified number of bytes from the content area. The extracted bytes are the literal file content: no decompression, no decoding, no cluster mapping required. For a text file, the bytes are the ASCII or UTF-8 text. For a batch script, the bytes are the script commands. MFTECmd's standard CSV output does not include resident content, so recovering it often means working from the raw record bytes, which is why the binary analysis skills from Section 1.1 are essential.
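The extraction steps above can be sketched in Python. This is a minimal illustration, not a replacement for a validated tool: it assumes a single FILE record with 512-byte sectors and the standard resident attribute header (content size at +0x10, content offset at +0x14). The function names apply_fixups and resident_data_streams are hypothetical.

```python
import struct

ATTR_DATA = 0x80
ATTR_END = 0xFFFFFFFF


def apply_fixups(record: bytes, sector_size: int = 512) -> bytearray:
    """Restore the last two bytes of each sector from the update sequence array."""
    rec = bytearray(record)
    usa_offset, usa_count = struct.unpack_from("<HH", rec, 0x04)
    for i in range(1, usa_count):          # entry 0 is the update sequence number
        src = usa_offset + 2 * i
        dst = i * sector_size - 2
        if src + 2 <= len(rec) and dst + 2 <= len(rec):
            rec[dst:dst + 2] = rec[src:src + 2]
    return rec


def resident_data_streams(record: bytes):
    """Return (stream_name, content) for every resident $DATA attribute."""
    rec = apply_fixups(record)
    if rec[0:4] != b"FILE":
        raise ValueError("not a FILE record")
    offset = struct.unpack_from("<H", rec, 0x14)[0]   # first attribute offset
    streams = []
    while offset + 8 <= len(rec):
        attr_type, attr_len = struct.unpack_from("<II", rec, offset)
        if attr_type == ATTR_END or attr_len == 0:
            break
        if attr_type == ATTR_DATA and rec[offset + 0x08] == 0:   # non-resident flag
            name_len = rec[offset + 0x09]                        # in UTF-16 characters
            name_off = struct.unpack_from("<H", rec, offset + 0x0A)[0]
            size = struct.unpack_from("<I", rec, offset + 0x10)[0]
            content_off = struct.unpack_from("<H", rec, offset + 0x14)[0]
            name_start = offset + name_off
            name = rec[name_start:name_start + 2 * name_len].decode("utf-16-le")
            start = offset + content_off
            streams.append((name, bytes(rec[start:start + size])))
        offset += attr_len
    return streams
```

An empty stream name is the file's primary content; any other name is an alternate data stream. The sketch works the same on allocated and deleted records because it never checks the in-use flag.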

Small files that are commonly resident include batch scripts, configuration files, PowerShell scripts, registry exports, CSV snippets, small text notes, and short email drafts. In investigations, these are often the most interesting files: a malicious batch script, a C2 implant configuration, or a text file listing targeted data. If deleted, their content may be fully recoverable from the MFT alone.

Resident $DATA — Hex Dump
── $DATA attribute at offset 0x160 in MFT record ──
0x0160   80 00 00 00 48 00 00 00  00 00 00 00 00 00 00 00   Type=$DATA Len=72B NonResident=0 (resident)
0x0170   2D 00 00 00 18 00 00 00                            ContentSize=45B ContentOffset=0x18
── Actual file content starts at 0x160 + 0x18 = 0x178 ──
0x0178   40 65 63 68 6F 20 6F 66  66 0D 0A 73 65 74 20 22   @echo off..set "  ◄ Batch script content
0x0188   43 32 3D 68 74 74 70 3A  2F 2F 31 39 32 2E 31 36   C2=http://192.16  C2 URL in deleted file
0x0198   38 2E 31 2E 31 30 30 3A  38 30 38 30 22 00 00 00   8.1.100:8080"...  ◄ 3 padding bytes to the 8-byte boundary

This is a deleted batch script (45 bytes) stored entirely within the MFT record. The non-resident flag at offset +0x08 is 0, confirming resident storage. The content size (at +0x10) and content offset (at +0x14) complete the 0x18-byte header, so the content at 0x178 is the literal file data: a script setting a C2 URL variable. This content survived deletion because TRIM operates on freed data clusters, not on MFT records, which belong to the allocated $MFT file.

Anti-Pattern

Assuming deleted file content is unrecoverable on SSDs

TRIM zeroes non-resident data in freed clusters. But resident data (files under ~700 bytes) is stored in the MFT record, which belongs to the allocated $MFT file and is not trimmed when the file is deleted. Batch scripts, config files, small PowerShell scripts, text notes, and CSV snippets, exactly the files most likely to contain investigation-relevant content, are often small enough to be resident. Even on SSDs, always check the MFT for resident data in deleted files.

Non-resident $DATA — data run encoding

When a file exceeds the resident threshold, NTFS allocates clusters on disk and stores a data run list in the $DATA attribute mapping the file's logical extent to physical clusters.

Data Run List — Decoded
Header  Length  Offset     Decoded
31      08      56 34 12   Run 1: 8 clusters starting at cluster 0x123456         ◄ Low nibble=1B len, high=3B offset
31      0A      00 01 00   Run 2: 10 clusters starting at 0x123456 + 0x100 = 0x123556  ◄ Offset is RELATIVE
21      04      80 02      Run 3: 4 clusters starting at 0x123556 + 0x280 = 0x1237D6
00                          End of run list
Total: 22 clusters (90,112 bytes at 4K cluster size) across 3 fragments

The header byte's low nibble specifies bytes for the length field, the high nibble specifies bytes for the offset. 0x31 means 1 byte for length, 3 bytes for offset. The offset in subsequent runs is signed and relative to the previous run's start. A 0x00 header terminates the list. For fragmented files, multiple runs describe non-contiguous cluster ranges.
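The decoding rules above can be expressed as a short Python sketch. It assumes the run list bytes have already been cut out of the $DATA attribute; the function name decode_data_runs is hypothetical, and a run whose header has no offset bytes is reported with a start of None (a sparse run).

```python
def decode_data_runs(run_list: bytes):
    """Decode an NTFS data run list into (start_cluster, length_in_clusters) pairs."""
    runs = []
    pos = 0
    current_lcn = 0                      # offsets are relative to the previous run's start
    while pos < len(run_list) and run_list[pos] != 0x00:
        header = run_list[pos]
        len_size = header & 0x0F         # low nibble: bytes in the length field
        off_size = header >> 4           # high nibble: bytes in the offset field
        pos += 1
        length = int.from_bytes(run_list[pos:pos + len_size], "little")
        pos += len_size
        if off_size == 0:
            runs.append((None, length))  # sparse run: no clusters on disk
        else:
            delta = int.from_bytes(run_list[pos:pos + off_size], "little", signed=True)
            current_lcn += delta
            runs.append((current_lcn, length))
            pos += off_size
    return runs


example = bytes.fromhex("31 08 56 34 12 31 0A 00 01 00 21 04 80 02 00")
for start, length in decode_data_runs(example):
    print(hex(start), length)
# 0x123456 8
# 0x123556 10
# 0x1237d6 4
```

The example bytes are the three runs from the table above, and the decoded starts match the hand calculation. Reading the offset as a signed integer is what allows a later run to sit at a lower cluster number than the one before it.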

The non-resident $DATA header contains several size fields critical for forensic analysis. The allocated size is rounded up to a cluster boundary: it is the total disk space reserved for the file. The real size is the actual content in bytes. The initialized size tells you how much of the allocated space contains valid data. The difference between allocated and real size is file slack at the end of the last cluster (normally less than one cluster). The difference between real and initialized can reveal partially written files: if initialization was interrupted by a crash or termination, the uninitialized portion contains whatever was previously in those clusters.
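As a small worked example of the arithmetic, the sketch below derives the two gaps from the three size fields. The function name size_gaps and the sample values are hypothetical; only the 22-cluster allocation (90,112 bytes at a 4K cluster size) is borrowed from the run list example.

```python
def size_gaps(allocated: int, real: int, initialized: int) -> dict:
    """Derive the forensically interesting gaps between the three size fields."""
    return {
        # bytes between end of content and end of the last allocated cluster
        "cluster_slack": allocated - real,
        # bytes inside the file that were never written (may hold older cluster data)
        "uninitialized_tail": real - initialized,
        # allocated < real is the reverse of the normal relationship
        "allocated_below_real": allocated < real,
    }


# Hypothetical file: 22 clusters allocated, 88,000 bytes of content,
# of which only the first 81,920 bytes were initialized.
print(size_gaps(allocated=90_112, real=88_000, initialized=81_920))
# {'cluster_slack': 2112, 'uninitialized_tail': 6080, 'allocated_below_real': False}
```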

MFTECmd reports all three sizes. When the space actually allocated on disk is smaller than the real size (the opposite of normal), the file is NTFS-compressed or sparse. The data run list for such files contains sparse runs: entries whose header byte has a high nibble of 0, so a length field is present but no offset field follows and no clusters are allocated. When examining a compressed file's data runs, the sparse runs are the saved space within each 16-cluster compression unit.

To recover file content from a non-resident deleted file: read the data run list from the $DATA attribute, calculate the physical cluster offsets, navigate to those clusters in the disk image, and extract the bytes. For unfragmented files (single run), this is a single contiguous read. For fragmented files, you must concatenate data from multiple non-contiguous cluster ranges. MFTECmd's detailed output includes the data run list, and tools like icat (from The Sleuth Kit) can extract file content directly from a forensic image using the MFT entry number.
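The recovery procedure above amounts to a seek-and-read loop. The sketch below assumes the runs are already decoded into (start_cluster, length) pairs with None marking a sparse run, that the image is a raw (dd-style) file object, and that the file is neither compressed nor encrypted. The function name read_non_resident and the partition_offset parameter are hypothetical.

```python
import io


def read_non_resident(image, runs, cluster_size, real_size, partition_offset=0):
    """Concatenate the clusters named by each run and trim to the real file size."""
    out = bytearray()
    for start_cluster, length in runs:
        nbytes = length * cluster_size
        if start_cluster is None:
            out += bytes(nbytes)                     # sparse run reads as zeros
        else:
            image.seek(partition_offset + start_cluster * cluster_size)
            out += image.read(nbytes)
    return bytes(out[:real_size])                    # drop slack after the real size


# Toy image with 4-byte "clusters" so the result is easy to follow.
toy_image = io.BytesIO(b"AAAABBBBCCCCDDDD")
print(read_non_resident(toy_image, [(2, 1), (0, 1)], cluster_size=4, real_size=6))
# b'CCCCAA'
```

For a deleted file, check the bytes that come back before relying on them: clusters that were reallocated or trimmed will return another file's data or zeros, and the run list alone cannot tell you which.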

Alternate Data Streams

NTFS supports multiple $DATA attributes per MFT record. Named $DATA attributes are Alternate Data Streams (ADS). An ADS is identified by a $DATA attribute (type 0x80) with a non-zero name length in the attribute header. The name follows the header in UTF-16LE.

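To read an ADS from the command line: more < filename:streamname or notepad filename:streamname (the type command cannot read a named stream, although its output can be redirected into one). In PowerShell, Get-Content filename -Stream streamname reads a stream. To list ADS: dir /r or PowerShell's Get-Item filename -Stream *. MFTECmd reports ADS as entries with the same MFT entry number but different $DATA names.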

The most forensically significant ADS is Zone.Identifier: when a browser downloads a file, Windows writes a Zone.Identifier ADS containing the download URL and security zone (typically Zone 3 = Internet). This is the mechanism behind the "downloaded from the Internet" warning dialog. The Zone.Identifier persists when the file is copied, moved, or renamed within NTFS volumes. It contains fields including ZoneId (0=Local, 1=Intranet, 2=Trusted, 3=Internet, 4=Untrusted), ReferrerUrl (the page that initiated the download), and HostUrl (the direct download URL). This metadata is forensic evidence that a file was downloaded from a specific URL. Many users and even some investigators don't realize this metadata exists because it is invisible in normal file operations.
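A Zone.Identifier stream is a few lines of INI-style text, so parsing it is simple. The sketch below assumes the stream content has already been extracted as text; the function name parse_zone_identifier, the ZONE_NAMES table, and the example URLs are hypothetical.

```python
ZONE_NAMES = {0: "Local", 1: "Intranet", 2: "Trusted", 3: "Internet", 4: "Untrusted"}


def parse_zone_identifier(text: str) -> dict:
    """Parse the key=value lines of a Zone.Identifier stream into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("["):      # skip blanks and the [ZoneTransfer] header
            continue
        key, sep, value = line.partition("=")     # split on the first '=' only
        if sep:
            fields[key.strip()] = value.strip()
    if fields.get("ZoneId", "").isdigit():
        fields["ZoneName"] = ZONE_NAMES.get(int(fields["ZoneId"]), "Unknown")
    return fields


sample = (
    "[ZoneTransfer]\r\n"
    "ZoneId=3\r\n"
    "ReferrerUrl=https://example.com/downloads\r\n"
    "HostUrl=https://example.com/files/tool.zip?id=42\r\n"
)
print(parse_zone_identifier(sample)["HostUrl"])
# https://example.com/files/tool.zip?id=42
```

Splitting on the first "=" only keeps query strings in the URLs intact. On a live Windows system the stream can usually be read by opening the path with ":Zone.Identifier" appended; in a forensic workflow the text would come from the named $DATA attribute instead.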

In the insider threat scenario, Zone.Identifier streams on files copied to USB reveal that files were originally downloaded from cloud storage before being staged locally. This only applies when the USB volume is formatted NTFS: FAT32 and exFAT do not support alternate data streams, so the stream is dropped on copy and must be read from the source volume instead. The HostUrl field may show the SharePoint or OneDrive URL the file was downloaded from, even though the file was later moved to a local directory and then to USB. Some malware strains delete the Zone.Identifier from their own executable after deployment to remove forensic evidence of their download origin and to evade SmartScreen protection.

Malware hiding uses ADS to store payloads invisibly. A file readme.txt:payload.exe has a full executable in a named stream. Explorer shows only the primary $DATA size, so the file appears to be a small text file. ADS scanning has improved in modern antivirus, but some tools still miss ADS payloads during routine scans. In the MFT, ADS appear as additional $DATA attributes on the same record. Walking the complete attribute chain catches them.

Recovery assessment by storage type

Investigation Principle

Recovery probability depends on three factors: residency (resident data survives in MFT records), storage type (HDD retains deleted cluster data until overwritten, SSD erases via TRIM), and time since deletion (more activity = more overwriting). For deleted files on SSDs, check the MFT for resident data first, then verify cluster content before assuming recoverability. Document the storage type and TRIM status in every recovery assessment.

Resident + deleted: High recovery probability. Content is in the MFT record until the entry is reallocated.
Non-resident + deleted + HDD: Moderate. Clusters retain data until overwritten. Check the volume bitmap.
Non-resident + deleted + SSD: Low. TRIM may have zeroed clusters. Verify actual cluster content.
Fragmented + deleted: Partial recovery possible. Each cluster range must be checked independently.

Scenario

In the insider threat scenario (INC-NE-2026-0915), the departing engineer deleted files from a staging directory after copying them to USB. The MFT entries for those files were eventually reallocated. But the staging directory's $I30 slack still contains the filenames and timestamps of the deleted files, proving they existed in that directory. The directory's metadata survived even though the files' own metadata did not.

How NTFS stores directory contents

A directory in NTFS is an MFT record with $INDEX_ROOT and (for larger directories) $INDEX_ALLOCATION attributes. The directory does not contain files. It contains an index of $FILE_NAME copies organized as a B-tree sorted by filename. Each index entry is a complete copy of the child file's $FN data: MFT reference, parent reference, four $FN timestamps, file size, flags, and filename.

When you list a directory (dir, Explorer, Get-ChildItem), the system reads the directory's index attributes and extracts metadata from the index entries. It does not read each child file's MFT record — the index entries contain all the information needed for a directory listing (filename, size, timestamps). This is why large directory listings are fast: the data is pre-indexed in the parent directory.

The B-tree organization means index entries are sorted by filename. When NTFS adds a new file to a directory, it inserts the entry into the correct sorted position in the tree, potentially splitting nodes and rebalancing branches. When NTFS removes a file (deletion), it removes the entry from the tree and may merge nodes. Both operations leave forensic traces: insertions push existing entries to make room (potentially overwriting slack from earlier deletions), and removals leave gaps that become slack space.

For small directories (under ~15-20 files, depending on filename lengths), the entire index fits in $INDEX_ROOT, which is resident in the directory's MFT record. No $INDEX_ALLOCATION exists. For larger directories, $INDEX_ROOT becomes the root node of the B-tree, and child nodes are stored in $INDEX_ALLOCATION records: 4,096-byte blocks allocated in disk clusters.

The duplication of $FN data in both the file's MFT record and the parent directory's index is forensically valuable. If the file's MFT record has been reallocated, the directory's index entry may still contain the original filename and timestamps. If the index entry was removed (file deleted), the file's MFT record may still have the data. The two sources corroborate when both exist and provide backup when one is destroyed. This redundancy is not by design for forensics — it is an NTFS performance optimization that happens to create forensic opportunity.

$I30 slack — recovering deleted file evidence

Anti-Pattern

Assuming deleted files leave no trace in the parent directory

When a file is deleted, NTFS removes its index entry by reorganizing the B-tree. But the bytes that formed the deleted entry are not zeroed. They become slack space within the index record. On directories with infrequent file creation (archive folders, project directories, shared drives), deleted entries persist for months. Tools like INDXParse and MFTECmd carve these entries from slack, recovering filenames, timestamps, and MFT references for files whose own MFT records are gone.

When NTFS removes an index entry, it slides entries forward or restructures the B-tree, but never zeros the vacated bytes. The deleted entry's $FILE_NAME structure persists in the slack at the end of the index record or in place as partially overwritten data.
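Because the vacated bytes keep their $FILE_NAME layout, slack can be searched for structures that look like index entries. The sketch below is a deliberately simplified heuristic, not the algorithm used by INDXParse or MFTECmd: it assumes entries sit on 8-byte boundaries, accepts a candidate only if all four $FN timestamps fall between 2000 and 2100, and requires a valid namespace and a decodable name. The function name carve_fn_entries and the date bounds are hypothetical choices.

```python
import struct

EPOCH_DIFF = 116_444_736_000_000_000                   # FILETIME ticks from 1601 to 1970
MIN_FT = EPOCH_DIFF + 946_684_800 * 10_000_000         # 2000-01-01 UTC
MAX_FT = EPOCH_DIFF + 4_102_444_800 * 10_000_000       # 2100-01-01 UTC


def carve_fn_entries(slack: bytes, earliest: int = MIN_FT, latest: int = MAX_FT):
    """Scan slack bytes for plausible index entries and return what can be read."""
    hits = []
    for off in range(0, len(slack) - 0x53, 8):          # smallest entry: header + 1-char name
        fn = off + 0x10                                  # embedded $FILE_NAME content
        times = struct.unpack_from("<4Q", slack, fn + 0x08)
        if not all(earliest <= t <= latest for t in times):
            continue
        name_len, namespace = slack[fn + 0x40], slack[fn + 0x41]
        if name_len == 0 or namespace > 3:
            continue
        end = fn + 0x42 + 2 * name_len
        if end > len(slack):
            continue
        try:
            name = slack[fn + 0x42:end].decode("utf-16-le")
        except UnicodeDecodeError:
            continue
        ref = struct.unpack_from("<Q", slack, off)[0]
        hits.append({
            "offset": off,
            "name": name,
            "entry": ref & 0xFFFFFFFFFFFF,               # low 6 bytes: MFT entry number
            "sequence": ref >> 48,                       # high 2 bytes: sequence number
            "real_size": struct.unpack_from("<Q", slack, fn + 0x30)[0],
        })
    return hits
```

A heuristic like this produces false positives and misses partially overwritten entries, so treat its output as leads to confirm with a validated tool rather than as findings.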

This is what a carved $I30 slack entry looks like in MFTECmd output:

MFTECmd — $I30 Slack Entry (Carved)
Source:     $I30 slack — parent directory MFT entry 14,882           ◄ Staging directory
FileName:   Manufacturing_Specs_2026.7z                              ◄ Deleted file recovered from slack
MFT Ref:    48,271 (seq 3)                                          ◄ MFT entry now reallocated (seq 5)
$FN Created: 2026-03-12 16:48:22.3456789
$FN Modified: 2026-03-12 16:48:44.1234567
Size:       47,812,608 bytes (45.6 MB)
Flags:      Archive

The file Manufacturing_Specs_2026.7z was deleted from the staging directory. Its MFT entry (48,271) was reallocated (current sequence is 5, the entry's sequence was 3). The file's own metadata is gone. But the directory's $I30 slack preserves the filename, creation timestamp, and 45.6MB size, proving a compressed archive was staged in that directory.

The persistence of $I30 slack entries depends on directory activity after the deletion. On a directory that doesn't experience frequent file creation (archive folders, completed project directories, restricted network shares), deleted entries can persist for months because no new entries are being inserted into the B-tree to overwrite the slack. On a temporary directory with constant file creation and deletion (like %TEMP% or browser cache directories), slack entries may be overwritten within hours as new files are continuously added and removed.

This makes $I30 analysis particularly valuable for investigating access to restricted or archival directories where file churn is low. The staging directory in an insider threat case, the restricted share in an access dispute, the project folder on a file server — these are exactly the directories where $I30 slack entries persist longest because they don't see frequent file creation.

The forensic value of $I30 slack extends beyond filenames and timestamps. Each carved entry also contains the file size at the time the directory entry was created, the file flags (Hidden, System, Archive), and the MFT reference with sequence number. The MFT reference lets you check whether the file's own MFT record is still available. If the sequence numbers match, you can recover the full MFT metadata. If they don't match, the $I30 entry is your last remaining evidence of that file's existence in that directory.

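MFTECmd can parse an extracted $I30 index file and report entries recovered from slack, and its --de option dumps the full details of a single MFT entry, which helps when checking the record a carved entry points to. Specialized tools like INDXParse provide more granular control over the carving process and can recover partially overwritten entries that MFTECmd skips. For critical evidence, run both tools and compare results.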

Index entry structure

Each index entry contains: the child's MFT reference (8 bytes: 6-byte entry number + 2-byte sequence number), entry length (2 bytes), $FN content length (2 bytes), flags (4 bytes, bit 0 = sub-node exists, bit 1 = last entry sentinel), and the embedded $FILE_NAME content. The $FN content is identical to the data in the child file's own $FN attribute: parent reference, four timestamps, sizes, flags, namespace, and filename.
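The layout just described maps directly onto a few struct reads. The sketch below assumes the buffer holds one intact index entry and follows the field order given above, with the $FILE_NAME content starting 0x10 bytes into the entry; the names parse_index_entry and filetime_to_datetime are hypothetical.

```python
import struct
from datetime import datetime, timedelta, timezone


def filetime_to_datetime(ft: int) -> datetime:
    """Convert a FILETIME (100-ns ticks since 1601-01-01 UTC), truncated to microseconds."""
    return datetime(1601, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=ft // 10)


def parse_index_entry(buf: bytes, offset: int = 0) -> dict:
    """Parse the index entry header and the embedded $FILE_NAME content."""
    ref, entry_len, content_len, flags = struct.unpack_from("<QHHI", buf, offset)
    fn = offset + 0x10
    (parent_ref, created, modified, mft_modified, accessed,
     alloc_size, real_size, file_flags) = struct.unpack_from("<QQQQQQQI", buf, fn)
    name_len, namespace = struct.unpack_from("<BB", buf, fn + 0x40)
    name = bytes(buf[fn + 0x42:fn + 0x42 + 2 * name_len]).decode("utf-16-le")
    return {
        "entry": ref & 0xFFFFFFFFFFFF,            # 6-byte MFT entry number
        "sequence": ref >> 48,                    # 2-byte sequence number
        "entry_length": entry_len,
        "has_subnode": bool(flags & 0x01),
        "is_last": bool(flags & 0x02),
        "parent_entry": parent_ref & 0xFFFFFFFFFFFF,
        "created": filetime_to_datetime(created),
        "modified": filetime_to_datetime(modified),
        "mft_modified": filetime_to_datetime(mft_modified),
        "accessed": filetime_to_datetime(accessed),
        "allocated_size": alloc_size,
        "real_size": real_size,
        "file_flags": file_flags,
        "namespace": namespace,
        "name": name,
    }
```

Python's datetime holds microseconds, so the conversion drops the final digit of the 100-nanosecond value; keep the raw integer alongside it if full resolution matters for the timeline.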

The sequence number in the index entry's MFT reference is important for verification. When you find a carved $I30 entry for a deleted file and want to check whether the file's MFT record is still available, compare the sequence number in the index entry against the current sequence number of that MFT entry. If they match, the MFT record still belongs to the same file and you can recover its full metadata including $SI timestamps, $DATA attribute, and any ADS. If the current sequence is higher, the entry was reallocated and the file's MFT metadata is gone — but the $I30 entry itself is your evidence.

$I30 analysis should be performed on every investigation involving deleted files, even when MFT analysis has already recovered deleted entries. The two sources are complementary: the MFT record contains the full attribute set (timestamps, data, security descriptor), while the $I30 entry provides independent confirmation that the file existed in a specific directory. When both sources are available, they corroborate each other. When only one survives (the MFT entry was reallocated but the $I30 entry persists, or vice versa), the surviving source is your evidence.

Investigation Principle

$I30 slack provides evidence independent of the file's own MFT record. A filename carved from directory slack proves the file existed in that directory, even if the file's MFT entry has been reallocated to a completely different file. For insider threat investigations, this is often the evidence that survives the longest — the subject deletes the files, the MFT entries are reused, but the staging directory's index slack preserves the proof.

Next
Section 1.4 covers the practical workflow: extracting the MFT from evidence, parsing it with MFTECmd, loading output in Timeline Explorer, and building forensic timelines at the 100-nanosecond resolution of NTFS timestamps.