Recovering Files Removed With rm

Introduction

Have you ever removed a file with rm [FILENAME], or even worse rm -rf [DIR], only to find out that you didn't have a backup? Well, this happened to me with a project I hadn't pushed to GitHub, and I could not recover it despite my best efforts. In theory, I knew the file wasn't really deleted: its data was still sitting in unallocated space, and the file system data structures simply no longer pointed to it. So the file was recoverable, but I didn't know how to get it back. This post documents my journey into the Linux file system to recover files in this state.

Before moving forward, please note that there is a difference between removing a file using rm and zeroing out the disk (dd if=/dev/zero of=/dev/sda). To my knowledge, data that has been zero-wiped cannot be recovered by standard means (unless you are some sort of three-letter government agency). On magnetic disks you could probably attempt some magnetic ghosting, but on most modern disks this is (as far as I know) impossible.

Prerequisites

Linux Machine
ext4 File System
The Sleuth Kit (TSK: https://www.sleuthkit.org/), installed with: sudo apt-get install sleuthkit

My Setup

I have an NVMe disk and a default installation of Linux, which means I have two partitions: the EFI boot partition, and a second partition that takes up the rest of the disk and holds the root file system.

$ /Forensics/recovery lsblk 
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT 
nvme0n1     259:0    0   477G  0 disk 
├─nvme0n1p1 259:1    0   512M  0 part /boot/efi 
└─nvme0n1p2 259:2    0 476,4G  0 part /

And to see how it is mounted on the file system:

$ /Forensics/recovery df -h 
Filesystem      Size  Used Avail Use% Mounted on 
udev            7,8G     0  7,8G   0% /dev 
tmpfs           1,6G  2,0M  1,6G   1% /run 
/dev/nvme0n1p2  468G  292G  153G  66% /

Let’s Do It!

On most Linux distributions the default filesystem is ext4 and, more often than not, LVM is being used, so the abstraction model for a typical Debian-based Linux system could be summarized as follows (my setup above has no LVM layer, so that step simply drops out):

Disk (Physical) -> Partition (Logical) -> LVM -> Filesystem (Ext4) -> Block Data

That's how a disk should look in your mind, but the OS itself cares about little besides iNodes. Paths and directory names are irrelevant to it; all it needs is the disk and the iNode number. You can view the iNode of any file or directory by running ls -i (combined here with -la to get the long listing and the . and .. entries):

$ /Forensics/recovery ls -lai 
total 16 
24126926 drwxrwxr-x 2 danielftapiar danielftapiar 4096 may 24 12:18 . 
24126166 drwxrwxr-x 5 danielftapiar danielftapiar 4096 may 24 12:15 .. 
24126928 -rw-rw-r-- 1 danielftapiar danielftapiar   48 may 24 12:18 veryImportantFile2.txt 
24126927 -rw-rw-r-- 1 danielftapiar danielftapiar   58 may 24 12:16 veryImportantFile.txt 
 
$ /Forensics/recovery cat veryImportantFile.txt 
This has some important data that cannot ever be deleted!

As you can see, the current directory has iNode number 24126926 and the two files have the consecutive numbers 24126927 and 24126928. This is important later on.
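
Once a file is gone its name no longer shows up in a listing, so the iNode of the parent directory is a handy starting point. A minimal sketch of grabbing it, where inode_of is a helper name I made up for illustration:

```shell
# inode_of PATH: print the iNode number of a file or directory.
# (inode_of is a hypothetical helper, not part of any tool.)
inode_of() {
    # -d lists the directory itself rather than its contents,
    # -i prefixes the entry with its iNode number.
    ls -di -- "$1" | awk '{ print $1 }'
}

# iNode of the current directory
inode_of .
```

In the listing above, this would print 24126926 when run inside the recovery directory.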

Every block on a default ext4 system is 4096 bytes (4 KiB).

TSK: The Sleuth Kit

TSK is a suite for forensic analysis of file systems. It will be used throughout this experiment to view the internals of the disk and file system. First, let's get some information on our current file system and disk, /dev/nvme0n1p2 (yours could be different).

$ sudo su 
$ fsstat /dev/nvme0n1p2 
 
File System Type: Ext4 
Volume Name: 
Volume ID: 130baa837c7eea8de64ea08eca2e1ab9 
 
Last Written at: 2020-05-21 22:57:26 (-04) 
Last Checked at: 2019-09-17 23:56:46 (-03) 
 
Last Mounted at: 2020-05-21 22:57:26 (-04) 
Unmounted properly 
Last mounted on: / 
 
Source OS: Linux 
Dynamic Structure 
Compat Features: Journal, Ext Attributes, Resize Inode, Dir Index 
InCompat Features: Filetype, Needs Recovery, Extents, 64bit, Flexible Block Groups, 
Read Only Compat Features: Sparse Super, Large File, Huge File, Extra Inode Size 
 
Journal ID: 00 
Journal Inode: 8 
 
METADATA INFORMATION 
-------------------------------------------- 
Inode Range: 1 - 31227905 
Root Directory: 2 
Free Inodes: 30133037 
Inode Size: 256 
Orphan Inodes: 14326396, 14316784, 24641539, 14291343, 23199801, 14330335, 23199879, 14325278, 14320613, 14294096, 14290435, 14292001, 15337327, 23199747, 
 
CONTENT INFORMATION 
-------------------------------------------- 
Block Groups Per Flex Group: 16 
Block Range: 0 - 124895487 
Block Size: 4096 
Free Blocks: 53805922 
 
BLOCK GROUP INFORMATION 
-------------------------------------------- 
Number of Block Groups: 3812 
Inodes per group: 8192 
Blocks per group: 32768 
 
Group: 0: 
  Block Group Flags: [INODE_ZEROED] 
  Inode Range: 1 - 8192 
  Block Range: 0 - 32767 
  Layout: 
    Super Block: 0 - 0 
    Group Descriptor Table: 1 - 60 
    Group Descriptor Growth Blocks: 61 - 1084 
    Data bitmap: 1085 - 1085 
    Inode bitmap: 1101 - 1101 
    Inode Table: 1117 - 1628 
    Data Blocks: 9309 - 32767 
  Free Inodes: 8175 (99%) 
  Free Blocks: 9124 (27%) 
  Total Directories: 2 
  Stored Checksum: 0x4CF0 
 
  ...

This gives us a low-level overview of the file system: a block size of 4096 bytes, how many free blocks remain, the number of iNodes per group (8192) and the number of blocks per group (32768).
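
Since these numbers get reused in the calculations further down, it can help to pull them out of the fsstat output instead of copying them by hand. This is only a sketch: fsstat_field is a name I invented, and it assumes the "Label: number" layout shown in the output above.

```shell
# fsstat_field LABEL: read fsstat output on stdin and print the first
# numeric value that follows "LABEL:". (Hypothetical helper; assumes the
# "Label: value" layout shown in the output above.)
fsstat_field() {
    sed -n "s/^[[:space:]]*$1:[[:space:]]*\([0-9][0-9]*\).*/\1/p" | head -n 1
}

# Example usage (needs root and a real device, so it is left commented out):
#   fsstat /dev/nvme0n1p2 | fsstat_field "Block Size"
#   fsstat /dev/nvme0n1p2 | fsstat_field "Inodes per group"
#   fsstat /dev/nvme0n1p2 | fsstat_field "Blocks per group"
```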

The theory behind data recovery is that the files aren't really deleted, just the pointers to them in the file system data structures. The blocks that held the files are still there, but are now marked as UNALLOCATED, which means they are available to be overwritten by incoming writes to disk. Therefore, if you have just noticed that you've deleted critical data, it is very important to stop all writes to the target system. This means killing all processes that are writing new data to disk or, at most, turning off the target machine.

The technique I'll be using is disk carving, which means that I will use dd as a sort of scalpel to carve out the sections of the disk that we know were the previous location of our lost data. In our previous example we had iNode numbers 24126926, 24126927 and 24126928. These have to be mapped to the correct block group number. To figure this out we need the number of iNodes per group (8192) and an iNode number from the affected directory (here the directory's own iNode, 24126926).

$ echo $((   24126926 / 8192 )) 
2945
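
The plain division above works for this iNode, but iNode numbers start at 1, so the last iNode of each group would land one group too high. Judging by the ranges fsstat prints (group 0 holds iNodes 1 - 8192), subtracting one first handles that edge. A small sketch, with inode_to_group as an illustrative name:

```shell
# inode_to_group INODE INODES_PER_GROUP: print the block group an iNode
# belongs to. iNodes are numbered from 1, hence the "- 1".
# (inode_to_group is a hypothetical helper name.)
inode_to_group() {
    echo $(( ($1 - 1) / $2 ))
}

inode_to_group 24126926 8192   # the directory from the listing above
inode_to_group 8192 8192       # last iNode of group 0
inode_to_group 8193 8192       # first iNode of group 1
```

This prints 2945, 0 and 1, matching the iNode ranges reported by fsstat.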

So our iNodes should fall within block group 2945. Let's verify:

$ fsstat /dev/nvme0n1p2 | grep "Group: 2945" -A12 
Group: 2945: 
  Block Group Flags: [INODE_ZEROED] 
  Inode Range: 24125441 - 24133632 
  Block Range: 96501760 - 96534527 
  Layout: 
    Data bitmap: 96468993 - 96468993 
    Inode bitmap: 96469009 - 96469009 
    Inode Table: 96469536 - 96470047 
    Data Blocks: 96501760 - 96534527 
  Free Inodes: 6679 (81%) 
  Free Blocks: 0 (0%) 
  Total Directories: 129 
  Stored Checksum: 0x1FEF

Our iNodes are in this group's range, which means our data is most likely in its data blocks too. To carve the group out, run dd as follows:

$ dd if=/dev/nvme0n1p2 bs=4096 skip=96501760 count=32767 of=./images/bg2945.raw    
32767+0 records in 
32767+0 records out 
134213632 bytes (134 MB, 128 MiB) copied, 0,214837 s, 625 MB/s

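bs : block size in bytes (4096)
skip : number of bs-sized blocks to skip from the start of the input (group 2945 starts at block 96501760)
count : number of bs-sized blocks to copy (96534527 - 96501760 = 32767, which covers all but the last block of the group; use 32768 to copy the entire block group)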
if : input file, from which file (in this case block device), to use as input
of : output file, where to store the image
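
The block range printed by fsstat is inclusive on both ends, which is why the full group is one block larger than the simple difference. A sketch of deriving the dd arguments from that range, where blocks_in_range is a made-up helper name and the command is only printed, not executed:

```shell
# blocks_in_range FIRST LAST: number of blocks in an inclusive block range.
# (blocks_in_range is a hypothetical helper name.)
blocks_in_range() {
    echo $(( $2 - $1 + 1 ))
}

first=96501760   # start of "Block Range" for group 2945 (from fsstat)
last=96534527    # end of "Block Range" for group 2945 (from fsstat)
count=$(blocks_in_range "$first" "$last")

# Print the command rather than running it, so nothing touches the disk.
echo "dd if=/dev/nvme0n1p2 bs=4096 skip=$first count=$count of=./images/bg2945.raw"
```

With these values count comes out as 32768, which is the "Blocks per group" figure from fsstat.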

And now, looking for strings in this image, we should see our current file:

$ strings images/bg2945.raw | grep "This has some important data that cannot ever be deleted!" 
  This has some important data that cannot ever be deleted!

So we carved out the contents of a file that is still present on the file system and was never deleted. If we delete it now and run the same chain of commands, we can still recover the file:

$ rm veryImportantFile.txt 
$ dd if=/dev/nvme0n1p2 bs=4096 skip=96501760 count=32767 of=./images/bg2945.raw    
$ strings images/bg2945.raw | grep "This has some important data that cannot ever be deleted!" 
This has some important data that cannot ever be deleted!

And boom! The file is in the image and you can view the contents to your heart’s content by using dd and outputting it to a file.
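
To actually write the contents out, you need the byte offset of the data inside the image. One way to do that, sketched below, is to let GNU grep report the offset and then have dd copy a fixed number of bytes from there. The helper names find_offset and carve_bytes are mine, and the length (58 bytes, the file size from the listing earlier) is something you have to know or estimate yourself.

```shell
# find_offset IMAGE TEXT: print the byte offset of the first occurrence of
# TEXT inside IMAGE. Relies on GNU grep: -a treats binary as text,
# -b prints byte offsets, -o reports the match itself, -F takes TEXT literally.
# (find_offset and carve_bytes are hypothetical helper names.)
find_offset() {
    grep -a -b -o -F -- "$2" "$1" | head -n 1 | cut -d: -f1
}

# carve_bytes IMAGE OFFSET LENGTH OUTFILE: copy LENGTH bytes starting at
# byte OFFSET of IMAGE into OUTFILE. bs=1 is slow but keeps the math simple.
carve_bytes() {
    dd if="$1" of="$4" bs=1 skip="$2" count="$3" 2>/dev/null
}

# Example usage against the carved image (left commented out):
#   off=$(find_offset images/bg2945.raw "This has some important data")
#   carve_bytes images/bg2945.raw "$off" 58 recovered.txt
```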

Magic numbers

How you view the contents of your file depends a lot on the type of file that was deleted and how you carve it. Because carving tools such as dd do not rely on the file system, they need other sources of information to discover where a file starts and ends. Fortunately, many file types have known structures, and the header and footer are often all that is needed to identify the file type and location. The Linux file command relies on the same kind of signature information to identify file types. Using dd in the same manner as before, but targeting the header and footer of your missing file and outputting everything in between to a file, is all that is needed. These header signatures are called magic numbers; they help identify the file type and, together with the footer, where the file ends.

Images, for example, have known magic numbers: JPEG starts with the byte sequence 0xFFD8, often followed by 0xFFE00010, whereas Word documents (.docx) start with 0x504B0304 followed by 0x14000600. You can view the list here.
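
As a rough illustration of checking a carved chunk against the signatures mentioned above, the sketch below dumps the first bytes as hex with od and matches them with a case statement. magic_hex and guess_type are names I made up, and only the two signatures from this post are covered.

```shell
# magic_hex FILE N: print the first N bytes of FILE as lowercase hex.
# (magic_hex and guess_type are hypothetical helper names.)
magic_hex() {
    od -An -tx1 -N "$2" "$1" | tr -d ' \n'
}

# guess_type FILE: very small signature check covering only the two
# magic numbers mentioned in this post.
guess_type() {
    case "$(magic_hex "$1" 4)" in
        ffd8ff*)  echo "jpeg" ;;
        504b0304) echo "zip-based (e.g. docx)" ;;
        *)        echo "unknown" ;;
    esac
}

# Example usage on a carved file (left commented out):
#   guess_type recovered.bin
```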

Odd Cases

There are times when the block group is carved and the file isn't there. Most of the time the file system places files in the same block group as the parent directory. If it can't, it will attempt to place them in the adjacent block group, and so forth. So when carving the file system with dd, instead of setting count to a single block group you might want to multiply that value by however many block groups you want to look ahead, that is count=BLOCKS_PER_GROUP*NUMBER_OF_GROUPS, and look for your file there.
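
Here is a small sketch of that arithmetic. It assumes the file system's first block is block 0, as in the fsstat output above, so that a group starts at GROUP * BLOCKS_PER_GROUP. group_span_args is an illustrative name, not an existing tool.

```shell
# group_span_args GROUP BLOCKS_PER_GROUP NGROUPS: print dd skip/count
# values covering NGROUPS consecutive block groups starting at GROUP.
# Assumes the first block of the file system is block 0.
# (group_span_args is a hypothetical helper name.)
group_span_args() {
    echo "skip=$(( $1 * $2 )) count=$(( $2 * $3 ))"
}

# Block group 2945 plus the two groups after it
group_span_args 2945 32768 3
```

This prints skip=96501760 count=98304, which can be dropped into the same dd command used earlier (with bs=4096).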

There are other times when the disk is full and the file is placed at the very edges of the disk. This happened to me when I was trying to dump an image onto my file system: it was bigger than I expected and used up 100% of the disk. After that, any new files were placed wherever the file system could fit them, so the adjacent block group approach didn't work. I had to resort to creating a throwaway file with touch and then running istat on its iNode to see which direct block it was created in.

$ istat /dev/nvme0n1p2 24126929 
inode: 24126929 
Allocated 
Group: 2945 
Generation Id: 754060831 
uid / gid: 1000 / 1000 
mode: rrw-rw-r-- 
Flags: Extents, 
size: 58 
num of links: 1 
 
Inode Times: 
Accessed:       2020-05-24 12:16:52.796495865 (-04) 
File Modified:  2020-05-24 12:16:49.344420627 (-04) 
Inode Modified: 2020-05-24 12:16:49.344420627 (-04) 
File Created:   2020-05-24 12:16:14.503635323 (-04) 
 
Direct Blocks: 
57818842

In this case the direct block is 57818842, which is far away from my previous attempt. This gives me a ballpark estimate of where to start carving with dd to reach a recently created file that was deleted on a file system that had been filled up.
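
To turn that direct block into a carving target, you can map it back to its block group with the same kind of integer division used for iNodes. The sketch below again assumes block numbering starts at 0 (as fsstat reported), and block_to_group is a name I invented for the example.

```shell
# block_to_group BLOCK BLOCKS_PER_GROUP: print the block group that
# contains BLOCK, assuming the file system's first block is block 0.
# (block_to_group is a hypothetical helper name.)
block_to_group() {
    echo $(( $1 / $2 ))
}

block=57818842   # "Direct Blocks" value reported by istat above
group=$(block_to_group "$block" 32768)
echo "block $block is in block group $group"

# Print (not run) a dd command that would carve just that one block.
echo "dd if=/dev/nvme0n1p2 bs=4096 skip=$block count=1 of=./images/block.raw"
```

For this block the group works out to 1764, a long way from group 2945 where the directory's iNode lives.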