Channel: User Kamil Maciorowski - Super User

Answer by Kamil Maciorowski for How do I recover lost/inaccessible data from my storage device?

In case you want to recover one or a few text files with partially known content

If the file you want to recover is a plain text file (as Linux understands it, i.e. UTF-8) and the filesystem where the file used to be is/was neither encrypted nor compressed, in Linux use strings on the block device (partition) holding the filesystem.

For each file given, GNU strings prints the printable character sequences that are at least 4 characters long (or the number given with the options below) and are followed by an unprintable character.

(source: man 1 strings)

You want something like:

strings -aw -e S -n 10 /dev/sdX1 >/another/filesystem/extracted

(or pv /dev/sdX1 | strings -aw -e S -n 10 >/another/filesystem/extracted to see the progress).

Then extracted will be a text file you can view with less, search with grep etc. In my tests -e S was crucial to detect UTF-8 text with multi-byte characters.
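
Before pointing the command at a real partition, it may be worth rehearsing it on a throwaway image. The following is only a sketch under assumptions: the "partition" is a temporary file made of zero bytes with one line of UTF-8 text in the middle, the variable names (workdir, img, out) are made up for the illustration, and GNU strings (binutils) is assumed to be installed.

```shell
#!/bin/sh
# Rehearsal on a throwaway image; no real block device is touched.
# workdir, img and out are hypothetical names used only in this sketch.
workdir=$(mktemp -d)
img="$workdir/fake-partition.img"
out="$workdir/extracted"

# Build the image: zero bytes, one line of UTF-8 text, zero bytes again.
{
  head -c 4096 /dev/zero
  printf 'Notatka: zażółć gęślą jaźń\n'
  head -c 4096 /dev/zero
} >"$img"

# Same options as in the command above, applied to the image file.
strings -aw -e S -n 10 "$img" >"$out"

# -a makes grep treat its input as text even if it contains odd bytes.
grep -a 'Notatka' "$out"
```

If the line with the multi-byte characters shows up intact, the options behave on your system the way they did in my tests. Remove the directory with rm -r "$workdir" when you are done.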

Notes:

  • -n 10 tells the tool to print sequences at least 10 bytes long. The manual says "characters" but my tests with UTF-8 multi-byte characters show it's "bytes" for sure. The lower the number, the more garbage you will get. On the other hand you should not exceed the block size used by the source filesystem, which is at least 512 bytes (the lowest common sector size for block devices). The point is that your file may be fragmented, and with -n higher than the block size strings will miss a block of text that happens to lie between non-textual data. If your file was tiny (smaller than the -n you used) then you might miss it completely. Similarly, you may miss the tail of the file if it happens not to be adjacent to other text.

  • extracted will probably be relatively huge anyway, too big for "manual" inspection. You will probably need to use a good text editor or a pager (capable of handling large text files) to interactively search for the string you know was in the file you want to recover. Or use grep (possibly with -A, -B; see man 1 grep) to search for the string. This way you will hopefully locate the relevant fragment of extracted.

  • The file you're after may be fragmented and scattered, with its pieces not necessarily in sequence. In extracted there may be old versions and fragments of other files (garbage, including text-like fragments of binary files), all of these possibly interleaved. extracted as a whole will be a textual jigsaw puzzle. Consider using the -s (--output-separator) option of strings, but keep in mind that if unrelated fragments are strictly adjacent in the filesystem then you won't get a separator between them, as if they were one bigger chunk.

  • If the filesystem you're trying to recover data from is on an SSD and TRIM was performed after the mishap in which you lost the file, then there's a risk the content of the file is gone. This is a bad scenario.

    On the other hand, if the filesystem is on an SSD and TRIM was performed before the mishap, and there was no TRIM later, then the TRIM may have wiped out unrelated old data, old versions of files etc., but not the content of the file you're after. In effect you will get less garbage from strings. This is a good scenario.

    As you can see, an SSD may be a disadvantage or an advantage. For an HDD these scenarios do not apply. Virtual disks may support something similar to TRIM.

  • In the beginning I wrote "the filesystem […] neither encrypted nor compressed". An encrypted or compressed filesystem would store textual data not in its plain form, so strings would be useless. I guess some other features of some filesystems may lower your chances or cause some extra garbage.

  • If you have enough RAM, consider copying extracted to /dev/shm (or use vmtouch -l) to speed up repeated searches with grep or similar tools.

  • The whole idea requires a string you know was in the file. Using the name of the file as a known string won't help you locate the content because in general filenames and actual data are stored separately, not necessarily near each other. This observation leads us to a preemptive strategy (i.e. in advance, before any mishap) that can make your important text files easier to recover by our method after a future mishap.

    Let's suppose you want to store a SerialKey for VeryImportantSoftware in a text file. The key is J7f9e7sc. Do not store the key only, build the text file like this:

    SerialKey for VeryImportantSoftware: J7f9e7sc  

    In case you ever need to recover this file and you decide to use our method with strings, search for SerialKey and/or VeryImportantSoftware, or even for SerialKey for VeryImportantSoftware if you remember this is the exact string.
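
To illustrate the search step from the notes, here is a rough sketch that runs grep with context on a small stand-in for extracted. The sample lines and the temporary file are invented for the illustration; on real output you would point grep at your own extracted file and probably use larger context sizes.

```shell
#!/bin/sh
# A tiny stand-in for "extracted"; the content is invented for this sketch.
sample=$(mktemp)
cat >"$sample" <<'EOF'
GCC: (GNU) some leftover of a binary
[Desktop Entry]
SerialKey for VeryImportantSoftware: J7f9e7sc
Name=Unrelated fragment
/usr/lib/locale/unrelated
EOF

# -a     treat the file as text even if it holds stray binary bytes
# -n     show line numbers; -b shows byte offsets within the file
# -F     take the pattern literally, not as a regular expression
# -B/-A  lines of context before/after each match
hits=$(grep -a -n -b -F -B 1 -A 1 'SerialKey for VeryImportantSoftware' "$sample")
printf '%s\n' "$hits"
```

The line numbers and byte offsets should make it easier to jump to the same spot in a pager or editor and inspect the neighbourhood of the match, which is where other fragments of the same file are most likely to be.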

