The not so great zero challenge

13.10.2008 21:27

The Great Zero Challenge tries to dispel the myth that you can recover any data from a hard drive that has been intentionally erased by overwriting its contents once with a single stream of zeros.

It's a nice idea with a problem. If it is possible to recover data, it's probably hugely expensive, and it's highly unlikely that any company capable of doing it would take the challenge for a (recently increased) prize of $500. The amount itself would probably be irrelevant if a large media company were backing the challenge, so that there was some guarantee of positive media coverage. Still, I congratulate the organizers for betting their money in order to dispel untruths, as they put it.

Anyway, when I first heard of the challenge some time ago I noticed something interesting. The challenge says that you must identify the name of at least one of the files or folders on the disk. What if it were possible to win the challenge without even touching the disk? Here's an enlarged portion of one of the censored screenshots available on the challenge website:

Sloppy censor, exhibit 1

Notice the dotted line? The censor has been a bit sloppy. That's one side of the selection box. So it looks like the folder was selected in the Explorer when the screenshot was made. This gives you a pretty good idea of the length of the folder name. More specifically, since the line is dotted, you know the width of the box is 62 or 63 pixels.

Windows uses a proportional font for file names (and by the looks of the screenshot they used the default Windows XP theme), so that further reduces the number of possible filenames. With a couple of trial screenshots I measured the width of every English letter, and a short C program then tried all possible combinations that included only letters.

The result? From a dictionary of English words, 2091 names matched. Far too many to be useful for guessing the correct name.
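The width filter described above can be sketched in a few lines. This is not the original C program, only a minimal Python illustration of the same idea. The glyph widths, the default width and the box padding below are made-up placeholder values, not measurements of the real Windows XP font, and the function names are hypothetical.

```python
# Hypothetical per-glyph widths in pixels. Placeholder values for
# illustration only, NOT measurements of the real Explorer font.
GLYPH_WIDTHS = {
    "i": 2, "l": 2, "j": 3, "f": 3, "t": 3, "r": 4,
    "m": 8, "w": 8,
}
DEFAULT_WIDTH = 6  # assumed width of every letter not listed above
BOX_PADDING = 4    # assumed pixels the selection box adds around the text


def rendered_width(name):
    """Return the assumed selection-box width in pixels for a name."""
    text_width = sum(GLYPH_WIDTHS.get(ch, DEFAULT_WIDTH) for ch in name.lower())
    return text_width + BOX_PADDING


def matching_names(candidates, target_widths=(62, 63)):
    """Keep purely alphabetic candidates whose box width is a target width."""
    return [
        name for name in candidates
        if name.isalpha() and rendered_width(name) in target_widths
    ]


if __name__ == "__main__":
    # With real measurements, the candidates would come from a word list.
    for name in matching_names(["wwwwwwwi", "linux", "wwwwwwwa"]):
        print(name, rendered_width(name))
```

With real per-letter measurements in place of the placeholders, feeding a dictionary file through matching_names would reproduce the kind of count mentioned above.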

So knowing the length of a string rendered in a proportional font isn't enough without some kind of context. Is there any more information available in the screenshot to narrow the possibilities even further?

Sloppy censor, exhibit 2

Take a look at what else hasn't been censored on the screenshot. There is one file with a .gz extension, one with a .tar extension and a directory. This suggests an unpacked distribution of some open source program (for example something like "linux-2.6.23.tar.gz", "linux-2.6.23.tar" and "linux-2.6.23"). The size of the .tar file confirms that, since it's approximately twice the size of the .gz file - a typical compression ratio for ASCII text or source code. The modification date of the .tar file suggests this is a fairly old release from late 2006.

So, now all I need to find is an open source project that had a tar.gz release in November or December 2006, was 4.862 kB in size and had around 10 characters in the filename. Is there an easily searchable database of open source software out there that has this information? I haven't found one yet, but the Great Zero Challenge sure looks much less formidable when you look at it this way. And you don't even need to dust off that old electron microscope.
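If such a listing were available, the search itself would be simple. The sketch below assumes a hypothetical list of Release records (file name, size in bytes, modification date) that you would have to collect yourself. It also assumes that the 4.862 kB figure belongs to the .tar.gz file, that Explorer rounds sizes up to whole kilobytes of 1024 bytes, and that "around 10 characters" means the name without the .tar.gz suffix, give or take two characters.

```python
from datetime import date
from typing import NamedTuple


class Release(NamedTuple):
    """One entry of a hypothetical mirror listing."""
    filename: str
    size_bytes: int
    modified: date


def plausible_releases(listing, size_kb=4862,
                       first_day=date(2006, 11, 1),
                       last_day=date(2006, 12, 31),
                       name_len=10, slack=2):
    """Return the .tar.gz entries that fit the clues from the screenshot."""
    matches = []
    for rel in listing:
        if not rel.filename.endswith(".tar.gz"):
            continue
        stem = rel.filename[: -len(".tar.gz")]
        # Assumption: the displayed size is rounded up to whole KB.
        shown_kb = -(-rel.size_bytes // 1024)
        if (shown_kb == size_kb
                and first_day <= rel.modified <= last_day
                and abs(len(stem) - name_len) <= slack):
            matches.append(rel)
    return matches
```

Each loosened assumption (a wider date window, a larger slack on the name length) only grows the candidate list, so it makes sense to start strict and relax the filter if nothing matches.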

Posted by Tomaž | Categories: Ideas

Comments

You could just recursively list whole kernel.org or ftp.arnes.si and check the timestamp and size on the listings.

Why do you think that they would use a dictionary word for this? Even assuming the names are only six characters long, using only letters and numbers, disregarding letter case, there are 2,176,782,336 combinations.

Posted by Nick

Nick, read the post again and you'll see that there are significantly fewer possible combinations because a proportional font is used, even if you don't assume the file names come from a dictionary or an open source distribution.

Of course, nothing says they weren't smart and only made it look like they used the first open source .tar.gz distribution they found, just to mislead an approach like this.

On the other hand, I think it's most likely they did just that.

Posted by Tomaž

I agree with Nick. A proportional font means you can't even be sure exactly how many characters there are in the string, and since the names aren't limited to English words, there are at least millions of possible names for the folder.

Also, "You also must publicly disclose in a reproducible manner the method(s) used to win the challenge." This was part of the conditions, and guessing from screenshots would not be considered data recovery.

Posted by DaCheetah
