Cumulus backups

01.08.2009 20:55

Cloud computing is all the rage these days. However, I'm kind of reluctant to trust some company in a country halfway across the globe with all my personal data, even if they picture it floating in fluffy heaven. So in a kind of old-fashioned way, I still store all my private documents, photos and emails on hard disks that are my own property.

Still, backing up my home server to an external USB drive has become a little inconvenient lately. So I decided to try backing up to Amazon Simple Storage Service (S3) - i.e. the cloud.

My server is on a residential ADSL line with relatively poor upstream bandwidth (1 Mb/s), so incremental backups must use the bandwidth efficiently. There's approximately 4 GB of compressed data to be backed up, which is just on the limit of what I would consider feasible for a setup like this - theoretically it should take around 12 hours to upload a full snapshot, but I don't plan to do these very often.
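As a rough check of that estimate, here is a minimal sketch of the arithmetic. The function name and the `efficiency` factor are hypothetical and only serve the illustration. At the raw line rate, 4 GB over 1 Mb/s works out to roughly 9 hours. A figure near 12 hours would be consistent with losing about a quarter of the bandwidth to protocol overhead, but that 75% value is an assumption, not a measurement.

```python
def upload_hours(size_gb: float, upstream_mbps: float, efficiency: float = 1.0) -> float:
    """Estimate the hours needed to upload size_gb gigabytes.

    efficiency is a hypothetical fudge factor in (0, 1] that stands in for
    protocol overhead on the line; 1.0 means the full nominal rate.
    """
    if upstream_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and efficiency in (0, 1]")
    megabits = size_gb * 1000 * 8  # decimal gigabytes -> megabits
    seconds = megabits / (upstream_mbps * efficiency)
    return seconds / 3600


raw = upload_hours(4, 1.0)
with_overhead = upload_hours(4, 1.0, efficiency=0.75)  # assumed overhead
print(f"{raw:.1f} h at line rate, {with_overhead:.1f} h at 75% efficiency")
```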

After some investigation I decided to use Duplicity: first, because it's advertised as efficient in exactly this use case, and second, because I already use it to back up my computer at work. Although the official man page is a little short on details about S3 storage, there are quite a few articles floating around.
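For readers who want a starting point, the following is a minimal sketch of how a Duplicity run against S3 could be assembled from a Python script. It is not the exact setup used here: the source directory, bucket name and GPG key ID are placeholders, and the monthly full-backup interval is an assumption based on the schedule mentioned in this post. Duplicity is expected to read `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` and `PASSPHRASE` from the environment.

```python
import subprocess


def duplicity_command(source_dir, target_url, gpg_key_id, full_every="1M"):
    """Build the argument list for an incremental duplicity backup.

    --full-if-older-than makes duplicity start a new full backup once the
    last one is older than the given interval (here assumed: one month).
    """
    return [
        "duplicity",
        "--encrypt-key", gpg_key_id,
        "--full-if-older-than", full_every,
        source_dir,
        target_url,
    ]


def run_backup(cmd):
    """Run the backup; raises CalledProcessError if duplicity fails."""
    subprocess.run(cmd, check=True)


# Placeholder values, not taken from the post.
cmd = duplicity_command("/home", "s3+http://example-backup-bucket/server", "ABCD1234")
print(" ".join(cmd))  # inspect the command before calling run_backup(cmd)
```

Keeping the command construction separate from `run_backup` makes it easy to check what would be executed before anything is uploaded.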

The cost of S3 storage is pretty minimal: I never plan to store more than around 20 GB worth of backups, and if I count in a monthly 4 GB full snapshot, that comes to $3.50 per month. Granted, this is very expensive if you compare it to the price of an external USB drive, but it has the benefit of being off-site and conveniently accessible from anywhere on the internet.
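The estimate can be reproduced with a small sketch like the one below. The per-GB rates (0.15 USD per GB-month of storage and 0.10 USD per GB uploaded) are assumptions that are not stated in this post and should be checked against Amazon's price list. Request fees are left out, which may explain the small gap to the $3.50 quoted above.

```python
def monthly_cost_usd(stored_gb, uploaded_gb, storage_rate, upload_rate):
    """Monthly cost as storage plus inbound transfer, ignoring request fees."""
    return stored_gb * storage_rate + uploaded_gb * upload_rate


# Rates are assumed for illustration only.
estimate = monthly_cost_usd(20, 4, storage_rate=0.15, upload_rate=0.10)
print(f"${estimate:.2f} per month before request fees")
```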

Of course, the tiny paranoid voice in my head made me check all the worst-case scenarios: if Amazon suddenly disappears from the face of the Earth, I would be left without backups. But I judge that the possibility of that happening and me needing the backups in the same instant is too low to worry about. The data I'm sending over the Atlantic is encrypted with GPG, so it's presumably safe even in the unlikely case someone in the US would want to browse through my stuff pretending to be looking for terrorists or some such nonsense.

One problem I do see is that these backups are not safe in case someone breaks into my server, since they could be altered or erased by the attacker - but that's the case with most if not all automated backups. In addition to that, Sysadminman makes an interesting point that if someone gets my Amazon credentials, they can run up a huge bill in my name, since it's not possible to put bandwidth limits on an account. That's not a possibility that would make me lose sleep at night, but I did make a note to occasionally check my account activity.

Finally, how does this work in practice? A full backup takes 14 hours, while an incremental one is finished in a little less than 15 minutes. One thing I still have on the to-do list is to look into Linux QoS settings and make some adjustments, so that I can still comfortably read my email over IMAP and the NTP client doesn't panic once a month when the full backup is made.

So right now, after two weeks of use, it looks like I'll stick to this backup scheme. Still, it's nice to know I can cancel the service at any moment should any serious problems come up.

Posted by Tomaž | Categories: Code

Comments

I don't quite understand how duplicity works. If the backup is stored in an encrypted tar file, then how does rsync compare deltas without unpacking it?

My bewilderment stems, of course, from the notion that even slight changes in the stored files should create a bit-wise very different tar, and that peeking remotely shouldn't be possible without the private key.

Posted on 1 August 2009 by Marko

Each time you run duplicity, it actually creates two sets of tar files: one with the actual data and one with just file signatures. When you make an incremental backup, it first downloads the signature files, decrypts them and then backs up those parts of files whose signatures have changed.

So I imagine the signature files contain something like an MD5 hash of each 10 kB block of data in a file.

Signatures take around 50 MB for 4 GB of actual data in my case.
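To illustrate the idea, here is a minimal sketch of per-block signatures using the guessed figures above (MD5, 10 kB blocks). It is a simplification, not duplicity's actual format: duplicity relies on librsync, whose signatures pair a rolling checksum with a stronger hash so that matching data can also be found at shifted offsets. The function names are made up for this example.

```python
import hashlib

BLOCK_SIZE = 10 * 1024  # 10 kB, the block size guessed in the comment


def block_signatures(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return one MD5 hex digest per fixed-size block of data."""
    return [
        hashlib.md5(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old_sigs: list[str], new_data: bytes,
                   block_size: int = BLOCK_SIZE) -> list[int]:
    """Indices of blocks in new_data that are new or differ from old_sigs."""
    new_sigs = block_signatures(new_data, block_size)
    return [
        i for i, sig in enumerate(new_sigs)
        if i >= len(old_sigs) or sig != old_sigs[i]
    ]


old = b"a" * BLOCK_SIZE + b"b" * BLOCK_SIZE + b"c" * BLOCK_SIZE
sigs = block_signatures(old)  # only these need to be kept for the next run

new = b"a" * BLOCK_SIZE + b"X" * BLOCK_SIZE + b"c" * BLOCK_SIZE + b"d" * 100
print(changed_blocks(sigs, new))  # [1, 3]
```

Only the signatures from the previous run are needed to decide which blocks to upload, which is why the full encrypted archive never has to be downloaded or unpacked.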

Posted on 2 August 2009 by Tomaž

Excellent post. I will get that service, since I have less than 30 GB of data and I don't want to carry a hard drive when I move back to Europe.

Posted on 3 August 2009 by Frankie Bloise
