Exponential backup rotation

17.01.2015 12:10

Log2rotate (as in "log2 rotate", not "log to rotate") is an interesting algorithm and command line utility created by Chris Forno. It is a common task to create periodical snapshots of some system, either for backup or archival purposes. To keep the storage requirement down, you also often want keep a more dense series of backups for some short period of time and retain fewer and fewer as they age.

Typically this is done based on common time units, like having a daily, weekly, monthly and yearly cycles of backup rotation. Log2rotate approaches this problem from a more mathematically elegant standpoint. Instead of basing cycles on ways humans measure time, it tries to keep the intervals between stored backups increasing exponentially. Specifically, it keeps each interval between two backups about twice as long as the previous one.

This is not as trivial as it might sound at first. You are creating backups at a steady pace, but once you delete them, they are gone. You cannot rearrange them later. So the algorithm needs to periodically remove snapshots in a smart way so that the intervals approximate the geometric series.

Visualization of the log2rotate algorithm.

In the plot above you can see the visualization of the log2rotate algorithm. Each black point represents one stored backup. All black points on a vertical line represent backups that have been stored on that day. A horizontal line shows the life time of a single backup. The algorithm always preserves the newest (on the diagonal) and the oldest (bottom row) backups, while the ones in between get removed in a fractal-like pattern.

Number of stored backups versus time when using log2rotate.

Because the algorithm cannot resurrect deleted backups, it can only approximate the geometric series. In practice, the number of stored backups varies between log2 n and 2·log2 n as shown on the graph above.

Chris writes that this rotation scheme saves 50% of space over the traditional daily-weekly-monthly-yearly schedule. On the other hand, this might be too few backups for some use cases. Unfortunately the algorithm is not tunable. At least as far as I can see, it only works for the exponent two. This might be related to the fact that the interval between three sequential backups naturally doubles when you delete the middle one. But to be honest I don't really understand even why in this particular case Chris' algorithm works as well as it does.

The original implementation is in Haskell. While a functional language fits well with the nature of this algorithm, it's a bit inconvenient to integrate with other code. I also wanted to do a few additions to the command line tool and my knowledge of Haskell is too limited to be able to enhance the original code.

Therefore, in the truest of software development traditions, I wrote my own implementation in Python. It currently lacks documentation, but the command-line interface should be completely compatible with the original, except that it now supports a couple of additional arguments. It also contains a Python class that can be imported from other Python code and can be used to apply the algorithm to arbitrary objects, as long as they can be sorted and have a defined distance between them.

$ git clone https://github.com/avian2/pylog2rotate.git

If this sounds useful, get it from GitHub repository above. As always, comments and patches are most welcome.

Posted by Tomaž | Categories: Code | Comments »


10.01.2015 23:00

I guess too many things happen in a year to be able to mark the whole as good or bad with good conscience. For me, 2014 certainly had had its ups and downs. Looking back at it though, I can't avoid the feeling that the latter were more abundant and that it's been overall a difficult year to weather.

My life in 2014 has been dominated by a bunch of personal problems that surfaced almost exactly at the start of the year. I only wish I could say they are completely behind me. I am not comfortable discussing them here, except to say that I am immensely grateful for the help and support I received, despite the worries and grief I caused. Foremost from my parents and also from professional counselors I met over the past twelve months.

I have completed, even if just barely catching the deadline, the first year of a postgraduate programme at the Jožef Stefan International Postgraduate School. Although I have been encouraged by my employer and others to pursue a PhD, I am not that optimistic that I will complete the study. After being a passive observer of the school for two years and a student for one I'm not convinced that I can make a meaningful contribution to science in this environment. I feel I am too old to sign attendance at lectures, study a topic only for the sake of passing an exam and giving seminar presentations to empty rooms.

Last year has been the time when I completed what is probably my most elaborate electronics project yet. The new UHF receiver for VESNA has been an interesting challenge, a welcome refreshment of some parts of electrical engineering I haven't touched in many years and provided a wonderful sense of achievement when everything fell in place. Still, looking at it now, the success is bitter sweet. It's been work done in an isolated garden. I've done all electronics design, firmware and higher level software development myself and it's too little too late. What I made is a product of European funding and the ecosystem at the Institute. Outside these walls it has precious little advantages over something as mediocre as a rtl-sdr stick in a Raspberry Pi.

Given the above, it's no wonder that 2014 has been kind of dry season for technical side projects. I always tended to have some piled on my table, but this time few deserve mention. I've spent perhaps the most time playing with CubieTruck, getting Debian and a web server running smoothly on it and discovering some of its whims. jsonmerge was another project that had me occupied beyond the hackday that sparked it. While it turned out less generally useful than I initially thought, it was a nice excuse to update my Python skills somewhat.

Speaking of Python, visiting Berlin in summer for EuroPython has definitely been one of the highlights of the year. As was manning the hardware hacking table at the WebCamp Ljubljana.

It's hard to sum up 2014. In some way it's been full of serious work and worries that have at times removed the joy of working in my profession. On the other hand, it's been also about facing problems that went unaddressed for too long and doing things that I might not have imagined doing a few years back. It's hard to say which of them has been a distraction from which. It has left me wishing for change and wanting back times when I could spend evenings hacking away on some fun project without deadlines attached.

Posted by Tomaž | Categories: Life | Comments »