Exponential backup rotation

17.01.2015 12:10

Log2rotate (as in "log2 rotate", not "log to rotate") is an interesting algorithm and command line utility created by Chris Forno. It is a common task to create periodical snapshots of some system, either for backup or archival purposes. To keep the storage requirement down, you also often want keep a more dense series of backups for some short period of time and retain fewer and fewer as they age.

Typically this is done based on common time units, like having a daily, weekly, monthly and yearly cycles of backup rotation. Log2rotate approaches this problem from a more mathematically elegant standpoint. Instead of basing cycles on ways humans measure time, it tries to keep the intervals between stored backups increasing exponentially. Specifically, it keeps each interval between two backups about twice as long as the previous one.

This is not as trivial as it might sound at first. You are creating backups at a steady pace, but once you delete them, they are gone. You cannot rearrange them later. So the algorithm needs to periodically remove snapshots in a smart way so that the intervals approximate the geometric series.

Visualization of the log2rotate algorithm.

In the plot above you can see the visualization of the log2rotate algorithm. Each black point represents one stored backup. All black points on a vertical line represent backups that have been stored on that day. A horizontal line shows the life time of a single backup. The algorithm always preserves the newest (on the diagonal) and the oldest (bottom row) backups, while the ones in between get removed in a fractal-like pattern.

Number of stored backups versus time when using log2rotate.

Because the algorithm cannot resurrect deleted backups, it can only approximate the geometric series. In practice, the number of stored backups varies between log2 n and 2·log2 n as shown on the graph above.

Chris writes that this rotation scheme saves 50% of space over the traditional daily-weekly-monthly-yearly schedule. On the other hand, this might be too few backups for some use cases. Unfortunately the algorithm is not tunable. At least as far as I can see, it only works for the exponent two. This might be related to the fact that the interval between three sequential backups naturally doubles when you delete the middle one. But to be honest I don't really understand even why in this particular case Chris' algorithm works as well as it does.


The original implementation is in Haskell. While a functional language fits well with the nature of this algorithm, it's a bit inconvenient to integrate with other code. I also wanted to do a few additions to the command line tool and my knowledge of Haskell is too limited to be able to enhance the original code.

Therefore, in the truest of software development traditions, I wrote my own implementation in Python. It currently lacks documentation, but the command-line interface should be completely compatible with the original, except that it now supports a couple of additional arguments. It also contains a Python class that can be imported from other Python code and can be used to apply the algorithm to arbitrary objects, as long as they can be sorted and have a defined distance between them.

$ git clone https://github.com/avian2/pylog2rotate.git

If this sounds useful, get it from GitHub repository above. As always, comments and patches are most welcome.

Posted by Tomaž | Categories: Code

Comments

That is what came to my mind years ago, when asking our admin if i can get older backups. It turns out, that it possible to get logN() for any N, but I guess that only small integers like 2 are interesting. I wonder if it possible to get rational N.

Posted by Luka Rahne

Add a new comment


(No HTML tags allowed. Separate paragraphs with a blank line.)