Python PublicSuffix release 1.1.0

10.07.2015 16:46

The publicsuffix Python module is one of the most widely used pieces of open source software that resulted from my years at Zemanta. With around 1000 daily downloads from the package index, it's probably only second to my Python port of Unidecode. I've retained the ownership of these packages even though I moved on to other things. Being mature pieces of specialized code, they are pretty low-maintenance, with typically one minor release per year and an occasional email conversation.

The publicsuffix 1.1.0 release had some interesting reasons behind it, and I think it's worth discussing them a bit here.

Since the start, publicsuffix shipped with a copy of Mozilla's Public Suffix List as part of the Python package. My initial intention was to make the module as easy to use as possible. As the example in the README showed, if you constructed the PublicSuffixList object without any arguments, it just worked, using the built-in data. The object supported loading arbitrary files, but this was only mentioned in the docstring. My reasoning at the time was that Public Suffix List was relatively slow-moving. For quick hacks it wouldn't matter if you used slightly out-dated data and serious users could pass in the exact version of the list they needed.

Later I started getting reports that other software started using publicsuffix with the assumption that the built-in data is always up-to-date. A recent example reported by Philippe Ombredanne was url-py. Some blame is definitely with authors of such software not reading the documentation. My README did have a warning about the default list being possibly out-dated, although that fact was probably not stressed enough. However, after reading complaints over time, I felt that perhaps publicsuffix could offer an interface that was less prone to errors like that.

I looked through the possibilities on how to improve things by looking at the quite numerous publicsuffix forks on GitHub and Public Suffix implementations in other languages.

The publicsuffix package with a built-in copy could be updated each time Mozilla updates the List. This would make the package decidedly less low-maintenance for me. It would also require that every software that uses publicsuffix regularly re-installs its dependencies and restarts to refresh the data used by PublicSuffix. From my experience with deploying server-side Python software, this is quite an unreasonable requirement.

Another approach would be to fetch the list automatically from Mozilla's servers at run-time. However, I don't like doing HTTP requests without this being explicitly visible to the developer. Such things have led to unintentional storms of traffic to unsuspecting sites. Also, for some uses, the latest version of the list from Mozilla's servers might not even be what they need (consider someone processing historical data).

Finally, I could simply say it's the user's responsibility to supply the algorithm with data and not provide any default. While this requires a bit more code, it also hopefully gives the developer a pause to think what version of the List is relevant to them.

The latter is the approach I went with in the end. I deprecated the argument-less PublicSuffixList constructor with a DeprecationWarning. I also added a convenience fetch() function that downloads the latest List from the web and the result of which can be directly passed to the constructor. This makes the request explicit (and explicit is better than implicit is, after all, in the Zen of Python) and only little less convenient. I opted against implementing any kind of caching in publicsuffix since I think that is something specific to each application.

In conclusion, you can't force people to carefully read the documentation. People will blame you for bugs in other people's software (incidentally, this release also works around a packaging problem due to a long-standing bug in Wheel). It's worth thinking about defaults and carefully picking what examples you show in the README.

Posted by Tomaž | Categories: Code

Add a new comment

(No HTML tags allowed. Separate paragraphs with a blank line.)