Dropping the "publicsuffix" Python package

02.12.2019 10:50

I have just released version 1.1.1 of the publicsuffix Python package. Barring any major bugs that would affect some popular software package using it, this will be the last release. I've released v1.1.1 because I received a report that a bug in the publicsuffix package is preventing installation of GNU Mailman.

In the grand scheme of things, it's not a big deal. It's a small library with a modest number of users. I haven't done any work on it, short of answering mail, since 2015. Drop-in alternatives exist. People who care strongly about the issues I cover below have most likely already switched to one of the forks and rewrites that popped up over the years. For those who don't care, nothing will change. The code still works and the library can still be installed from PyPI as usual. The Debian package continues to exist. The purpose of this post is more to give some closure and to sum up a few mail threads that started back in 2015 and never reached a conclusion.

Screenshot of the publicsuffix package page on PyPI.

I first released the publicsuffix library back in 2011, two employers and a life ago. Back then there was no easily accessible Python implementation of Mozilla's Public Suffix List. Since I needed one for my work, I picked up a source file from an abandoned open source project on Google Code (which was itself being abandoned by Google around that time). I did some minor work on it to make it usable as a standalone library and published it on PyPI.

I haven't used publicsuffix myself for years. Looking back, most of the open source projects that I still maintain seem to be like that. Even though I don't use them, I feel some obligation to do basic maintenance on them and answer support mail. If for no other reason, then out of a sense that I should give back to the body of free software that I depend on so much in my professional career. Some technical problems are also simply fun to work on and most of the time there's not much pressure.

However, one thing that was a source of long discussions about publicsuffix is the way the PSL data is distributed. I've written previously about the issue. In summary, you either distribute stale data with the code or fetch an up-to-date copy via the network, which is a privacy problem. These are the only two options, and going with one or the other or both was always going to be a problem for someone. I hate software that phones home (well, phones Mozilla in this case) as much as anyone, but it's a problem that I, as a mere maintainer of a Python library, had no hope of solving, even if I got CC'd on all the threads discussing it.
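To make the trade-off concrete, here is a minimal sketch of a loader that keeps both options visible. It is not how publicsuffix itself is implemented. The function names, the cache file and the 30-day freshness limit are all assumptions made for illustration; only the list URL is the real published location. The network is contacted only if the caller explicitly passes a fetch function.

```python
import os
import time
import urllib.request

# Published location of the list; everything else here is hypothetical.
PSL_URL = "https://publicsuffix.org/list/public_suffix_list.dat"


def fetch_from_network(url=PSL_URL):
    """Download the current list. This is the 'phones home' option."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def load_psl_text(bundled_path, cache_path, max_age_days=30, fetch=None):
    """Return PSL text, contacting the network only if `fetch` is given.

    Order of preference: a fresh cached download, then a new download
    (opt-in), then the possibly stale copy shipped with the code.
    """
    max_age = max_age_days * 24 * 3600
    if os.path.exists(cache_path):
        age = time.time() - os.path.getmtime(cache_path)
        if age < max_age:
            with open(cache_path, encoding="utf-8") as f:
                return f.read()

    if fetch is not None:
        try:
            text = fetch()
        except OSError:
            text = None  # offline or blocked: fall back to the bundled copy
        if text is not None:
            with open(cache_path, "w", encoding="utf-8") as f:
                f.write(text)
            return text

    with open(bundled_path, encoding="utf-8") as f:
        return f.read()
```

Calling load_psl_text(bundled, cache) never touches the network, while load_psl_text(bundled, cache, fetch=fetch_from_network) does. Neither default makes the underlying problem go away. It only moves the decision to whoever calls the function.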

The Public Suffix List is a funny thing. Ideally, software either should not care about the semantic meaning of domain names or this meaning should be embedded in the basic infrastructure of the Internet (e.g. DNS or something). But alas we don't live in either of those worlds and hence we have a magic text file that lives on an HTTP server somewhere and some software needs to have access to it if it wants to do its thing. No amount of worrying on my part was going to change that.
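For readers who have never looked inside that magic text file, the following is a rough sketch of what software does with it. It is a simplified illustration of the matching rules described with the list (plain rules, "*." wildcards and "!" exceptions) and not the code from the publicsuffix package. The function names and the handful of sample rules are assumptions chosen for the example, and details such as internationalized names are left out.

```python
# A few rules in PSL format. The real file has thousands of lines.
SAMPLE_RULES = """
// comments start with two slashes
com
uk
co.uk
*.ck
!www.ck
"""


def parse_rules(text):
    """Parse PSL-formatted text into a set of rule strings."""
    rules = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("//"):
            continue
        # A rule runs up to the first whitespace on the line.
        rules.add(line.split()[0].lower())
    return rules


def public_suffix(domain, rules):
    """Return the public suffix of `domain` under the given rules."""
    labels = domain.lower().strip(".").split(".")
    best = 1  # implicit default rule "*": the last label is a suffix
    for i in range(len(labels)):
        candidate = labels[i:]
        name = ".".join(candidate)
        if "!" + name in rules:
            # An exception rule wins outright; the suffix is the rule
            # with its leftmost label removed.
            return ".".join(candidate[1:])
        wildcard = ".".join(["*"] + candidate[1:])
        if name in rules or wildcard in rules:
            best = max(best, len(candidate))
    return ".".join(labels[-best:])


def registrable_domain(domain, rules):
    """Return the public suffix plus one label, or None if there is none."""
    labels = domain.lower().strip(".").split(".")
    suffix_len = len(public_suffix(domain, rules).split("."))
    if len(labels) <= suffix_len:
        return None
    return ".".join(labels[-(suffix_len + 1):])


rules = parse_rules(SAMPLE_RULES)
print(registrable_domain("www.example.com", rules))    # example.com
print(registrable_domain("a.b.example.co.uk", rules))  # example.co.uk
print(public_suffix("foo.bar.ck", rules))              # bar.ck
print(public_suffix("www.ck", rules))                  # ck
```

The point of the sketch is that none of these answers can be derived from the domain names alone. They depend entirely on which rules happen to be in the file, which is why the freshness of the data matters at all.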

Screenshot of publicsuffix forks on GitHub.

The other issue that sparked at least one fork of publicsuffix was the fact that I refused to publish the source on GitHub. Even though there are usually several copies of the publicsuffix code on GitHub at any time, none of them are mine. I was instead hosting my own git repo and was accepting bug reports and other comments only over email.

Some time ago GitHub became synonymous with open source. People simply expect a PyPI package to have a GitHub (or GitLab, or BitBucket) point-and-click interface somewhere on the web. The practical problem I have with that is that it hugely increases the amount of effort I have to spend on a project (subjectively speaking - keep in mind this is something I do in my free time). Yes, it makes it trivial for someone to contribute a patch. However, in practice I find that it does not result in a greater quantity of meaningful patches or bug reports. What it does do is create more work for me dealing with low-effort contributions I must reject.

I'm talking about a daunting asymmetry in communication. Writing two sentences in a hurry in a GitHub issue or pushing a bunch of untested code my way in a pull request can take all of a minute for the submitter. On the other hand, I don't want to discourage people from contributing to free software and I know how frustrating it can be to contribute to open source projects (see my post about drive-by contributions). So I try to take some time to study the pull request and write an intelligible and useful answer. However, this is simply not sustainable. Looking back, I also seem to often fail at not letting my frustration show through in my answers. Hence I feel that requiring contributors to at least know how to use git format-patch and write an email forms a useful barrier to entry. It prevents frustration at both ends and I believe that for a well thought-out contribution, the overhead of opening a mail client should be negligible.

Of course, if the project is not officially present on GitHub you get the current situation, where multiple public copies of the project still exist on GitHub, made by random people for their own use. These copies often keep my contact details in and don't clearly state that the code has been modified and/or is not related to the PyPI releases. This causes confusion, since the code on GitHub is not the same as the code on PyPI. People also sometimes reuse version numbers for their own private use that conflict with version numbers on PyPI, and so on and so on. It is kind of a damned if you do and damned if you don't situation really.


How can I sum this up? I've maintained this software for around 8 years, well after I left the company for which it was originally developed. During that time people have forked and rewritten it for various, largely non-technical reasons. That's fine. It's how free software is supposed to work and my own package was based on another one that got abandoned. I might still be happy to work on technical issues, but the part that turned out much more exhausting than working on the code was dealing with the social and ideological issues people had with it. It's probably my failing that I've spent so much thought on those. In the end, my own interests have changed as well during that time and finally letting it go does also feel like a stone off my shoulders.

Posted by Tomaž | Categories: Code
