The last couple of weekends I’ve been working on a little Python side project: dripfeed. This is partly a tool to help me keep my webcomics addiction under control, and partly an excuse to try a bunch of Python tools that I haven’t played with yet.
The official excuse (“it scratches an itch”)
Whenever I discover a new webcomic, I’m faced with the problem of getting myself up to speed: reading the archives in order, so that I’m ready to start reading along at the rate the author is writing. Typically I solve this problem with a single binge session, reading as far into the night (and morning) as it takes. This tactic is only marginally compatible with the realities of parenthood, however.
When I discovered The Abominable Charles Christopher I tried something different: I rationed my catchup sessions, and bookmarked my progress. Of course this led to the predictable frustrations: some nights I forgot to bookmark, some days I wanted to have a quick catchup session somewhere my bookmarks aren’t synched, and I ended up with a whole lot of “single-shot” bookmarks that needed tidying up.
What I really wanted was to do my catchup in the same place I read my regular webcomics: my RSS feed reader. And thus was
The real reason (“it scratches a different itch”)
I don’t actually have a new webcomic to catch up with at the moment, but I do have a bunch of Python tools and techniques that I’ve wanted to play with but that we don’t need at work. Putting this package together let me try out:
- cookiecutter (a templating system for initialising new projects)
- docopt (commandline arg parsing, configured by writing the
- Travis CI testing (on a mercurial project hosted on bitbucket!)
- supporting both Python 2 and Python 3 from the same codebase
Here are some lessons I learned on the way:
Cookiecutter is cool
I used someone else’s template, and it wasn’t set up quite how I wanted; I probably could have set up my repository myself in about the same time I spent rearranging theirs, but this way I got reminded about the bits I might have forgotten, and I also got introduced to some new utilities. Next time, though, I’m writing a template myself.
Docopt is less cool in practice than it sounds in theory
In principle the idea behind docopt is great: just write your --help text output (including usage examples) and it will produce a commandline parser to match.
In practice, though, I found that it wasn’t quite flexible enough in the places I needed it to be. The dripfeed interface has a couple of commands (including info) which take arguments and options, and there are a few global options for controlling logging and verbosity. The suggested way to handle this structure with docopt is to make separate sub-parsers for each command, but this seemed like overkill for the very small command set dripfeed has. I hacked together an alternative, but it felt like I was working against the grain of the library in doing so, and the result isn’t as friendly as I would like.
There’s a second way in which I started falling out of love with docopt in this project: it’s not composable. I added a set of options for controlling logging, including --log for specifying an output file.[2] Ideally I would like to extract the handling for these into a package which I can drop into any of my commandline scripts, so that I don’t have to repeat myself whenever I want these (very standard-looking) options. But docopt makes that difficult: the spec for the options has to be intermingled with the spec for the rest of the commandline interface. For this reason, I’ll be looking at click instead in the next side project I play with.
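As a rough sketch of what "composable" logging options could look like, here is one way to do it with the standard library's argparse and its parents mechanism. This is not docopt or click, and not how dripfeed does it. Only --log comes from the description above; the -v/--verbose flag, the example script name and its url argument are assumptions made up for illustration.

```python
import argparse
import logging


def logging_options():
    """Return a reusable parser fragment holding only the logging options."""
    # add_help=False so the fragment does not clash with the -h/--help
    # option of whichever parser inherits from it.
    fragment = argparse.ArgumentParser(add_help=False)
    fragment.add_argument("--log", metavar="FILE", default=None,
                          help="write log output to FILE")
    # Assumed flag, not taken from dripfeed itself.
    fragment.add_argument("-v", "--verbose", action="store_true",
                          help="log at DEBUG level instead of WARNING")
    return fragment


def configure_logging(args):
    """Apply the parsed logging options and return the chosen level."""
    level = logging.DEBUG if args.verbose else logging.WARNING
    logging.basicConfig(filename=args.log, level=level)
    return level


def build_parser():
    """Build a hypothetical script's parser on top of the shared fragment."""
    parser = argparse.ArgumentParser(prog="example",
                                     parents=[logging_options()])
    parser.add_argument("url", help="page to start from")
    return parser


# Parse a sample command line instead of sys.argv so the sketch is
# self-contained.
args = build_parser().parse_args(["--verbose", "http://example.com/"])
```

The point of the sketch is that logging_options() and configure_logging() know nothing about the script that uses them, so they could live in a separate package and be reused unchanged.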
I need to either learn Git, or raise my Hg game
The commit history for this project is an embarrassment. I’ve always been a fan of the mercurial philosophy that “history is sacred”;[3] looking at some of these commits, though, I start to appreciate git’s philosophy that “history is sacred”.[4] Git makes it easy to commit often for trivial changes (which is good for short-term development) and still to squash those commits into reasonable-sized coherent updates before pushing them out into the world (which is good for long-term not looking like a doofus). I understand that modern mercurial can do this too, but it’s certainly not the way I use it at the moment.
Contributing to this feeling is the fact that Travis CI is so self-evidently the obvious choice for CI testing… but it doesn’t support bitbucket, which is equally self-evidently the obvious place to host mercurial projects. I’ve “solved” this problem this time around by making a github clone of the repository, which I update with the hg-git mercurial extension, but the setup is decidedly rickety.
Supporting both Python 2 and Python 3 is … interesting
What’s much more interesting is dealing with the changes in string types and semantics. Being forced to be explicit about the distinction between bytes and unicode strings for Python 3 actually helped me catch what I think would have been a very subtle bug in the original code. It’s very unlikely that it would ever have been discovered,[5] but I still think this counts as a point for the New Deal.[6]
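To make the bytes-versus-text distinction concrete, here is a small sketch of one way a text-mode open can go wrong on a feed that isn't encoded the way the reader assumes. The feed content and the explicit encoding="utf-8" are assumptions chosen to make the failure deterministic; this is an illustration of the general hazard, not a reconstruction of the actual dripfeed bug.

```python
import io
import os
import tempfile

# A tiny, made-up feed encoded as Latin-1 rather than UTF-8.
feed_text = (u'<?xml version="1.0" encoding="iso-8859-1"?>'
             u'<rss><title>Caf\xe9</title></rss>')
raw = feed_text.encode("iso-8859-1")

# Write the raw bytes to a temporary file.
fd, path = tempfile.mkstemp(suffix=".rss")
os.close(fd)
with io.open(path, "wb") as f:
    f.write(raw)

# Binary mode: the bytes come back untouched, so an XML parser can
# honour the encoding declared inside the document.
with io.open(path, "r+b") as f:
    data = f.read()

# Text mode: the file is decoded before anything looks at the XML
# declaration, so a mismatched encoding fails (or silently garbles).
try:
    with io.open(path, "r+", encoding="utf-8") as f:
        f.read()
    text_mode_failed = False
except UnicodeDecodeError:
    text_mode_failed = True

os.remove(path)
```

io.open is used here because it behaves the same way on Python 2 and Python 3; on Python 3 it is the same function as the built-in open.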
Was it worth it?
1. In fact, from any sequence of html pages all of which have a “next” link that can be reliably extracted with an XPath expression.
2. Yes, this is total overkill for this tiny little script. I said I was playing with tools, right?
3. It should record everything that happened in the project.
4. It should be a readable summary of significant changes.
5. I don’t expect this tool to get heavy use.
6. The bug was: I was loading RSS files with open('r+') instead of with open('r+b'). I think this could have caused problems with a non-unicode-encoded RSS feed, but I’m not 100% sure.
7. I am feeling the pain of not scraping the images into the feed, but that’s a deliberate decision that I don’t see changing.