Here’s an oddity of Python that I stumbled upon, which has turned me off doing anything even remotely complicated with list comprehensions. For those following along at home, fire up your ipython now.

In [1]: array = [[1,2,3], [4,5,6], [7,8,9]]
In [2]: [ sum(col) for col in array ]
Out[2]: [6, 15, 24]

So far so good. Now suppose I want to take the squares of all the elements in array, and flatten the results into a single list. This looks to me like a natural way to do it:

In [3]: [ x*x for x in col for col in array ]

However, it is wrong. The result:

Out[3]: [49, 49, 49, 64, 64, 64, 81, 81, 81]

At this point I should insert a picture of a horse in a raincoat holding a cat or something similar.

Let’s try that one more time, for sanity’s sake:

In [4]: [ x*x for x in y for y in array ]
NameError: name 'y' is not defined

Wat?!

What’s going on is two distinct oddities of Python, which I have encountered in production code and debugged so that you don’t have to.

Firstly, loop variables leak into the surrounding scope.[1] It’s easy to forget this for list comprehensions, since they don’t overtly contain a loop, but in Python 2 they work just the same (Python 3 gives comprehensions a scope of their own). In our example, the col variable that for x in col is using is actually the exit value of col in the previous list comprehension. If you were following along at home and skipped that first, trivial comprehension (the one with sum(col)), then chances are you didn’t have col defined in your ipython session and you hit the error early.
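To check which behaviour your own interpreter has, a small probe like the following should do. It is only a sketch: probe, last_i and comprehension_leaks are names made up for this example, and it assumes probe is not already defined in your session.

```python
import sys

# A plain for loop leaves its variable bound after the loop ends.
for i in range(3):
    pass
last_i = i  # the final value the loop assigned

# Run a throwaway comprehension, then see whether its loop variable
# is still visible afterwards. 'probe' is a name invented for this sketch.
[probe for probe in range(3)]
try:
    probe
    comprehension_leaks = True   # what Python 2 does
except NameError:
    comprehension_leaks = False  # what Python 3 does

print(sys.version_info[0], last_i, comprehension_leaks)
```

The for loop leaks on every version; only the comprehension result depends on whether you are on Python 2 or Python 3.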

Secondly, once we’ve got that far, it’s easy to see where the example went wrong: variables are introduced in a list comprehension left-to-right, not right-to-left, so for row in array for x in row is the correct order. We can also get here by mentally writing out the nested loops the comprehension represents:

def squares(array):  # wrapper so that the yield is legal
    for row in array:
        for x in row:
            yield x*x

A comprehension lifts the yield line to the front but leaves the loops in the same order.
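As a quick sanity check of that rule, here is a sketch that builds the flattened squares both ways and compares them. The function name squares_with_loops is made up for this example.

```python
array = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

def squares_with_loops(rows):
    """Spell the comprehension out as explicit nested loops."""
    result = []
    for row in rows:          # outer loop comes first
        for x in row:         # inner loop comes second
            result.append(x * x)
    return result

# Same loops, same order; only the x * x expression moves to the front.
squares_with_comprehension = [x * x for row in array for x in row]

print(squares_with_comprehension == squares_with_loops(array))
```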

Still, I think my ordering is a more natural way to represent the expression I’m trying to capture in code. I had assumed that it came from my long-ago love affair with Haskell, but that’s actually not the case. The equivalent Haskell expression is:

let array = [[1,2,3], [4,5,6], [7,8,9]] in [ x*x | row <- array, x <- row ]

The order the variables appear in is actually just the same.

This doesn’t raise my hackles the way the Python version does, though. The Haskell code clearly distinguishes the pipe from the comma; the rule you have to know is that the expression before the pipe sees the variables defined after the pipe, and scope resolution across the commas works left-to-right as usual.[2] The weird thing about the Python version is that for <something> in <somewhere> has different scope resolution rules depending on where it is: the first instance in a comprehension resolves the <something> in <somewhere>, but later ones cannot.
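One way to see the Python rule without any leaked variables muddying the water is to put both orderings inside functions. This is a sketch under that assumption; wrong_order, right_order and inner are names invented for the example, and inner is deliberately not defined anywhere else.

```python
array = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

def wrong_order(rows):
    # The first 'for' needs 'inner' before the second 'for' has bound it.
    return [x * x for x in inner for inner in rows]

def right_order(rows):
    # Each 'for' only uses names bound by clauses to its left.
    return [x * x for inner in rows for x in inner]

try:
    wrong_order(array)
    failed = False
except NameError:
    # Also catches UnboundLocalError, which is a subclass of NameError.
    failed = True

print(failed, right_order(array))
```

With no stale value lying around for the first clause to pick up, the right-to-left ordering fails outright instead of quietly producing the wrong list.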

So there you go. A case where Python’s minimalist syntax, which I usually appreciate very much, seems to me to be taking things a bit too far. And (again, just my opinion) a good reason to avoid this kind of list comprehension construction completely. Although in this case the alternative, a wordy map with a lambda and an esoteric sum, doesn’t make me any happier:

In [5]: map(lambda x: x*x, sum(array, []))
Out[5]: [1, 4, 9, 16, 25, 36, 49, 64, 81]
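For anyone on Python 3, where map returns an iterator rather than a list, a rough equivalent might look like the sketch below. Note that it swaps in itertools.chain.from_iterable for the sum(array, []) trick; that is a substitution made for this example, not what the session above used.

```python
from itertools import chain

array = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# chain.from_iterable flattens one level of nesting lazily,
# standing in for sum(array, []).
flat = chain.from_iterable(array)

# list() forces the iterator that map returns on Python 3.
squares = list(map(lambda x: x * x, flat))

print(squares)
```

It is no less wordy, which rather supports the point.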

Notes:

  1. All Python programmers should know this about the for loop, since it lets you query the final value of the loop’s variable after exiting the loop.
  2. And the same rules apply elsewhere in Haskell, for instance when pipe is used for guard expressions.