Thursday 6 March 2014

Python List Comprehensions and Generators for C# Developers

If you’re a C# programmer and you’ve used LINQ, you’ll know how powerful it is to allow you to manipulate sequences of data in all kinds of interesting ways, without needing to write for loops. Python has similar capabilities, using what are called “list comprehensions” and “generators”. In this post, I’ll demonstrate how they work, showing them side by side with roughly equivalent C# code.

List Comprehensions

A list comprehension in Python allows you to create a new list from an existing list (or as we shall see later, from any “iterable”).

Let’s start with a simple example at the Python REPL. Here we create a list, that contains the square of each number returned by the range function (which in this case returns 0,1,2,…9)

>>> [x*x for x in range(10)]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

This is equivalent to a C# LINQ statement that takes a range (using Enumerable.Range), selects the square (using Select), and then turns the whole thing into a list (using ToList):

Enumerable.Range(0, 10).Select(x => x*x).ToList();

Python list comprehensions also allow you to filter as you go, by inserting an “if” clause. Here, we’ll only take the squares of odd numbers:

>>> [x*x for x in range(10) if x%2]
[1, 9, 25, 49, 81]

This is equivalent to chaining a Where clause into our LINQ statement:

Enumerable.Range(0, 10).Where(x => x%2 != 0)
    .Select(x => x*x).ToList();

You can actually have two “for” clauses inside your list comprehension, so you could create some coordinates as a tuple like this:

>>> coords = [(x,y) for x in range(4) for y in range(4)]
[(0, 0), (0, 1), (0, 2), (0, 3), 
 (1, 0), (1, 1), (1, 2), (1, 3), 
 (2, 0), (2, 1), (2, 2), (2, 3), 
 (3, 0), (3, 1), (3, 2), (3, 3)]

The same effect can be achieved using the SelectMany clause in LINQ:

Enumerable.Range(0,4).SelectMany(x => Enumerable.Range(0,4)
    .Select(y => new Tuple<int,int>(x,y))).ToList();

You can see that the LINQ gets a little cumbersome at this point, although you can use the alternative syntax:

from x in Enumerable.Range(0,4)
from y in Enumerable.Range(0,4)
select new Tuple<int,int>(x,y)

Here's another Python list comprehension with two for expressions, making a list of all the spaces on a chessboard

>>> [x + str(y+1) for x in "ABCDEFGH" for y in range(8)]
['A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7', 'A8', 
 'B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8',
 'C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'C7', 'C8', 
 'D1', 'D2', 'D3', 'D4', 'D5', 'D6', 'D7', 'D8', 
 'E1', 'E2', 'E3', 'E4', 'E5', 'E6', 'E7', 'E8', 
 'F1', 'F2', 'F3', 'F4', 'F5', 'F6', 'F7', 'F8', 
 'G1', 'G2', 'G3', 'G4', 'G5', 'G6', 'G7', 'G8', 
 'H1', 'H2', 'H3', 'H4', 'H5', 'H6', 'H7', 'H8']

And in C#, you'd do something like:

"ABCDEFGH".SelectMany(x => Enumerable.Range(1,8)
    .Select(y => x+y.ToString())).ToList()

Dictionaries and Sets

You don't actually have to create lists. Python lets you use a similar syntax to create a set (no duplicate elements), or a dictionary. Here we'll start with a list of fruit, then use a list comprehension to make a list of string lengths. Then we'll make a set of unique fruit lengths, and we'll finally make a dictionary keyed on fruit name, and containing the length as a value:

>>> fruit = [‘apples’,’oranges’,’bananas’,’pears’]
>>> [len(f) for f in fruit]
[6, 7, 7, 5]
>>> {len(f) for f in fruit}
set([5, 6, 7])
>>> {f:len(f) for f in fruit}
{‘bananas’:7, ‘oranges’:7, ‘pears’:5, ‘apples’:6} 

We can create the set of unique lengths in C# by creating a HashSet, passing in our LINQ statement to its constructor. And you can use LINQ's ToDictionary extension method to make the equivalent dictionary of strings to lengths:

var fruit = new [] { "apples", "oranges", "bananas", "pears" };
fruit.Select(f => f.Length).ToList();
new HashSet<int>(fruit.Select(f => f.Length));
fruit.ToDictionary(f => f, f => f.Length);

Generators

Python generators are essentially the same concept as a C# method that returns an IEnumerable<T>. In fact, the syntax for creating them is very similar – you just need to use the yield keyword. Here’s a generator function that returns the names of my children:

def generateChildren():
    yield "Ben"
    yield "Lily"
    yield "Joel"
    yield "Sam"
    yield "Annie"

And here’s the same thing in C#:

public IEnumerable<string> GenerateChildren() 
{
    yield return "Ben";
    yield return "Lily";
    yield return "Joel";
    yield return "Sam";
    yield return "Annie";
}

Like with C#, Python generators uses lazy evaluation. This means that they could return infinite sequences. And it also means that it is not until we actually evaluate them that we will get any errors. This code example:

def generateNumbers():
    yield 2/2
    yield 3/1
    yield 4/0 # will cause a ZeroDivisionError
    yield 5/-1

numbersGenerator = generateNumbers()
print("Numbers Generator", numbersGenerator)
try:
    numbers = [n for n in numbersGenerator]
    print("Numbers", numbers)
except ZeroDivisionError:
    print("oops")

Generates the following output:

Numbers Generator <generator object 
    generateNumbers at 0x0000000002ADD4C8>
oops

Python provides a method called “next” that allows you to step through the outputs from a generator one by one. Let’s try that with our children generator function:

>>> children = generateChildren()
>>> next(children)
'Ben'
>>> next(children)
'Lily'
>>> next(children)
'Joel'
>>> next(children)
'Sam'
>>> next(children)
'Annie'
>>> next(children)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

You’ll notice that calling next after we have reached the end gives us a StopIteration exception. C#’s closest equivalent to the Python next function is getting the enumerator and stepping through with MoveNext:

var children = GenerateChildren().GetEnumerator();
children.MoveNext();
Console.WriteLine(children.Current);
children.MoveNext();
Console.WriteLine(children.Current);
children.MoveNext();
Console.WriteLine(children.Current);    
children.MoveNext();
Console.WriteLine(children.Current);    
children.MoveNext();
Console.WriteLine(children.Current);
children.MoveNext();
Console.WriteLine(children.Current);

This produces the following output (the last item is repeated because we didn’t check the return code of MoveNext which indicates whether we reached the end of the enumeration).

Ben
Lily
Joel
Sam
Annie
Annie

In practice in C# it is fairly rare to use the enumerator directly. When you have an IEnumerable<T> you typically use it in a foreach loop or with some of the LINQ extension methods.

The Python list comprehension syntax also allows us to create new generators from existing generators. For example:

>>> (x*x for x in range(10))
<generator object <genexpr> at 0x0000000002ADD750>

This allows you to compose complex generators out of simple statements, creating a pipeline very much like you can with chained LINQ extension methods.

Conclusion

As you can see, Python list comprehensions and generators provide the same power that you are used to with C# and LINQ, and with a syntax that is more compact in most cases. Look out for a follow-up post shortly where I will demonstrate how many of the standard LINQ extension methods such as Any, All, Max, Min, Take, Skip, TakeWhile, GroupBy, First, FirstOrDefault, and OrderBy can be achieved in Python.

2 comments:

Dan said...

One thing missing from .Net that Python has is a generator's send method. It allows you to write values back into the method at the point you yielded. It's a bit less useful in statically typed languages since you have to decide between being able to send in one type or losing out on some compile time checking.

Unknown said...

@Dan, cool, I haven't used that before. I'll have a play with it