Iterables and Generators

The Tools of Pythonicity

What goes on in those for loops?

A note about Python History

Python used to be all about sequences – a good chunk of anything you did was stored in a sequence, or involved manipulating a sequence.

  • lists
  • tuples
  • strings
  • dict.keys()
  • dict.values()
  • dict.items()
  • zip()
  • ...

In Python 2 – those are all sequences.

But it turns out that the most common operation for sequences is to iterate through them:

for item in a_sequence:
    do_something_with_item

So fairly early on, Python 2 introduced the idea of the “iterable”.

More or less, an “iterable” is something you can, well, iterate over in a for loop, but that does not necessarily keep the whole sequence in memory at once.

After all – why make a whole copy of something just to look at all its items?

Example:

In Python 2, dict.keys() returns a list of all the keys in the dict. But why make a full copy of all the keys, when all you want to do is:

for k in dict.keys():
    something_with(k)

Even worse: dict.items() created a full list of (key,value) tuples. – a complete copy of all the data in the dict.

Even worse: enumerate(dict.items()) created a whole list of (index, (key, value)) tuples – lots of copies of everything.

Enter iter*

Python2 then introduced “iterable” versions of a number of functions and methods:

itertools.izip()
dict.iteritems()
dict.iterkeys()
dict.itervalues()

So you could now iterate through that stuff without copying anything.

Python3

Python 3 embraces iterables – now everything that can be an iterator already is one – no unnecessary copies.

You have to make a list out of them explicitly if you really want it:

list(dict.keys())
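In Python 3 the dict methods return lightweight “view” objects rather than lists. A minimal sketch of the difference – the view stays in sync with the dict instead of copying it:

```python
d = {"a": 1, "b": 2}
keys = d.keys()      # a dict_keys view -- no copy is made

d["c"] = 3           # mutate the dict after getting the view
print(list(keys))    # the view sees the new key: ['a', 'b', 'c']
```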

Then there is an entire module: itertools that provides nifty ways to iterate through stuff.

This can be a real boon to performance. Consider the example from the trigrams problem:

(http://codekata.com/kata/kata14-tom-swift-under-the-milkwood/)

You have a list of words: words

And you want to go through it, three at a time, and match up pairs with the following word.

The non-pythonic way to do that is a loop through the indices:

for i in range(len(words) - 2):
    triple = words[i:i + 3]

It works, and is fairly efficient, but what about:

for triple in zip(words[:-2], words[1:-1], words[2:]):

Pretty nifty.

And zip() doesn’t make any copies – it generates the tuples on the fly. So pretty efficient...
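You can see that laziness directly – in Python 3, zip() hands back an iterator and only builds each tuple when asked:

```python
pairs = zip([1, 2, 3], "abc")
print(pairs)           # a zip object, not a list of tuples
print(next(pairs))     # first tuple, built on demand: (1, 'a')
print(list(pairs))     # the remaining tuples: [(2, 'b'), (3, 'c')]
```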

But what about the slicing? Slicing (e.g. words[:-2]) makes copies. So in the above code, we are copying the list three times!

Not so good.

Do remember that the copies are “shallow” – the actual strings are not copied – but there is a copied pointer for each item in the list. For big lists of small strings, that is still significant.

Enter itertools

The itertools module has an islice() (“iterable slice”) function. It returns an iterator over a slice of a sequence – so no more copies:

from itertools import islice

for triple in zip(words, islice(words, 1, None), islice(words, 2, None)):

A bit clunky – but no extra copies.

So while I used to say that Python was all about sequences – it is now all about iterables.

Iterators and Iterables

Iteration is one of the main reasons Python code is so readable:

for x in just_about_anything:
    do_stuff(x)

An iterable is anything that can be looped over sequentially, so it does not have to be a “sequence”: list, tuple, etc. For example, a string is iterable. So is a set.

An iterator is an iterable that remembers state. All sequences are iterable, but not all sequences are iterators. To get an iterator from a sequence, pass it to iter():

my_iter = iter(my_sequence)

Iterator Types:

https://docs.python.org/3/library/stdtypes.html#iterator-types

Iterables

To make an object iterable, you simply have to implement the __getitem__ method.

Python will take care of the rest.

class T:
    def __getitem__(self, position):
        if position > 5:
            raise IndexError
        return position

Demo
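A quick demo of the class above (reproduced here so the snippet stands alone) – the for loop calls __getitem__ with 0, 1, 2, ... until IndexError is raised:

```python
class T:
    def __getitem__(self, position):
        if position > 5:
            raise IndexError
        return position

for item in T():       # Python falls back on __getitem__, starting at 0
    print(item)        # 0 1 2 3 4 5, one per line

print(list(T()))       # [0, 1, 2, 3, 4, 5]
```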

iter()

How do you get the iterator object from an “iterable”?

The iter() function will make an iterator from any iterable. It first looks for the __iter__ method, and if none is found, it falls back on __getitem__ to create the iterator.

The iter() function:

In [20]: iter([2,3,4])
Out[20]: <listiterator at 0x101e01350>

In [21]: iter("a string")
Out[21]: <iterator at 0x101e01090>

In [22]: iter( ('a', 'tuple') )
Out[22]: <tupleiterator at 0x101e01710>

List as an Iterator:

In [10]: a_list = [1,2,3]

In [11]: list_iter = iter(a_list)

In [12]: next(list_iter)
Out[12]: 1

In [13]: next(list_iter)
Out[13]: 2

In [14]: next(list_iter)
Out[14]: 3

In [15]: next(list_iter)
--------------------------------------------------
StopIteration     Traceback (most recent call last)
<ipython-input-15-1a7db9b70878> in <module>()
----> 1 next(list_iter)
StopIteration:
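Note that once an iterator raises StopIteration, it stays exhausted – you need a fresh call to iter() to loop over the list again:

```python
list_iter = iter([1, 2, 3])
print(list(list_iter))   # [1, 2, 3] -- this consumes the iterator
print(list(list_iter))   # [] -- exhausted; call iter(a_list) again for a fresh one
```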

Using iterators when you can

Example: trigrams:

triplets = zip(words, words[1:], words[2:])

zip() returns an iterable – it does not build up the whole list. So this is quite efficient.

But slicing (words[1:]) produces a copy – so this does use three copies of the list – not so good if memory is tight. Note that they are shallow copies, so not that bad.

Nevertheless, we can do better:

from itertools import islice

In [68]: triplets = zip(words, islice(words, 1, None), islice(words, 2, None))

In [69]: for triplet in triplets:
    ...:     print(triplet)
    ...:
('this', 'that', 'the')
('that', 'the', 'other')
('the', 'other', 'and')
('other', 'and', 'one')
('and', 'one', 'more')

The Iterator Protocol

The main thing that differentiates an iterator from an iterable (sequence) is that an iterator saves state.

An iterator must have the following methods:

an_iterator.__iter__()

Usually returns the iterator object itself.

an_iterator.__next__()

Returns the next item from the container. If there are no further items, raises the StopIteration exception.

Making an Iterator

A simple version of range()

class IterateMe_1:
    def __init__(self, stop=5):
        self.current = 0
        self.stop = stop
    def __iter__(self):
        return self
    def __next__(self):
        if self.current < self.stop:
            self.current += 1
            return self.current
        else:
            raise StopIteration

Demo: Examples/iterators/iterator_1.py
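Running it (class reproduced from above so the snippet is self-contained) shows it counts 1 through stop – not 0 through stop - 1 like range() does; the LAB below asks you to fix that:

```python
class IterateMe_1:
    def __init__(self, stop=5):
        self.current = 0
        self.stop = stop

    def __iter__(self):
        return self

    def __next__(self):
        if self.current < self.stop:
            self.current += 1
            return self.current
        else:
            raise StopIteration

print(list(IterateMe_1()))   # [1, 2, 3, 4, 5]
print(list(range(5)))        # [0, 1, 2, 3, 4] -- not quite the same!
```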

What does for do?

Now that we know the iterator protocol, we can write something like a for loop:

# equiv of "for i in l:"
iterator = iter(an_iterable)
while True:
    try:
        item = next(iterator)
    except StopIteration:
        break
    do_something_with_item(item)
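Wrapped up as a runnable function (the names here are illustrative), this is exactly what for does under the hood:

```python
def manual_for(an_iterable, do_something_with_item):
    # hypothetical helper: a hand-rolled equivalent of a for loop
    iterator = iter(an_iterable)
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            break
        do_something_with_item(item)

result = []
manual_for("abc", result.append)
print(result)    # ['a', 'b', 'c']
```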

Itertools

itertools is a collection of utilities that make it easy to build an iterator that iterates over sequences in various common ways

http://docs.python.org/3/library/itertools.html

NOTE:

Iterables are not only for for loops

They can be used with anything that expects an iterable:

  • sum, map, list comprehensions, ...
  • constructors for sequences: tuple, sorted, list, ...
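For instance, anything that consumes an iterable will happily take a generator or any other lazy iterator:

```python
squares = (x * x for x in range(5))   # a generator expression
print(sum(squares))                   # 0 + 1 + 4 + 9 + 16 = 30

print(tuple("abc"))                   # ('a', 'b', 'c') -- a string is iterable
print(sorted({3, 1, 2}))              # [1, 2, 3] -- so is a set
```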

Is an iterator a type?

Iterators are not a type. An “iterable” is anything that has an __iter__ method that returns an iterator.

An “iterator” is anything that conforms to the “iterator protocol”:

  • Has a __next__() method that returns objects.
  • Raises StopIteration when there are no more objects to be returned.
  • Has an __iter__() method that returns an iterator – usually itself. Sometimes the __iter__() method resets the iteration...

https://docs.python.org/3/glossary.html#term-iterator

Lots of common iterators are different types:

In [23]: type(iter(range(5)))
Out[23]: range_iterator

In [24]: iter(list())
Out[24]: <list_iterator at 0x104437fd0>

In [27]: type(iter(zip([],[])))
Out[27]: zip
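Different types, but they all conform to the same protocol – each has a __next__ method, and each returns itself from __iter__:

```python
for it in (iter(range(5)), iter([1, 2]), iter(zip("ab", "cd"))):
    print(type(it).__name__)         # three different iterator types
    assert hasattr(it, "__next__")   # iterator protocol, part 1
    assert iter(it) is it            # iterator protocol, part 2
```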

Here’s a nice overview:

http://treyhunner.com/2016/12/python-iterator-protocol-how-for-loops-work/

LAB

In the Examples/iterators dir, you will find: iterator_1.py

  • Extend (iterator_1.py ) to be more like range() – add three input parameters: iterator_2(start, stop, step=1)
  • What happens if you break from a loop and try to pick it up again:
it = IterateMe_2(2, 20, 2)
for i in it:
    if i > 10:  break
    print(i)
for i in it:
    print(i)
  • Does range() behave the same?
    • make yours match range()
    • is range an iterator or an iterable?

Generators

Generators

  • give you an iterator object
  • no access to the underlying data ... if it even exists
Conceptually:

“Iterators” are about various ways to loop over existing data.

“Generators” can generate the data on the fly.

Practically:

You can use either one either way (and a generator is an iterator).

Generators do some of the book-keeping for you – simpler syntax.

Generators can also be used any time you want to pause a function and pick it up where you left off.

yield

The yield keyword is the way to make a generator with a function:

def a_generator_function(params):
    some_stuff
    yield something
    do_some_more_stuff

Generator functions “yield” a value, rather than returning a value.

It does return a value, but rather than ending execution of the function – it preserves the state so it can pick up where it left off.

State is preserved in between yields.

generator functions

A function with yield in it is a “factory” for a generator

Each time you call it, you get a new generator:

gen_a = a_generator()
gen_b = a_generator()

Each instance keeps its own state.

Really just a shorthand for an iterator class that does the book keeping for you.

To master yield, you must understand that when you call the function, the code you have written in the function body does not run. The function only returns the generator object. The actual code in the function is run when next() is called on the generator itself.

And note that each time you call the “generator function” you get a new instance of a generator object that saves state separately from other instances.
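Both points are easy to see with a tiny generator function – the body does not run at call time, and each instance keeps its own state:

```python
def a_generator():
    print("body starts running")
    yield 1
    yield 2

gen_a = a_generator()    # nothing printed -- the body has not run yet
gen_b = a_generator()

print(next(gen_a))       # "body starts running", then 1
print(next(gen_a))       # 2
print(next(gen_b))       # gen_b starts fresh: prints the message, then 1
```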

An example: like range()

def y_range(start, stop, step=1):
    i = start
    while i < stop:
        yield i
        i += step
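Used just like range() (definition repeated so the snippet runs on its own):

```python
def y_range(start, stop, step=1):
    i = start
    while i < stop:
        yield i
        i += step

print(list(y_range(2, 6)))       # [2, 3, 4, 5]
print(list(y_range(0, 10, 3)))   # [0, 3, 6, 9]
```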

Real World Example from FloatCanvas:

https://github.com/wxWidgets/Phoenix/blob/master/wx/lib/floatcanvas/FCObjects.py#L82

Note:

In [164]: gen = y_range(2,6)
In [165]: type(gen)
Out[165]: generator
In [166]: dir(gen)
Out[166]:
...
 '__iter__',
...
 '__next__',

So the generator is an iterator

Note: A generator function can also be a method in a class

In fact, this is a nice way to provide several different ways to iterate over the data in a class.

More about iterators and generators:

http://www.learningpython.com/2009/02/23/iterators-iterables-and-generators-oh-my/

and:

https://pythontips.com/2013/09/29/the-python-yield-keyword-explained/

demo:

see: Examples/iterators/yield_example.py

generator comprehensions

Yet another way to make a generator:

>>> [x * 2 for x in [1, 2, 3]]
[2, 4, 6]
>>> (x * 2 for x in [1, 2, 3])
<generator object <genexpr> at 0x10911bf50>
>>> for n in (x * 2 for x in [1, 2, 3]):
...     print(n)
...
2
4
6

More interesting if [1, 2, 3] is also a generator

Keep in mind – if all you need to do with the result is loop over it – use a generator expression rather than a list comprehension.
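The two forms give the same answer – the list comprehension just builds a throwaway 1000-element list first:

```python
total_list = sum([x * 2 for x in range(1000)])   # builds the full list in memory
total_gen = sum(x * 2 for x in range(1000))      # feeds sum() one value at a time

print(total_list, total_gen)    # 999000 999000
```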

Other uses for yield

Note that the yield keyword and generator functions were designed with classic “generators” in mind.

That is – objects that generate values on the fly.

But yield can be used for other things as well.

Anytime you want to return a value, and then hold state until later, yield can be used.

Example: pytest fixtures:

@pytest.fixture
def example_fixture(request):
    # setup code here
    value = something()
    yield value  # provide the fixture value
    # do the teardown
    something_with(value)

In this case, the yield isn’t in any sort of loop. It will only get run once. But the generator will maintain state, so the value can be used after the yield to do the teardown.
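Here is a plain-Python sketch of (roughly) how pytest drives such a fixture – the names are illustrative, not pytest’s actual internals:

```python
def example_fixture():
    log = ["setup"]            # setup code runs first
    yield log                  # hand the fixture value to the test
    log.append("teardown")     # teardown runs when the generator resumes

gen = example_fixture()
value = next(gen)              # runs setup, pauses at the yield
value.append("test ran")       # the "test" uses the fixture value

try:
    next(gen)                  # resume: teardown runs, then StopIteration
except StopIteration:
    pass

print(value)    # ['setup', 'test ran', 'teardown']
```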

How would this be done without yield? You’d need to store the value in a class:

class a_fixture():

    def __call__(self):
        # make it callable so it can provide the value
        # setup code here
        value = something()
        self.value = value
        return value

    def teardown(self):
        something_with(self.value)

Not horrible, but not as clean and simple.

Context managers

But an even bigger advantage is that you can use context managers with yield:

@pytest.fixture
def example_fixture(request):
    # setup code here
    with open("a_test_filename") as test_file:
        yield test_file  # provide the fixture value

And that’s it!

When the fixture is first invoked, it will yield the test_file. It will then save the state, with the file open until next() is called again - time for the teardown.

But there is no more code after the yield – so it falls out of the context manager, and the file is closed.

These kinds of tricks become quite useful for asynchronous programming, etc. More on that in a few weeks.

LAB

Write a few generators:

  • Sum of integers
  • Doubler
  • Fibonacci sequence
  • Prime numbers

Test code in:

Examples/iterators/test_generator.py

Descriptions:

Sum of the integers:

keep adding the next integer

0 + 1 + 2 + 3 + 4 + 5 + ...

so the sequence is:

0, 1, 3, 6, 10, 15 .....

Doubler:

Each value is double the previous value:

1, 2, 4, 8, 16, 32,

Fibonacci sequence:

The fibonacci sequence as a generator:

f(n) = f(n-1) + f(n-2)

1, 1, 2, 3, 5, 8, 13, 21, 34...

Prime numbers:

Generate the prime numbers (numbers only divisible by themselves and 1):

2, 3, 5, 7, 11, 13, 17, 19, 23...

Others to try:
Try x^2, x^3, counting by threes, x^e, counting by minus seven, ...