Iterators and Generators

The Tools of Pythonicity

What goes on in those for loops?

A note about Python History

Python 2

Python used to be all about sequences – a good chunk of anything you did was stored in a sequence, or involved manipulating a sequence.

  • lists

  • tuples

  • strings

  • dict.keys()

  • dict.values()

  • dict.items()

  • zip()

In python2 – those are all sequences. (in the case of zip and dict methods, they return actual lists)

But it turns out that the most common operation for sequences is to iterate through them:

for item in a_sequence:
    do_something_with_item

So fairly early in Python2, Python introduced the idea of the “iterable”.

More or less, an “iterable” is something you can, well, iterate over in a for loop, but often does not keep the whole sequence in memory at once.

After all – why make a copy of something just to look at all its items?

Example:

In python2: dict.keys() returns a list of all the keys in the dict. But why make a full copy of all the keys, when all you want to do is:

for k in dict.keys():
    do_something_with(k)

Even worse: dict.items() created a full list of (key, value) tuples. – a complete copy of all the data in the dict.

Even worse still: enumerate(dict.items()) created a whole list of (index, (key, value)) tuples – lots of copies of everything.

Enter iter*

Python2 then introduced “iterable” versions of a number of functions and methods:

itertools.izip dict.iteritems() dict.iterkeys() dict.itervalues()

So you could now iterate through that stuff without copying anything. Nice performance benefits, but a somewhat ugly interface.

Python3

Python3 embraces iterables – now everything that could be an iterable without making a copy is done that way – no unnecessary copies.

If you DO need an actual sequence (becasue you want to do something with it other than iterate over it), you have to make a list out of them explicitly:

list(dict.keys())

Then there is an entire module: itertools that provides nifty ways to iterate through stuff.

That will be covered elsewhere.

So while I used to say that python was “all about sequences”, I know say:

“Python is all about iterables

Iterators and Iterables

Iteration is one of the main reasons Python code is so readable:

for x in just_about_anything:
    do_stuff(x)

An “iterable” is anything that can be looped over sequentially, so it does not have to be a “sequence”: list, tuple, etc. For example, a string is iterable. So is a set.

An iterator is an iterable that remembers state. All sequences are iterable, but not all sequences are iterators. To make a sequence an iterator, you can call it with iter:

my_iter = iter(my_sequence)

Iterator Types:

https://docs.python.org/3/library/stdtypes.html#iterator-types

Iterables

To make an object iterable, you simply have to implement the __getitem__ method.

class T:
    def __getitem__(self, position):
        if position > 5:
            raise IndexError
        return position

iter()

How do you get the iterator object from an “iterable”?

The iter function will make any iterable an iterator. It first looks for the __iter__ method, and if none is found, uses __getitem__ to create the iterator.

The iter() function:

In [20]: iter([2,3,4])
Out[20]: <listiterator at 0x101e01350>

In [21]: iter("a string")
Out[21]: <iterator at 0x101e01090>

In [22]: iter( ('a', 'tuple') )
Out[22]: <tupleiterator at 0x101e01710>

List as an Iterator:

In [10]: a_list = [1,2,3]

In [11]: list_iter = iter(a_list)

In [12]: next(list_iter)
Out[12]: 1

In [13]: next(list_iter)
Out[13]: 2

In [14]: next(list_iter)
Out[14]: 3

In [15]: next(list_iter)
--------------------------------------------------
StopIteration     Traceback (most recent call last)
<ipython-input-15-1a7db9b70878> in <module>()
----> 1 next(list_iter)
StopIteration:

Using iterators when you can

consider the example from the trigrams problem:

(http://codekata.com/kata/kata14-tom-swift-under-the-milkwood/)

You have a list of words: words

And you want to go through it, three at a time, and match up pairs with the following word.

The non-pythonic way to do that is a loop through the indices:

for i in range(len(words)-2):
   triple = words[i:i+3]

It works, and is fairly efficient, but what about:

for triple in zip(words[:-2], words[1:-1], words[2:]):

zip() returns an iterable – it does not build up the whole list. So this is quite efficient.

but we are still slicing: ([1:]), which produces a copy – so we are creating three copies of the list – not so good if memory is tight. Note that they are shallow copies, so not that bad.

Nevertheless, we can do better:

The itertools module has an islice() (iterable slice) function. It returns an iterator over a slice of a sequence – so no more copies:

from itertools import islice

In [68]: triplets = zip(words, islice(words, 1, None), islice(words, 2, None))

In [69]: for triplet in triplets:
    ...:     print(triplet)
    ...:
('this', 'that', 'the')
('that', 'the', 'other')
('the', 'other', 'and')
('other', 'and', 'one')
('and', 'one', 'more')

The Iterator Protocol

The main thing that differentiates an iterator from an iterable (sequence) is that an iterator saves state.

An iterable must have the following methods:

an_iterator.__iter__()

Usually returns the iterator object itself.

an_iterator.__next__()

Returns the next item from the container. If there are no further items, raises the StopIteration exception.

An iterable, on the other hand, must have a __iter__ method that returns an initialized iterator (which may or may not be itself). It doesnot have a __next__ method. So you oculd say that the difference between and iterator and a iterable, is htat iterables do not have __next__ methods: you cannont call next() on them.

Making an Iterator

A simple version of range()

class IterateMe_1:
    def __init__(self, stop=5):
        self.current = 0
        self.stop = stop
    def __iter__(self):
        return self
    def __next__(self):
        if self.current < self.stop:
            self.current += 1
            return self.current
        else:
            raise StopIteration

What does for do?

Now that we know the iterator protocol, we can write something like a for loop:

my_for.py

def my_for(an_iterable, func):
    """
    Emulation of a for loop.

    func() will be called with each item in an_iterable
    """
    # equiv of "for i in l:"
    iterator = iter(an_iterable)
    while True:
        try:
            i = next(iterator)
        except StopIteration:
            break
        func(i)

Itertools

itertools is a collection of utilities that make it easy to build an iterator that iterates over sequences in various common ways

http://docs.python.org/3/library/itertools.html

https://pymotw.com/3/itertools/index.html

NOTE:

iteratables are not only for for

They can be used with anything that expects an iterable:

sum, tuple, sorted, list, …

Is an iterator a type?

Iterators are not a type. An “iterable” is anything that has an __iter__ method that returns an iterator and/or has a __getitem__ method that takes 0-based indexes.

An “iterator” is anything that conforms to the “iterator protocol”:

  • Has a __next__() method that returns objects.

  • Raises StopIteration when their are no more objects to be returned.

  • Has a __iter__() method that returns an iterator – usually itself. - sometimes the __iter__() method re-sets the iteration…

https://docs.python.org/3/glossary.html#term-iterator

Lots of common iterators are different types:

In [23]: type(iter(range(5)))
Out[23]: range_iterator

In [24]: iter(list())
Out[24]: <list_iterator at 0x104437fd0>

In [27]: type(iter(zip([],[])))
Out[27]: zip

Here’s a nice overview:

http://treyhunner.com/2016/12/python-iterator-protocol-how-for-loops-work/

LAB

iterator_1.py

  • Extend (iterator_1.py ) to be more like range() – add three input parameters: iterator_2(start, stop, step=1)

  • What happens if you break from a loop and try to pick it up again:

it = IterateMe_2(2, 20, 2)
for i in it:
    if i > 10:  break
    print(i)
for i in it:
    print(i)
  • Does range() behave the same?

    • make yours match range()

    • is range an iterator or an iteratable?

Generators

Generators

  • give you an iterator object

  • no access to the underlying data … if it even exists

Conceptually:

Iterators are about various ways to loop over data.

Generators can generate the data on the fly.

Practically:

You can use either one either way (and a generator is one type of iterator).

Generators do some of the book-keeping for you – simpler syntax.

Generators also can be used for times you want to pause a function and pick it back up later where you left off.

yield

yield is a way to make a quickie generator with a function:

def a_generator_function(params):
    some_stuff
    yield something

Generator functions “yield” a value, rather than returning a value.

It does ‘return’ a value, but rather than ending execution of the function – it preserves the state so it can pick up where it left off.

State is preserved in between yields.

A function with yield in it is a “factory” for a generator

Each time you call it, you get a new generator:

gen_a = a_generator()
gen_b = a_generator()

Each instance keeps its own state.

Really just a shorthand for an iterator class that does the book keeping for you.

To master yield, you must understand that when you call the function, the code you have written in the function body does not run. The function only returns the generator object. The actual code in the function is run when next() is called on the generator itself.

And note that each time you call the “generator function” you get a new instance of a generator object that saves state separately from other instances.

An example: like range()

def y_range(start, stop, step=1):
    i = start
    while i < stop:
        yield i
        i += step

Real World Example from FloatCanvas:

https://github.com/svn2github/wxPython/blob/master/3rdParty/FloatCanvas/floatcanvas/FloatCanvas.py#L100

Note:

In [164]: gen = y_range(2,6)
In [165]: type(gen)
Out[165]: generator
In [166]: dir(gen)
Out[166]:
...
 '__iter__',
...
 '__next__',

So the generator is an iterator

Note: A generator function can also be a method in a class

In fact, this is a nice way to provide different ways to iterate over the data in a class in multiple ways.

This is done by the dict protocol with dict.keys() and dict.values().

More about iterators and generators:

Chapter 14 in Fluent Python by Luciano Ramalho

http://www.learningpython.com/2009/02/23/iterators-iterables-and-generators-oh-my/

yield_example.py

generator comprehensions

yet another way to make a generator:

>>> [x * 2 for x in [1, 2, 3]]
[2, 4, 6]
>>> (x * 2 for x in [1, 2, 3])
<generator object <genexpr> at 0x10911bf50>
>>> for n in (x * 2 for x in [1, 2, 3]):
...   print n
... 2 4 6

More interesting if [1, 2, 3] is also a generator

Note that map and filter produce iterators.

Keep in mind – if all you need to do with the results is loop over it – use a generator expression rather than a list comprehension.

Other uses for yield

The yield keyword and generator functions were designed with classic “generators” in mind.

That is – objects that generate values on the fly.

But, as we alluded to earlier, yield can be used for other things as well.

Anytime you want to return a value, and then hold state until later, yield can be used.

Example: pytest fixtures:

@pytest.fixture
def example_fixture(request):
    # setup code here
    value = something()
    yield value  # provide the fixture value
    # do the teardown
    something_with(value)

In this case, the yield isn’t in any sort of loop or anything. It will only get run once. But the generator will maintain state, so the value can be used after the yield to do the teardown.

How would this be done without yield? You’d need to store the value in a class:

class a_fixture():

    def __call__(self):
        # make it callable so it can provide the value
        # setup code here
        value = something()
        self.value = value
        return value

    def teardown(self):
        something_with(self.value)

Not horrible, but not as clean and simple.

LAB

Write a few generators:

  • Sum of integers

  • Doubler

  • Fibonacci sequence

  • Prime numbers

Test code in:

test_generator.py

Descriptions:

Sum of the integers:

keep adding the next integer

0 + 1 + 2 + 3 + 4 + 5 + …

so the sequence is:

0, 1, 3, 6, 10, 15 …..

Doubler:

Each value is double the previous value:

1, 2, 4, 8, 16, 32,

Fibonacci sequence:

The fibonacci sequence as a generator:

f(n) = f(n-1) + f(n-2)

1, 1, 2, 3, 5, 8, 13, 21, 34…

Prime numbers:

Generate the prime numbers (numbers only divisible by them self and 1):

2, 3, 5, 7, 11, 13, 17, 19, 23…

Others to try:

Try x^2, x^3, counting by threes, x^e, counting by minus seven, …