Session Four: Lists, Iteration, Strings, Dictionaries, Sets¶
Announcements¶
Topics for Today¶
- Lists
- Iteration
- Strings
- Dictionaries
- Sets
Driving toward the mailroom.
Lightning Talks¶
Chi Ho
Tom Gaffney
Anybody else ready? I’ll be asking for volunteers at the end of class.
Lists¶
In addition to all the methods supported by sequences we’ve already seen, mutable sequences such as Lists have a number of other methods.
You can find all these in the Standard Library Documentation:
https://docs.python.org/3/library/stdtypes.html#typesseq-mutable
Assignment¶
You’ve already seen changing a single element of a list by assignment.
Pretty much the same as “arrays” in most languages:
In [100]: list = [1, 2, 3]
In [101]: list[2] = 10
In [102]: list
Out[102]: [1, 2, 10]
Growing the List¶
.append()
, .insert()
, .extend()
In [74]: food = ['spam', 'eggs', 'ham']
In [75]: food.append('sushi')
In [76]: food
Out[76]: ['spam', 'eggs', 'ham', 'sushi']
In [77]: food.insert(0, 'beans')
In [78]: food
Out[78]: ['beans', 'spam', 'eggs', 'ham', 'sushi']
In [79]: food.extend(['bread', 'water'])
In [80]: food
Out[80]: ['beans', 'spam', 'eggs', 'ham', 'sushi', 'bread', 'water']
You can pass any sequence to .extend()
:
In [85]: food
Out[85]: ['beans', 'spam', 'eggs', 'ham', 'sushi', 'bread', 'water']
In [86]: food.extend('spaghetti')
In [87]: food
Out[87]:
['beans', 'spam', 'eggs', 'ham', 'sushi', 'bread', 'water',
's', 'p', 'a', 'g', 'h', 'e', 't', 't', 'i']
Shrinking a List¶
.pop()
, .remove()
In [203]: food = ['spam', 'eggs', 'ham', 'toast']
In [204]: food.pop()
Out[204]: 'toast'
In [205]: food.pop(0)
Out[205]: 'spam'
In [206]: food
Out[206]: ['eggs', 'ham']
In [207]: food.remove('ham')
In [208]: food
Out[208]: ['eggs']
You can also delete slices of a list with the del
keyword:
In [92]: nums = range(10)
In [93]: nums
Out[93]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [94]: del nums[1:6:2]
In [95]: nums
Out[95]: [0, 2, 4, 6, 7, 8, 9]
In [96]: del nums[-3:]
In [97]: nums
Out[97]: [0, 2, 4, 6]
Copying Lists¶
You can make copies of part of a list using slicing:
In [227]: food = ['spam', 'eggs', 'ham', 'sushi']
In [228]: some_food = food[1:3]
In [229]: some_food[1] = 'bacon'
In [230]: food
Out[230]: ['spam', 'eggs', 'ham', 'sushi']
In [231]: some_food
Out[231]: ['eggs', 'bacon']
If you provide no arguments to the slice, it makes a copy of the entire list:
In [232]: food
Out[232]: ['spam', 'eggs', 'ham', 'sushi']
In [233]: food2 = food[:]
In [234]: food is food2
Out[234]: False
The copy of a list made this way is a shallow copy.
The list is itself a new object, but the objects it contains are not.
Mutable objects in the list can be mutated in both copies:
In [249]: food = ['spam', ['eggs', 'ham']]
In [251]: food_copy = food[:]
In [252]: food[1].pop()
Out[252]: 'ham'
In [253]: food
Out[253]: ['spam', ['eggs']]
In [256]: food.pop(0)
Out[256]: 'spam'
In [257]: food
Out[257]: [['eggs']]
In [258]: food_copy
Out[258]: ['spam', ['eggs']]
Consider this common pattern:
for x in somelist:
if should_be_removed(x):
somelist.remove(x)
This looks benign enough, but changing a list while you are iterating over it can be the cause of some pernicious bugs.
For example:
In [27]: l = list(range(10))
In [28]: l
Out[28]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [29]: for item in l:
....: l.remove(item)
....:
For example:
In [27]: l = list(range(10))
In [28]: l
Out[28]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [29]: for item in l:
....: l.remove(item)
....:
In [30]: l
Out[30]: [1, 3, 5, 7, 9]
Iterate over a copy, and mutate the original:
In [33]: l = list(range(10))
In [34]: for item in l[:]:
....: l.remove(item)
....:
In [35]: l
Out[35]: []
Okay, so we’ve done this a bunch already, but let’s state it out loud.
You can iterate over a sequence.
for element in sequence:
do_something(element)
which is what we mean when we say a sequence is an “iterable”.
Again, we’ll touch more on this in a short while, but first a few more words about Lists and Tuples.
Miscellaneous List Methods¶
These methods change a list in place and are not available on immutable sequence types.
.reverse()
In [129]: food = ['spam', 'eggs', 'ham']
In [130]: food.reverse()
In [131]: food
Out[131]: ['ham', 'eggs', 'spam']
.sort()
In [132]: food.sort()
In [133]: food
Out[133]: ['eggs', 'ham', 'spam']
Because these methods mutate the list in place, they have a return value of None
.sort()
can take an optional key
parameter.
It should be a function that takes one parameter (list items one at a time) and returns something that can be used for sorting:
In [137]: def third_letter(string):
.....: return string[2]
.....:
In [138]: food.sort(key=third_letter)
In [139]: food
Out[139]: ['spam', 'eggs', 'ham']
List Performance¶
- indexing is fast and constant time: O(1)
x in l
is proportional to n: O(n)- visiting all is proportional to n: O(n)
- operating on the end of list is fast and constant time: O(1)
- append(), pop()
- operating on the front (or middle) of the list depends on n: O(n)
pop(0)
,insert(0, v)
- But, reversing is fast.
Also, collections.deque
Choosing Lists or Tuples¶
Here are a few guidelines on when to choose a list or a tuple:
- If it needs to mutable: list
- If it needs to be immutable: tuple
- (safety when passing to a function)
Otherwise ... taste and convention
Convention¶
Lists are Collections (homogeneous): – contain values of the same type – simplifies iterating, sorting, etc
tuples are mixed types: – Group multiple values into one logical thing – Kind of like simple C structs.
Other Considerations¶
- Do the same operation to each element?
- list
- Small collection of values which make a single logical item?
- tuple
- To document that these values won’t change?
- tuple
- Build it iteratively?
- list
- Transform, filter, etc?
- list
More Documentation¶
For more information, read the list docs:
https://docs.python.org/3.5/library/stdtypes.html#mutable-sequence-types
(actually any mutable sequence....)
Lightning Talk¶
Chi Ho
Iteration¶
Repetition, Repetition, Repetition, Repe...
For Loops¶
We’ve seen simple iteration over a sequence with for ... in
:
In [170]: for x in "a string":
.....: print(x)
.....:
a
s
t
r
i
n
g
Contrast this with other languages, where you must build and use an index
:
for(var i = 0; i < arr.length; i++) {
var value = arr[i];
alert(i + value);
}
If you do need an index, you can use enumerate
:
In [140]: for idx, letter in enumerate('Python'):
.....: print(idx, letter, end=' ')
.....:
0 P 1 y 2 t 3 h 4 o 5 n
range
and for
Loops¶
The range
builtin is useful for looping a known number of times:
In [171]: for i in range(5):
.....: print(i)
.....:
0
1
2
3
4
But you don’t really need to do anything at all with i
In fact, it’s a common convension to make this clear with a “nothing” name:
In [21]: for __ in range(5):
....: print("*")
....:
*
*
*
*
*
Sometimes you want to interrupt or alter the flow of control through a loop.
Loops can be controlled in two ways, with break
and continue
The break
keyword will cause a loop to immediately terminate:
In [141]: for i in range(101):
.....: print(i)
.....: if i > 50:
.....: break
.....:
0 1 2 3 4 5... 46 47 48 49 50 51
The continue
keyword will skip later statements in the loop block, but
allow iteration to continue:
In [143]: for in in range(101):
.....: if i > 50:
.....: break
.....: if i < 25:
.....: continue
.....: print(i, end=' ')
.....:
25 26 27 28 29 ... 41 42 43 44 45 46 47 48 49 50
For loops can also take an optional else
block.
Executed only when the loop exits normally (not via break):
In [147]: for x in range(10):
.....: if x == 11:
.....: break
.....: else:
.....: print('finished')
finished
In [148]: for x in range(10):
.....: if x == 5:
.....: print(x)
.....: break
.....: else:
.....: print('finished')
5
This is a really nice unique Python feature!
While Loops¶
The while
keyword is for when you don’t know how many loops you need.
It continues to execute the body until condition is not True
:
while a_condition:
some_code
in_the_body
while
is more general than for
– you can always express for
as while
, but not always vice-versa.
while
is more error-prone – requires some care to terminate
loop body must make progress, so condition can become False
potential error – infinite loops:
i = 0;
while i < 5:
print(i)
Use break
:
In [150]: while True:
.....: i += 1
.....: if i > 10:
.....: break
.....: print(i)
.....:
1 2 3 4 5 6 7 8 9 10
Set a flag:
In [156]: import random
In [157]: keep_going = True
In [158]: while keep_going:
.....: num = random.choice(range(5))
.....: print(num)
.....: if num == 3:
.....: keep_going = False
.....:
3
Use a condition:
In [161]: while i < 10:
.....: i += random.choice(range(4))
.....: print(i)
.....:
0 0 2 3 4 6 8 8 8 9 12
Similarities¶
Both for
and while
loops can use break
and continue
for
internal flow control.
Both for
and while
loops can have an optional else
block
In both loops, the statements in the else
block are only executed if the
loop terminates normally (no break
)
Lightning Talk¶
Tom Gaffney
Strings¶
Quick review: a string literal creates a string type:
"this is a string"
'So is this'
"And maybe y'all need somthing like this!"
"""and this also"""
You can also use str()
In [256]: str(34)
Out[256]: '34'
String Manipulations¶
split
and join
:
In [167]: csv = "comma, separated, values"
In [168]: csv.split(', ')
Out[168]: ['comma', 'separated', 'values']
In [169]: psv = '|'.join(csv.split(', '))
In [170]: psv
Out[170]: 'comma|separated|values'
Case Switching¶
In [171]: sample = 'A long string of words'
In [172]: sample.upper()
Out[172]: 'A LONG STRING OF WORDS'
In [173]: sample.lower()
Out[173]: 'a long string of words'
In [174]: sample.swapcase()
Out[174]: 'a LONG STRING OF WORDS'
In [175]: sample.title()
Out[175]: 'A Long String Of Words'
Testing¶
In [181]: number = "12345"
In [182]: number.isnumeric()
Out[182]: True
In [183]: number.isalnum()
Out[183]: True
In [184]: number.isalpha()
Out[184]: False
In [185]: fancy = "Th!$ $tr!ng h@$ $ymb0l$"
In [186]: fancy.isalnum()
Out[186]: False
String Literals¶
Common Escape Sequences:
\\ Backslash (\)
\a ASCII Bell (BEL)
\b ASCII Backspace (BS)
\n ASCII Linefeed (LF)
\r ASCII Carriage Return (CR)
\t ASCII Horizontal Tab (TAB)
\ooo Character with octal value ooo
\xhh Character with hex value hh
for example – for tab-separted values:
In [25]: s = "these\tare\tseparated\tby\ttabs"
In [26]: print(s)
these are separated by tabs
https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals https://docs.python.org/3/library/stdtypes.html#string-methods
Raw Strings¶
Add an r
in front of the string literal:
Escape Sequences Ignored
In [408]: print("this\nthat")
this
that
In [409]: print(r"this\nthat")
this\nthat
Gotcha
In [415]: r"\"
SyntaxError: EOL while scanning string literal
Ordinal values¶
Characters in strings are stored as numeric values:
- “ASCII” values: 1-127
- Unicode values – 1 - 1,114,111 (!!!)
To get the value:
In [109]: for i in 'Chris':
.....: print(ord(i), end=' ')
67 104 114 105 115
In [110]: for i in (67,104,114,105,115):
.....: print(chr(i), end='')
Chris
(these days, stick with ASCII, or use full Unicode: more on that in a few weeks)
Old and New string formatting¶
back in early python days, there was the string formatting operator: %
" a string: %s and a number: %i "%("text", 45)
This is very similar to C-style string formatting (sprintf).
It’s still around, and handy — but ...
The “new” format()
method is more powerful and flexible, so we’ll focus on that in this class.
The string format()
method:
In [62]: "A decimal integer is: {:d}".format(34)
Out[62]: 'A decimal integer is: 34'
In [63]: "a floating point is: {:f}".format(34.5)
Out[63]: 'a floating point is: 34.500000'
In [64]: "a string is the default: {}".format("anything")
Out[64]: 'a string is the default: anything'
Multiple placeholders:¶
In [65]: "the number is {} is {}".format('five', 5)
Out[65]: 'the number is five is 5'
In [66]: "the first 3 numbers are {}, {}, {}".format(1,2,3)
Out[66]: 'the first 3 numbers are 1, 2, 3'
The counts must agree:
In [67]: "string with {} formatting {}".format(1)
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-67-a079bc472aca> in <module>()
----> 1 "string with {} formatting {}".format(1)
IndexError: tuple index out of range
Named placeholders:¶
In [69]: "Hello, {name}, whaddaya know?".format(name="Joe")
Out[69]: 'Hello, Joe, whaddaya know?'
You can use values more than once, and skip values:
In [73]: "Hi, {name}. Howzit, {name}?".format(name='Bob')
Out[73]: 'Hi, Bob. Howzit, Bob?'
The format operator works with string variables, too:
In [80]: s = "{:d} / {:d} = {:f}"
In [81]: a, b = 12, 3
In [82]: s.format(a, b, a/b)
Out[82]: '12 / 3 = 4.000000'
So you can dynamically build a format string
Complex Formatting¶
There is a complete syntax for specifying all sorts of options.
It’s well worth your while to spend some time getting to know this formatting language. You can accomplish a great deal just with this.
One Last Trick¶
For some of the exercises, you’ll need to interact with a user at the command line.
There’s a nice built in function to do this - input
:
In [85]: fred = input('type something-->')
type something-->I've typed something
In [86]: print(fred)
I've typed something
This will display a prompt to the user, allowing them to input text and allowing you to bind that input to a symbol.
Fun with strings¶
Rewrite:
the first 3 numbers are: %i, %i, %i"%(1,2,3)
- for an arbitrary number of numbers...
Dictionaries¶
Python calls it a dict
Other languages call it:
- dictionary
- associative array
- map
- hash table
- hash
- key-value pair
Dictionary Constructors¶
>>> {'key1': 3, 'key2': 5}
{'key1': 3, 'key2': 5}
>>> dict([('key1', 3),('key2', 5)])
{'key1': 3, 'key2': 5}
>>> dict(key1=3, key2= 5)
{'key1': 3, 'key2': 5}
>>> d = {}
>>> d['key1'] = 3
>>> d['key2'] = 5
>>> d
{'key1': 3, 'key2': 5}
Dictionary Indexing¶
>>> d = {'name': 'Brian', 'score': 42}
>>> d['score']
42
>>> d = {1: 'one', 0: 'zero'}
>>> d[0]
'zero'
>>> d['non-existing key']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'non-existing key'
Keys can be any immutable:
- number
- string
- tuple
In [325]: d[3] = 'string'
In [326]: d[3.14] = 'pi'
In [327]: d['pi'] = 3.14
In [328]: d[ (1,2,3) ] = 'a tuple key'
In [329]: d[ [1,2,3] ] = 'a list key'
TypeError: unhashable type: 'list'
Actually – any “hashable” type.
Hash functions convert arbitrarily large data to a small proxy (usually int)
Always return the same proxy for the same input
MD5, SHA, etc
Dictionaries hash the key to an integer proxy and use it to find the key and value.
Key lookup is efficient because the hash function leads directly to a bucket with very few keys (often just one)
What would happen if the proxy changed after storing a key?
Hashability requires immutability
Key lookup is very efficient
Same average time regardless of size
Note: Python name look-ups are implemented with dict – it’s highly optimized
Key to value:
- lookup is one way
Value to key:
- requires visiting the whole dict
If you need to check dict values often, create another dict or set
(up to you to keep them in sync)
Dictionary Ordering (not)¶
Dictionaries have no defined order
In [352]: d = {'one':1, 'two':2, 'three':3}
In [353]: d
Out[353]: {'one': 1, 'three': 3, 'two': 2}
In [354]: d.keys()
Out[354]: dict_keys(['three', 'two', 'one'])
Dictionary Iterating¶
for
iterates over the keys
In [15]: d = {'name': 'Brian', 'score': 42}
In [16]: for x in d:
print(x)
....:
score
name
(note the different order...)
dict keys and values¶
In [20]: d = {'name': 'Brian', 'score': 42}
In [21]: d.keys()
Out[21]: dict_keys(['score', 'name'])
In [22]: d.values()
Out[22]: dict_values([42, 'Brian'])
In [23]: d.items()
Out[23]: dict_items([('score', 42), ('name', 'Brian')])
dict keys and values¶
Iterating on everything
In [26]: d = {'name': 'Brian', 'score': 42}
In [27]: for k, v in d.items():
print("%s: %s" % (k,v))
....:
score: 42
name: Brian
Dictionary Performance¶
- indexing is fast and constant time: O(1)
x in s
constant time: O(1)- visiting all is proportional to n: O(n)
- inserting is constant time: O(1)
- deleting is constant time: O(1)
Other dict operations:¶
See them all here:
https://docs.python.org/3/library/stdtypes.html#mapping-types-dict
Is it in there?
In [5]: d
Out[5]: {'that': 7, 'this': 5}
In [6]: 'that' in d
Out[6]: True
In [7]: 'this' not in d
Out[7]: False
Containment is on the keys.
Getting something: (like indexing)
In [9]: d.get('this')
Out[9]: 5
But you can specify a default
In [11]: d.get('something', 'a default')
Out[11]: 'a default'
Never raises an Exception (default default is None)
iterating
In [13]: for item in d:
....: print(item)
....:
this
that
which is equivalent to, but faster than:
In [15]: for key in d.keys():
print(key)
....:
this
that
but to get values, must specify you want values:
In [16]: for val in d.values():
print(val)
....:
5
7
“Popping”: getting the value while removing it
pop out a particular key
In [19]: d.pop('this')
Out[19]: 5
In [20]: d
Out[20]: {'that': 7}
pop out an arbitrary key, value pair
In [23]: d.popitem()
Out[23]: ('that', 7)
In [24]: d
Out[24]: {}
This one is handy:
setdefault(key[, default])
gets the value if it’s there, sets it if it’s not
In [27]: d.setdefault('something', 'a value')
Out[27]: 'a value'
In [28]: d
Out[28]: {'something': 'a value'}
Assignment maintains link to the original dict
In [47]: d
Out[47]: {'something': 'a value'}
In [48]: item_view = d
In [49]: d['something else'] = 'another value'
In [50]: item_view
Out[50]: {'something': 'a value', 'something else': 'another value'}
Use explicit copy method to get a copy
In [51] item_copy = d.copy()
In [52]: d['another thing'] = 'different value'
In [53]: d
Out[53]:
{'another thing': 'different value',
'something': 'a value',
'something else': 'another value'}
In [54]: item_copy
Out[54]: {'something': 'a value', 'something else': 'another value'}
Sets¶
set
is an unordered collection of distinct values
Essentially a dict with only keys
Set Constructors
>>> set()
set()
>>> set([1, 2, 3])
{1, 2, 3}
>>> {1, 2, 3}
{1, 2, 3}
>>> s = set()
>>> s.update([1, 2, 3])
>>> s
{1, 2, 3}
Set Properties¶
Set
members must be hashable
Like dictionary keys – and for same reason (efficient lookup)
No indexing (unordered)
>>> s[1]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'set' object does not support indexing
Set Methods¶
>> s = set([1])
>>> s.pop() # an arbitrary member
1
>>> s.pop()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'pop from an empty set'
>>> s = set([1, 2, 3])
>>> s.remove(2)
>>> s.remove(2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 2
All the “set” operations from math class...
s.isdisjoint(other)
s.issubset(other)
s.union(other, ...)
s.intersection(other, ...)
s.difference(other, ...)
s.symmetric_difference( other, ...)
Frozen Set¶
Another kind of set: frozenset
immutable – for use as a key in a dict (or another set...)
>>> fs = frozenset((3,8,5))
>>> fs.add(9)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'frozenset' object has no attribute 'add'
Homework¶
Handy hints for/from Homework¶
You almost never need to loop through the indexes of a sequence
nifty for loop tricks¶
tuple unpacking:
remember this?
x, y = 3, 4
You can do that in a for loop, also:
In [4]: l = [(1, 2), (3, 4), (5, 6)]
In [5]: for i, j in l:
print("i:%i, j:%i"%(i, j))
i:1, j:2
i:3, j:4
i:5, j:6
(Mailroom example)
Looping through two loops at once:¶
zip
In [10]: l1 = [1, 2, 3]
In [11]: l2 = [3, 4, 5]
In [12]: for i, j in zip(l1, l2):
....: print("i:%i, j:%i"%(i, j))
....:
i:1, j:3
i:2, j:4
i:3, j:5
Can be more than two:
for i, j, k, l in zip(l1, l2, l3, l4):
Need the index and the item?¶
enumerate
In [2]: l = ['this', 'that', 'the other']
In [3]: for i, item in enumerate(l):
...: print("the %ith item is: %s"%(i, item))
...:
the 0th item is: this
the 1th item is: that
the 2th item is: the other
Homework Comments¶
Building up a long string.
The obvious thing to do is something like:
msg = ""
for piece in list_of_stuff:
msg += piece
But: strings are immutable – python needs to create a new string each time you add a piece – not efficient:
msg = []
for piece in list_of_stuff:
msg.append(piece)
" ".join(msg)
appending to lists is efficient – and so is the join() method of strings.
You can put a mutable item in an immutable object!
sum()
functionWhat is assert
for?
Testing – NOT for issues expected to happen operationally:
assert m >= 0
in operational code should be:
if m < 0:
raise ValueError
I’ll cover next week ...
(Asserts get ignored if optimization is turned on!)
Recommended Reading:¶
- Dive Into Python: Chapt. 13,14
Assignments:¶
- Finish lab exercises
- Coding kata: trigrams
- Paths and files
- Update mailroom with dicts and exceptions
Text and files and dicts, and...¶
Coding Kata 14 - Dave Thomas
Use The Adventures of Sherlock Holmes as input:
This is intentionally open-ended and underspecified. There are many interesting decisions to make.
Experiment with different lengths for the lookup key. (3 words, 4 words, 3 letters, etc)