Dictionaries and Sets
Dictionary
Python calls it a dict
Other languages call it:
dictionary
associative array
map
hash table
hash
key-value pair
It is also known in Python as a “mapping”, as it “maps” keys to values.
It is like an array, in that it holds a number of items, and you can index into it to get at particular items. But it can use arbitrary indexes, rather than a sequence of numbers.
These indexes are called “keys”, and the items stored are called “values”.
So for any Python sequence, you might do:
item = stuff[3]
for a dict, you do:
item = stuff["third"]
Note
It is very common to use strings as keys in a dictionary. So common that virtually all examples you’ll see both here and on the Internet use strings as keys. And many other languages have similar objects that only allow strings (like JavaScript objects, for instance). Python dicts, on the other hand, can use many types as keys, and this can be a very powerful feature to keep in mind.
Dictionary Constructors
So how does one make a dict? There are a few ways…
The dict “literal”:
# This dict "lies"
>>> {'key1': 3, 'key2': 5}
{'key1': 3, 'key2': 5}
Calling the dict type object constructor:
# passing in a sequence of (key,value) pairs:
>>> dict([('key1', 3), ('key2', 5)])
{'key1': 3, 'key2': 5}
# passing keys and values as keyword arguments:
>>> dict(key1=3, key2= 5)
{'key1': 3, 'key2': 5}
# creating an empty dict, and then populating it:
>>> d = {}
>>> d['key1'] = 3
>>> d['key2'] = 5
>>> d
{'key1': 3, 'key2': 5}
Dictionary Indexing
And how do you get stuff out (index it)?
The same way you index a sequence, except the index (now called a key) can be various types, rather than just an integer:
>>> d = {'name': 'Brian', 'score': 42}
>>> d['score']
42
>>> d = {1: 'one', 0: 'zero'}
>>> d[0]
'zero'
What if the key doesn’t exist in the dict? You get a KeyError
exception:
>>> d['non-existing key']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'non-existing key'
What can keys be?
Surely not ANYTHING?
Not quite: keys can be any “immutable”:
number
string
tuple
In [325]: d[3] = 'string'
In [326]: d[3.14] = 'pi'
In [327]: d['pi'] = 3.14
In [328]: d[ (1,2,3) ] = 'a tuple key'
What if you try to use a mutable type?
In [329]: d[ [1,2,3] ] = 'a list key'
TypeError: unhashable type: 'list'
Actually – any “hashable” type.
So, technically, it’s not mutability, but hashability that matters.
Although for most intents and purposes, you want to use immutable types as keys in dicts.
Hashing
What does “hashable” mean? (https://computersciencewiki.org/index.php/Hashing)
Hash functions convert arbitrarily large data to a small proxy (usually an integer).
They always return the same proxy for the same input.
MD5, SHA, etc, are some well known hash algorithms.
Dictionaries hash the key to an integer proxy and use it to find the key and value.
Key lookup is efficient because the hash function leads directly to a bucket with very few keys (often just one).
What would happen if the proxy (hash) changed after storing a key?
(Answer: you wouldn’t be able to find it again!)
Hashability requires immutability.
Key lookup is very efficient:
The access time is constant regardless of the size of the dict.
Dictionary indexing
Note: cPython name look-ups are implemented with dicts – it’s highly optimized.
Key to value:
lookup is one way.
Value to key:
requires visiting the whole dict.
If you need to check dict values often, create another dict or set.
(But note that it’s then up to you to keep them in sync).
Dictionary Ordering (not)
Traditionally, dictionaries have had no defined order. See this example from Python 3.5:
In [352]: d = {'one':1, 'two':2, 'three':3}
In [353]: str(d)
Out[353]: {'one': 1, 'three': 3, 'two': 2}
In [354]: d.keys()
Out[354]: dict_keys(['three', 'two', 'one'])
Note how I defined the dict in a natural order, but when it gets printed, or you display the keys, they are in a different order.
However, In cPython 3.6, the internal implementation was changed, and it does happen to preserve order. In cPython 3.6, that is considered an implementation detail – and you should not count on it! However, as of cPython 3.7, dictionaries preserving order are part of the language specification. This was declared by Guido on the python-dev mailing list on Dec 15, 2017 <https://mail.python.org/pipermail/python-dev/2017-December/151283.html>.
In Python 3.6, the above code results in:
In [9]: d = {'one':1, 'two':2, 'three':3}
In [10]: str(d)
Out[10]: "{'one': 1, 'two': 2, 'three': 3}"
In [11]: d.keys()
Out[11]: dict_keys(['one', 'two', 'three'])
When new items are added to a dict, they go on the “end”:
In [12]: d = {}
In [13]: d['one'] = 1
In [14]: d['two'] = 2
In [15]: d['three'] = 3
In [16]: str(d)
Out[16]: "{'one': 1, 'two': 2, 'three': 3}"
and dict.popitem()
will remove the “last” item in the dict.
CAUTION This is new behavior in cPython 3.6 – older versions of Python (notably including Python 2) do not preserve order. In older versions, there is a special version of a dict in the collections module: collections.OrderedDict
which preserves order in all versions of Python, and has a couple extra features.
Also: while Python dicts now preserve order, they are not really a fully ordered object: there is no direct way to get, say, the “third” item in a dict, or to inset an item at a particular location.
Dictionary Iterating
for
iterates over the keys
In [23]: d = {'name': 'Brian', 'score': 42}
In [24]: for x in d:
...: print(x)
...:
name
score
dict keys and values
In [25]: d = {'name': 'Brian', 'score': 42}
In [26]: d.keys()
Out[26]: dict_keys(['name', 'score'])
In [27]: d.values()
Out[27]: dict_values(['Brian', 42])
In [28]: d.items()
Out[28]: dict_items([('name', 'Brian'), ('score', 42)])
Notice that these are of type dict_keys
and dict_values
. These are special types that provide iteration, printing and other features, but are tied to the underlying dict, rather than copies. They are actually Sets (see below), so can be compared to sets, and the in
operator is efficient.
(Python2 would simply create lists of keys and values – but then you were making a copy when you probably didn’t need one). If you do need a copy, or a proper Sequence, simmply wrap them in a list()
:
the_values_as_a_list = list(a_dict.values())
dict keys and values
Iterating on everything
In [26]: d = {'name': 'Brian', 'score': 42}
In [27]: for k, v in d.items():
print("%s: %s" % (k,v))
....:
name: Brian
score: 42
Dictionary Performance
Indexing is fast and constant time: O(1).
x in s
is fast and constant time: O(1).Visiting all items is proportional to n: O(n).
Inserting is constant time: O(1).
Deleting is constant time: O(1).
Other dict operations:
See them all here:
https://docs.python.org/3/library/stdtypes.html#mapping-types-dict
Is it in there?
In [5]: d
Out[5]: {'that': 7, 'this': 5}
In [6]: 'that' in d
Out[6]: True
In [7]: 'this' not in d
Out[7]: False
Containment is on the keys.
Think of it like a “real” dictionary, where the keys are the words, and the values are the definitions.
Is the word “gullible” in the dictionary? is asking if the key is in the dict.
Getting something: (like indexing)
In [9]: d.get('this')
Out[9]: 5
But you can specify a default:
In [11]: d.get('something', 'a default')
Out[11]: 'a default'
get() never raises an Exception (default is None).
iterating
In [13]: for item in d:
....: print(item)
....:
this
that
Which is equivalent to, but a bit faster than:
In [15]: for key in d.keys():
print(key)
....:
this
that
In fact, there are very few things you can do with the dict_keys
that you can’t do directly with the dict.
But to get values, you must specify you want the values:
In [16]: for val in d.values():
print(val)
....:
5
7
and to get both, you use .items
:
In [4]: for item in d.items():
...: print(item)
...:
('this', 5)
('that', 7)
pop()
“Popping”: getting the value while removing it.
Pop out a particular key:
In [19]: d.pop('this')
Out[19]: 5
In [20]: d
Out[20]: {'that': 7}
pop out an arbitrary key, value pair
In [23]: d.popitem()
Out[23]: ('that', 7)
In [24]: d
Out[24]: {}
note that it’s the last item, so not completely arbitrary.
setdefault
This one is handy:
setdefault(key[, default])
gets the value if it’s there, sets it to the specified default if it’s not. Returns the value in either case.
In [4]: d = {}
In [5]: d.setdefault('something', 'a value')
Out[5]: 'a value'
In [6]: d
Out[6]: {'something': 'a value'}
The next time you call it, it gets the already set value:
In [7]: d.setdefault('something', 'a different value')
Out[7]: 'a value'
Assignment
Assignment (with =
) is a link to the original dict, just like lists or anything else. Remember that assignment is simply binding a new name to something.
And dicts are mutable – so be careful!
In [47]: d
Out[47]: {'something': 'a value'}
In [48]: item_view = d
In [49]: d['something else'] = 'another value'
In [50]: item_view
Out[50]: {'something': 'a value', 'something else': 'another value'}
If you want a copy, use the explicit copy method to get a copy:
In [51] item_copy = d.copy()
In [52]: d['another thing'] = 'different value'
In [53]: d
Out[53]:
{'another thing': 'different value',
'something': 'a value',
'something else': 'another value'}
In [54]: item_copy
Out[54]: {'something': 'a value', 'something else': 'another value'}
or pass any mapping into the dict constructor:
.. code-block:: python
new_dict = dict(old_dict)
Sets
a set
is an unordered collection of distinct values.
Essentially, a set
is a dict with only keys.
https://docs.python.org/3.8/library/stdtypes.html#set-types-set-frozenset
Set Constructors:
>>> set()
set()
>>> set([1, 2, 3])
{1, 2, 3}
>>> {1, 2, 3}
{1, 2, 3}
>>> s = set()
>>> s.update([1, 2, 3])
>>> s
{1, 2, 3}
Set Properties
Set
members must be hashable, like dictionary keys – and for same reason (efficient lookup).
No indexing (unordered).
>>> s[1]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'set' object does not support indexing
Set Methods
>> s = set([1])
>>> s.pop() # an arbitrary member
1
>>> s.pop()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'pop from an empty set'
>>> s = set([1, 2, 3])
>>> s.remove(2)
>>> s.remove(2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 2
All the “set” operations from math class…
s.isdisjoint(other)
s.issubset(other)
s.union(other, ...)
s.intersection(other, ...)
s.difference(other, ...)
s.symmetric_difference( other, ...)
Set Operators
All the basic set operations are support with math-class like operators:
Test whether every element in the set is in other:
set <= other
Test whether the set is a proper subset of other, that is, set <= other and set != other:
set < other
Test whether every element in other is in the set:
set >= other
Test whether the set is a proper superset of other, that is, set >= other and set != other
:
set > other
Union: Return a new set with elements from the set and all others:
set | other | ...
Intersection: Return a new set with elements common to the set and all others:
set & other & ...
Difference: Return a new set with elements in the set that are not in the others:
set - other - ...
Symmetric difference: return a new set with elements in either the set or other but not both:
set ^ other
In fact, it is the operator versions that make the set
object “officially” a Set: (Set ABC)
Frozen Set
Another kind of set: frozenset
immutable – for use as a key in a dict (or another set…):
>>> fs = frozenset((3,8,5))
>>> fs.add(9)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'frozenset' object has no attribute 'add'
A few added notes:
The count() method
All Python sequences (including strings) have a count()
method:
In [1]: s = "This is an arbitrary string"
In [2]: s.count('t')
Out[2]: 2
What if you want a case-insensitive count?
In [3]: s.lower().count('t')
Out[3]: 3
set.update()
If you want to add a bunch of stuff to a set, you can use update:
In [1]: s = set()
In [2]: s.update
Out[2]: <function set.update>
In [3]: s.update(['this', 'that'])
In [4]: s
Out[4]: {'that', 'this'}
In [5]: s.update(['this', 'thatthing'])
In [6]: s
Out[6]: {'that', 'thatthing', 'this'}
NOTE: It’s VERY often the case that when you find yourself writing a trivial loop – there is a way to do it with a built in method!
Sorting stuff in dictionaries:
dicts aren’t sorted, so what if you want to do something in a sorted way?
The “standard” way:
for key in sorted(d.keys()):
...
As dicts DO preserve order, you can make a sorted version of a dict:
sorted_dict = dict(sorted(dict.items(), key=sort_key))
where sort_key is a function that takes the (key, value) tuple and returns a value to sort on.
Another option:
collections.OrderedDict
Also other nifty stuff in the collections
module:
https://docs.python.org/3.6/library/collections.html
NOTE: As of Python 3.6, dicts do preserve order. But they are not full featured ordered objects. If you want a “properly” ordered object, use OrderedDict
.