Abstract Base Classes and standard class protocols¶
( And other built in types (array, bytes, etc.,) )
Interfaces¶
An “Interface” (for a class) is the set of attributes and methods available.
Python has traditionally used a very dynamic definition of interface:
“Duck typing”
If is quacks like a duck....
This is very flexible – a class does not have to provide everything that a given interface might require – only the bit that is needed.
For instance, a number of functions in the Python standard library act on a “file-like object”.
That means “an object that follows the file interface”. However, it is quite rare that any given function needs the entire interface – maybe it only needs to read something – so only needs one method:
some_bytes = obj.read()
In this case, rather than checking if the object conforms to the file interface, this is a great use case for EAFP:
“Easier to ask Forgiveness than Permission”
That is – simply call .read()
, and catch the AttributError.
Or simply let it fail if there is no read() method.
Protocols¶
“Protocol” is essentially just another word for Interface. It tends to be used when there is a bit more action – such as the “Iterator Protocol”.
The Iterator protocol specifies not just methods and attributes, but actions:
“A StopIteration Exception is raised when the iterator is exhausted”
But it’s really just another word for the same thing. So in Python, interfaces are talked about in (at least) for ways:
- Interface
- Protocol
- -like – as in “file-like”
- generic name – like “sequence”
Abstract Base Classes¶
Sometimes it is nice to more formally define an interface.
Abstract Base Classes are a way to define a specific interface. In particular, the protocols of the standard Python object types.
They were added to Python to formalize the “Duck Typing” approach in some contexts. See PEP 3119 for the details:
https://www.python.org/dev/peps/pep-3119/
In OO languages, an “abstract” class is a class that has an interface, but no implementation. These are sometimes required by static OO languages to provide a way to have more than one implementation of a given protocol.
In Python, they can be used as a way to know if a given object is what you might be expecting – is it a Sequence? a Mapping?
You can directly check if an object is a particular type:
In [117]: isinstance([1,2,3], list)
Out[117]: True
In [118]: isinstance([1,2,3], tuple)
Out[118]: False
But that is not the “Python way” – we usually don’t care if the passed in object is a list, or a tuple, or any other sequence type. What we care about is if is Any Sequence (or maybe only any Iterable).
Enter ABCs:
from collections.abc import Sequence
In [121]: isinstance([1,2,3,4], Sequence)
Out[121]: True
In [122]: isinstance((1,2,3,4), Sequence)
Out[122]: True
But what if it needs to mutable?
In [123]: from collections.abc import MutableSequence
In [124]: isinstance([1,2,3,4], MutableSequence)
Out[124]: True
In [125]: isinstance((1,2,3,4), MutableSequence)
Out[125]: False
So Python
gotcha!¶
One of the most common type errors seen in Python is the “sequence of strings” vs “single string” confusion:
It is moderately common to write a function that acts on a bunch of strings – perhaps a bunch of filenames.
def do_something(filenames):
"""
process a number of files.
:param filenames: sequence of filenames to process
"""
for name in filenames:
print("processing:", name)
with open(name) as the_file:
process(the_file)
Pass it a list of strings, and all is good:
do_something(["name1", "name2", "third_name"])
processing: name1
processing: name2
processing: third_name
But what if the user has only one file to process?
In [4]: do_something("a_single_name")
processing: a
processing: _
processing: s
processing: i
processing: n
processing: g
processing: l
processing: e
processing: _
processing: n
processing: a
processing: m
processing: e
OOPS! – this happens all too often.
Why?
Well, the function is expecting an iterable of strings that it can loop over.
But a string IS an iterable of strings! It iterates over the string, yielding each letter. And a letter is simply a string that happens to be of length-one.
So plain old duck-typing and EAFP doesn’t work.
what about checking for a Sequence?
def do_something2(filenames):
"""
process a number of files.
:param filenames: sequence of filenames to process
"""
if not isinstance(filenames, abc.Iterable):
# put it in a list so the rest of the code is the same.
filenames = [filenames]
for name in filenames:
print("processing:", name)
DARN! that didn’t work – of course, a string is Iterable.
So you need to do it the other way around: check for a string. But there is no ABC for strings – there is only one string object in Python, and it’s unlikely that a third party lib will write an alternate implementation.
So you check for string explicitly:
def do_something3(filenames):
"""
process a number of files.
:param filenames: sequence of filenames to process
"""
if isinstance(filenames, str):
# put it in a tuple so the rest of the code is the same.
filenames = (filenames,)
for name in filenames:
print("processing:", name)
Ahh – that works:
In [10]: do_something3("name")
processing: name
In [11]: do_something3(["name", "name2", "name3"])
processing: name
processing: name2
processing: name3
Things you might do with ABCs¶
(In order of likelihood)
- Do an
isinstance()
check at the top of a function. - Create a class that conforms to one of the standard ABCs and register it.
- Write an ABC that specifies a particular subset of an interface for your use-case
- Write a full-fledged ABC for a whole new Interface. (very, very, unlikely...)
Using isinstance with standard ABCs¶
This you would do at the top of a funciton to make sure you are working with what you think you are.
Remember that most of the time you can simply use EAFP instead.
But if you do – choose the ABC that is at the lowest level for your use case.
That is, don’t use MutableSequence
if it does not need to be mutable.
And if you only need to iterate over it – use Iterable
Most of the build-in ones are in the collections.abc
module:
for name in dir(collections.abc):
if not name.startswith('_'):
print(name)
AsyncIterable
AsyncIterator
Awaitable
ByteString
Callable
Container
Coroutine
Generator
Hashable
ItemsView
Iterable
Iterator
KeysView
Mapping
MappingView
MutableMapping
MutableSequence
MutableSet
Sequence
Set
Sized
ValuesView
Creating an object that matches an interface¶
Let’s say you write your own container that is a MutableSequence
.
How do you let others know that your object can be used anywhere a
MutableSequence
(or a Sequence
, or an Iterable
) can be used?
You can “register” your class with the ABC:
Every ABC has a register method that can serve as a decorator:
import collections.abc
@collections.abc.Sequence.register
class MyKindOfSequence():
def __init__(self, an_iterable):
self.contents = list(an_iterable)
def __len__(self):
return len(self.contents)
In [23]: s = MyKindOfSequence(('this', 'that', 4))
In [24]: isinstance(s, MyKindOfSequence)
Out[24]: True
Examples/abcs/register.py
CAUTION: you really should implement the entire interface if you are going to register something!
Semi-ducktyping¶
If you need only a method or two from an object, then just use EAFP:
file_like.read()
for instance.
But what if you need a bunch of methods – but not an explicit particular type?
This was brought up on SO:
http://stackoverflow.com/questions/3450857/python-determining-if-an-object-is-file-like
Alex Martelli’s answer:
“””
... you need to determine which methods you need. To check for them, I
recommended defining your own FileLikeEnoughForMe abstract base class
with abstractmethod
decorators, and checking the object with
isinstance
for that class.
“”“
Note You could use isinstance(obj, io.FileIO)
, but then you
would only be able to use object that conformed to the full File interface.
Here’s an example:
How to write an ABC¶
The abc module provides tools to help write an ABC.
Let’s look at an example:
Examples/abcs/file_like.py
import abc
# deriving from ABC makes it, well, an ABC :-)
class FileLike(abc.ABC):
@abc.abstractmethod
def read(self):
"""read the contents of the file"""
@abc.abstractmethod
def write(self, text):
""" write stuff to the file """
@abc.abstractmethod
def readlines(self):
"""read the lines of the file into a list"""
@classmethod
def __subclasshook__(cls, inst):
if cls is FileLike:
print(cls.__dict__.keys())
if (any("read" in B.__dict__ for B in inst.__mro__) and
any("write" in B.__dict__ for B in inst.__mro__) and
any("readlines" in B.__dict__ for B in inst.__mro__)
):
return True
return NotImplemented
subclass from ABC to make it an ABC.
define the abstract methods you need – the
abc.abstractmethod
decorator lets the ABC machinery know the method needs to be implimented by subclasses.define __subclasshook__ – this is the magic that lets isinstance() work for classes that you have not control over.
It is called when isinstance() is called with this as the class. It then checks to see if all the methods expected are in the object passed in.
There could certainly be more magic, but this is explicite, and you could add other logic if you really wanted to.
Using an ABC¶
You use the ABC by passing it to isinstance
:
def use_a_file(f):
"""
Do some arbitrary thing with a file-like object
:param f: an open file-like object
"""
# check if we got the right thing
if not isinstance(f, FileLike):
raise TypeError("you must pass an open file-like object to use_a_file\n"
" A {} does not satisfy the FileLike protocol".format(type(f))
)
print("Hey -- I can use that! -- {}".format(type(f)))
This function will now work with any object that satisfies the ABC, and raise a type error for any object that doesn’t.
Examples:
In [23]: use_a_file("a string")
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-23-1da560af8e21> in <module>()
----> 1 use_a_file("a string")
/Users/Chris/PythonStuff/UWPCE/Py300-Spring2017/Examples/abcs/file_like.py in use_a_file(f)
60 if not isinstance(f, FileLike):
61 raise TypeError("you must pass an open file-like object to use_a_file\n"
---> 62 " A {} does not satisfy teh FileLike protocol".format(type(f))
63 )
64 print("Hey -- I can use that! -- {}".format(type(f)))
TypeError: you must pass an open file-like object to use_a_file
A <class 'str'> does not satisfy teh FileLike protocol
Use it with a built-in actual file object:
In [24]: with open("junk", 'a') as a_file:
...: use_a_file(a_file)
...:
Hey -- I can use that! -- <class '_io.TextIOWrapper'>
Making your own implementation of FileLike:¶
To make a File like object, you subclass from the ABC you defined. And define the methods
class MyFileType(FileLike):
def read(self):
"""
read the contents of teh file and return it
this dummy implimentation always returns the same thing.
"""
return "The contents of the file\nNot much here"
now try to use it:
In [26]: my_f = MyFileType()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-26-041b07f27f62> in <module>()
----> 1 my_f = MyFileType()
TypeError: Can't instantiate abstract class MyFileType with abstract methods readlines, write
OOPS – can’t do that! That’s what @abstractmethod
does for you – it requires that an implimentation be provided.
So let’s do that:
class MyFileType(FileLike):
def read(self):
"""
read the contents of teh file and return it
this dummy implimentation always returns the same thing.
"""
return "The contents of the file\nNot much here"
def write(self, content):
"""
this just throws away the content...
"""
pass
def readlines(self):
return ["The contents of the file\n","Not much here"]
and now it works:
In [28]: my_f = MyFileType()
and it can be used with our function:
In [29]: use_a_file(my_f)
Hey -- I can use that! -- <class '__main__.MyFileType'>
A Digression¶
“private” attributes?
A private attribute is an attribute that is not part of the Interface of the class.
While Python does not strictly have private attributes, It’s a common convention that a leading underscore means an attribute that should only be accessed from within a class.
Often used for properties, for example:
class Circle():
...
@property
def diameter(self):
return self._radius * 2
But there is a “stronger” way to “protect” an attribute – double underscores:
In [129]: class Base():
...: __private = "sneaky"
...:
In [130]: Base.__dict__.keys()
Out[130]: dict_keys(['_Base__private', '__module__', '__doc__', '__weakref__', '__dict__'])
note the “_Base” was added to the name.
But if you don’t have double underscores:
In [131]: class Base():
...: public = "wide open"
...: __private = "sneaky"
...:
In [132]: Base.__dict__.keys()
Out[132]: dict_keys(['_Base__private', '__module__', '__doc__', 'public', '__weakref__', '__dict__'])
So what happens if you subclass? Does it get overridden?
In [134]: class Sub(Base):
...: private = "oops! -- overridden?"
...:
In [135]: Sub.__dict__.keys()
Out[135]: dict_keys(['__module__', 'private', '__doc__'])
In [136]: Sub.private
Out[136]: 'oops! -- overridden?'
In [137]: Sub._Base__private
Out[137]: 'sneaky'
so using the double underscores assures that the attribute will not get accidentally overridden in a subclass.
Note: if you now the name mangling, yu can still mess with it ...
So should you do this?
Usually not – it turns out that in practice it’s rare for “private” attributes to get overidden accidentally. So this is not a widely used practice.
Final Thoughts¶
- don’t use
isinstance
and ABCs when EAFP will do – simply make the call you need and catch the exception. - Generally better to call
isinstance
on an ABC than a specific class - Use the most general ABC you can – i.e. “Iterable” rather than “Sequence”
- “Even with ABCs, excessive use of
isinstance
checks may be code smell” – Luciana Ramalho (from “Fluent Python”) - ABCs mostly exist to support multiple implementations of the same interface: lists and tuples, multiple containers, etc.
- The only time you should write a custom ABC is part of a Framework that a lot of people are going to extend – a very rare case!
Helpful write-ups:
https://dbader.org/blog/abstract-base-classes-in-python
http://stackoverflow.com/questions/3570796/why-use-abstract-base-classes-in-python
Other handy built-in sequences¶
Lists and tuples are very widely used in Python – they give you a mutable and an immutable sequence – what else do you need?
There are some special case sequences:
array.array¶
The array
module provides and array object – a mutable sequence of
homogeneous simple numeric types.
https://pymotw.com/3/array/
The type must be declared when the array is created, and then it can only hold that type:
In [60]: import array
In [61]: arr = array.array('i') # i is an integer type)
In [62]: arr.extend((3,4,5))
In [63]: arr
Out[63]: array('i', [3, 4, 5])
In [64]: arr.append(3.45)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-64-b853b72c3c39> in <module>()
----> 1 arr.append(3.45)
TypeError: integer argument expected, got float
Why?
array
supports a wide variety of types, stores the data far more
compactly, and inter operates better with raw binary data.
But numpy
gives you all that, and a lot more – so folks are more likely to
go to numpy if they really need these features.
Binary sequence types¶
Another way to work with binary data is the bytes
and bytearray
objects.
bytes
:- an immutable sequence of integers in the range 0 <= x < 256.
bytearray
:- a mutable sequence of integers in the range 0 <= x < 256. It has most of the usual methods of mutable sequences, as well as most methods that the bytes type has
a byte
is essentially the smallest unit of memory the computer can work with.
On virtually all modern hardware, a byte is 8 bits – 2**8 == 256 – so a single
byte is a value between 0 and 255.
You use bytes and bytearray objects to work with binary data – usually you’ll know it if you need it.
But the most common use case is to deal with encoded text:
In [79]: text = "Not ASCII: \u0107 \u0398 \u03A3 \u03BB \uF4A9"
In [80]: text
Out[80]: 'Not ASCII: ć Θ Σ λ \uf4a9'
In [81]: text = "Not ASCII: \u0107 \u0398 \u03A3 \u03BB"
In [82]: text
Out[82]: 'Not ASCII: ć Θ Σ λ'
In [83]: b = text.encode('utf-8')
In [84]: type(b)
Out[84]: bytes
In [85]: b
Out[85]: b'Not ASCII: \xc4\x87 \xce\x98 \xce\xa3 \xce\xbb'
Note that the bytes in the ascii range display as text. This is an artifact of the old days of C and Python2! where strings were byte arrays.
Python also has a literal for a bytes object – that also uses text:
In [92]: b
Out[92]: b'these are bytes'
But the individual elements are treats as integers:
In [93]: b[0]
Out[93]: 116
In [94]: list(b)
Out[94]: [116, 104, 101, 115, 101, 32, 97, 114, 101, 32, 98, 121, 116, 101, 115]
Since bytes are often used for encoded text, they come with a decode
method:
In [100]: b
Out[100]: b'Not ASCII: \xc4\x87 \xce\x98 \xce\xa3 \xce\xbb'
In [101]: b.decode('utf-8')
Out[101]: 'Not ASCII: ć Θ Σ λ'
memoryview
¶
A memoryview
object is a wrapper around a bit of memory – it allows you to
view an manipulate the data underlying some other object without copying it.
Mostly used for binary interchange between different extension data types: like to pass data from a numpy array to an image, for instance:
In [108]: b = bytearray(b'abcdefg')
In [109]: m = memoryview(b)
In [110]: m
Out[110]: <memory at 0x104759048>
In [111]: len(m)
Out[111]: 7
m and b now share the same memoroy block. So if I change m...
In [113]: m[2:4] = b'\x00\x00'
In [114]: m
Out[114]: <memory at 0x104759048>
In [115]: b
Out[115]: bytearray(b'ab\x00\x00efg')
b is changed as well.
memoryview
can represent different data types, and even
multi-dimensional arrays.
For more info: