Python Memory Management and Weak References¶
Chris Barker
PythonCHB@gmail.com
https://github.com/PythonCHB
Memory Management¶
- You don’t want Python objects that are no longer in use taking up memory.
- You don’t want to keep track of all that yourself.
- Most “scripting languages” or “virtual machines” have some sort of automated memory management
Many ways to do “Garbage Collection”
Reference Counting¶
- How memory is managed is not part of the Python language spec:
- Jython uses the JVM
- IronPython uses the CLR (both the JVM and the CLR are garbage collected)
- PyPy uses Minimark
- The CPython interpreter uses a reference counting scheme:
- Every time there is a new reference to a Python object, its reference count is increased
- Every time a reference is removed – the count is decreased
- When the reference count goes to zero: the object is deleted (memory freed)
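The counting rules above can be watched in action with a minimal sketch. The Tracked class and the log list are hypothetical names made up for this illustration, and the immediate cleanup shown is CPython behavior, not something the language guarantees.

```python
class Tracked:
    """Hypothetical helper that records the moment it is destroyed."""

    def __init__(self, log):
        self.log = log

    def __del__(self):
        # Called by CPython when the reference count reaches zero.
        self.log.append("deleted")


log = []
obj = Tracked(log)    # one reference: the name `obj`
alias = obj           # a second reference to the same object

del obj               # count drops by one, the object survives
still_alive = (log == [])

del alias             # last reference removed: freed right away
freed_immediately = (log == ["deleted"])
```

On other implementations (Jython, IronPython, PyPy) the __del__ call may happen much later, so code should not rely on this timing.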
What makes a reference?¶
Binding to a name:
x = an_object
Putting it in a container:
l.append(an_object)
Passing it to a function:
func(an_object)
Most of the time, you don’t need to think about this at all.
How do I see what’s going on?¶
import sys
sys.getrefcount(object)
NOTE: This will always return one more than you’d expect, as passing the object to the function increases its refcount by one:
In [5]: a = []
In [6]: sys.getrefcount(a)
Out[6]: 2
- The Heisenberg Uncertainty Principle:
- you can’t observe it without altering it
Playing with References¶
(live demo)
In [7]: a = []
In [8]: sys.getrefcount(a)
Out[8]: 2
In [9]: b = a
In [10]: sys.getrefcount(a)
Out[10]: 3
In [11]: l = [1,2,3,a]
In [12]: sys.getrefcount(a)
Out[12]: 4
In [13]: del b
In [14]: sys.getrefcount(a)
Out[14]: 3
In [15]: del l
In [16]: sys.getrefcount(a)
Out[16]: 2
# function local variables
In [17]: def test(x):
....:     print "x has a refcount of:", sys.getrefcount(x)
....:
In [18]: sys.getrefcount(a)
Out[18]: 2
In [19]: test(a)
x has a refcount of: 4
In [20]: sys.getrefcount(a)
Out[20]: 2
In [21]: x = 3
In [22]: sys.getrefcount(x)
Out[22]: 428
WHOA!!
(hint: interning. CPython caches small integers, so every use of 3 anywhere in the interpreter shares one object.)
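A small sketch of why that count is so high, assuming CPython: integers in a small range are created once and shared, while larger values get a fresh object each time. The values are built with int() at run time so the compiler cannot merge them as constants. This is an implementation detail, not a language rule.

```python
# Small integers are cached by CPython, so both names point at one object.
small_a = int("3")
small_b = int("3")
small_shared = small_a is small_b

# Larger integers are created on demand, so these are two distinct objects.
large_a = int("100000")
large_b = int("100000")
large_shared = large_a is large_b
```

Every module, function, and data structure that uses 3 holds a reference to that one shared object, which is where the large reference count comes from.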
The Power of Reference Counting¶
You don’t need to think about it most of the time.
Code that creates objects doesn’t need to delete them
Objects get deleted right away.
They can “clean up” on deletion (closing files, for instance), and that cleanup happens right away too.
Performance is predictable
The Limits of Reference Counting¶
- Performance overhead on all operations. But the big one:
Circular references
If a Python object references itself, either directly or through another object that references the first one, you have a circular reference.
Circular References¶
In [8]: l1 = [1,] ; l2 = [2,]
In [9]: l1.append(l2); l2.append(l1)
In [10]: l1
Out[10]: [1, [2, [...]]]
In [11]: l2
Out[11]: [2, [1, [...]]]
In [12]: l1[1]
Out[12]: [2, [1, [...]]]
In [13]: l2[1][1][1]
Out[13]: [1, [2, [...]]]
(demo) – simple_circular.py
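Here is a minimal sketch (not the simple_circular.py demo itself) of why a cycle defeats reference counting. Node is a hypothetical class used because plain lists cannot be weakly referenced; the weak reference is only there to check whether the object is still alive without keeping it alive.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can point at another Node."""

    def __init__(self):
        self.other = None


gc.disable()                  # keep the cycle collector out of the picture
try:
    a, b = Node(), Node()
    a.other, b.other = b, a   # a -> b -> a: a reference cycle
    probe = weakref.ref(a)    # observes `a` without adding a strong reference

    del a, b                  # no names left, but each object still holds the other
    alive_after_del = probe() is not None

    gc.collect()              # an explicit collection finds and frees the cycle
    alive_after_collect = probe() is not None
finally:
    gc.enable()
```

After the del, neither reference count can reach zero, so with reference counting alone the two objects would stay in memory for the life of the process.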
The Garbage Collector¶
A cyclic garbage collector was added in Python 2.0.
It can find and clean up “unreachable” reference cycles.
It is turned on by default:
In [1]: import gc
In [2]: gc.isenabled()
Out[2]: True
or you can force it:
In [4]: gc.collect()
Out[4]: 64
But it can be slow, and doesn’t always work!
How does the garbage collector work?
- Not a full “mark and sweep” type.
It searches for reference cycles – then cleans those up.
- It doesn’t have to bother checking non-container types (ints, strings, etc.)
- Faster, and not as dependent on having a clear “root” namespace.
Details here:
http://arctrix.com/nas/python/gc/ (or in the source!)
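The point about non-container types can be checked with gc.is_tracked(), which reports whether the collector is watching an object. A short sketch, assuming CPython:

```python
import gc

# Atomic values cannot hold references, so they can never be part of a cycle.
int_tracked = gc.is_tracked(42)
str_tracked = gc.is_tracked("text")

# A list can hold references to other objects (or itself), so it is tracked.
list_tracked = gc.is_tracked([])
```

Only tracked objects are examined during a collection, which is part of why this approach is cheaper than scanning everything.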
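Big issue: instances of classes that define a __del__ method are not cleaned up when they are part of a reference cycle (this applies to Python 2 and to Python 3 before 3.4).
__del__ methods often act on references that may not be there if the objects are cleaned up in the wrong order.
NOTE: you can work with the gc.garbage list, but it is tricky and messy.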
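The sketch below shows one way to check how a given interpreter treats a cycle whose members define __del__. It assumes Python 3.4 or later, where such cycles are finalized and freed; on Python 2 the same objects would instead show up in gc.garbage and the finalizers would not run. Resource is a hypothetical class for illustration.

```python
import gc


class Resource:
    """Hypothetical object with a finalizer, used to build a cycle."""

    finalized = []            # names of instances whose __del__ has run

    def __init__(self, name):
        self.name = name
        self.partner = None

    def __del__(self):
        Resource.finalized.append(self.name)


a, b = Resource("a"), Resource("b")
a.partner, b.partner = b, a   # cycle between two objects with __del__
del a, b

gc.collect()

# Anything the collector could not free is left in the gc.garbage list.
uncollectable = [o for o in gc.garbage if isinstance(o, Resource)]
finalized_names = sorted(Resource.finalized)
```

The order in which the two finalizers run is not guaranteed, which is why the names are sorted before being inspected.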
Tools¶
If these objects are no longer “reachable” – how do you find out what’s going on?
We saw sys.getrefcount(), but you need a reference to the object to use it.
You can see what the refcount is before you delete the last reference, but that isn’t always easy.
Process Memory Use¶
A really coarse way to find a memory leak is to see if the process memory is growing.
It can be subtle: Python (and the OS) do tricks to re-use memory, etc.
But if you have a “real” leak – you’ll see it. (Example to follow)
mem_check.py provides functions that report the memory use of the current running process.
(*nix and Windows code)
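As a rough idea of what such a check can look like, here is a sketch built on the standard library resource module. It is not the course's own code: peak_memory_mb is a hypothetical name, the module is only available on *nix, and it reports the peak resident set size rather than the current one.

```python
import resource
import sys


def peak_memory_mb():
    """Return the peak resident set size of this process in megabytes."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return usage.ru_maxrss / divisor


before = peak_memory_mb()
data = [list(range(1000)) for _ in range(1000)]   # allocate something sizeable
after = peak_memory_mb()
```

Calling a function like this inside a loop and watching whether the number keeps climbing is the coarse leak check described above.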
id checks¶
As it happens, in CPython the id() function returns the object’s memory address.
It’s really dangerous, but that means we can examine an object if we know its id, even if we don’t hold a reference to it.
Bill Bumgarner wrote a nifty extension module that returns the python object pointed to by an id (memory address) – “di”:
http://www.friday.com/bbum/2007/08/24/python-di/
I added a function that returns the reference count of an object from its id.
https://github.com/PythonCHB/di_refcount
NOTE: it would be a really bad idea to use these in production code!
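The same idea can be sketched with the standard library ctypes module instead of the “di” extension. This is a swapped-in technique for illustration only, it assumes CPython, and it is only safe here because the object is still alive; casting the address of a freed object can crash the interpreter.

```python
import ctypes

obj = ["some", "data"]
address = id(obj)             # in CPython, the object's memory address

# Treat the raw address as a pointer to a Python object and fetch it.
recovered = ctypes.cast(address, ctypes.py_object).value
same_object = recovered is obj
```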
Examples¶
uses the ref_by_id() function to see what’s going on with a circular reference and garbage collection.
More real examples in an IPython notebook:
CircularReferenceExample.ipynb
Or: circular.py
memcount.py is a test file that shows memory growth if circular references are not cleaned up.
mem_check.py is code that reports process memory use.
You can find this code in the main repo here:
https://github.com/UWPCE-PythonCert/SystemDevelopment2015/tree/master/Examples/ref_counting
Weak References¶
For times when you don’t want to keep objects alive, Python provides “weak references” – we saw this in the examples.
(https://docs.python.org/2/library/weakref.html)
- The containers in the weakref module:
WeakKeyDictionary
WeakValueDictionary
WeakSet
- Proxy objects (weakref.proxy): act much like regular references; client code doesn’t know the difference.
- Weak reference objects (weakref.ref): for when you want to control what happens when the referenced object is gone.
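A short sketch of the three flavors listed above, side by side. Big and events are hypothetical names for this illustration, and the immediate cleanup after del relies on CPython's reference counting.

```python
import weakref


class Big:
    """Hypothetical stand-in for a large object."""

    def __init__(self, name):
        self.name = name


events = []
obj = Big("data")

# weakref.ref: call it to get the object back, or None once it is gone.
# The optional callback runs when the referenced object dies.
ref = weakref.ref(obj, lambda r: events.append("gone"))

# weakref.proxy: used like the object itself.
proxy = weakref.proxy(obj)

# WeakValueDictionary: entries vanish when their values die.
registry = weakref.WeakValueDictionary({"data": obj})

name_via_ref = ref().name
name_via_proxy = proxy.name

del obj                        # the only strong reference is removed

ref_is_dead = ref() is None
registry_size = len(registry)
try:
    proxy.name
    proxy_dead = False
except ReferenceError:         # raised when a dead proxy is used
    proxy_dead = True
```

The weak reference object makes the “object is gone” case explicit (a None check or a callback), while the proxy hides it until an access fails.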
Exercise¶
Build a “weak cache”:
For large objects that are expensive to create:
- Use a WeakValueDictionary to hold references to (probably large) objects.
- When the client requests an object that doesn’t exist – one is created, returned, and cached (weakly).
- If the object is in the cache, it is returned.
- When no other references exist to the object, it is NOT retained by the cache.
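One possible starting point for the exercise, offered as a sketch and not as the reference solution. WeakCache, its factory argument, and the Expensive class are all hypothetical names; the behavior after del assumes CPython's immediate cleanup.

```python
import weakref


class WeakCache:
    """Cache that does not keep its values alive on its own."""

    def __init__(self, factory):
        self._factory = factory                        # builds a value from a key
        self._items = weakref.WeakValueDictionary()

    def get(self, key):
        obj = self._items.get(key)
        if obj is None:                                # missing or already freed
            obj = self._factory(key)
            self._items[key] = obj                     # cached weakly
        return obj

    def __len__(self):
        return len(self._items)


class Expensive:
    """Hypothetical costly object; counts how many times one is built."""

    created = 0

    def __init__(self, key):
        Expensive.created += 1
        self.key = key


cache = WeakCache(Expensive)
first = cache.get("a")         # created and cached
second = cache.get("a")        # served from the cache
same_object = first is second

del first, second              # no client references remain
size_after_release = len(cache)

third = cache.get("a")         # has to be rebuilt
times_created = Expensive.created
```

Note that values must support weak references, so built-in types such as list, dict, and int would need to be wrapped in a class before they can be cached this way.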