Python Memory Management and Weak References

Adapted from Chris Barker’s materials

Memory Management

  • You don’t want Python objects that are no longer in use taking up memory.
  • You don’t want to keep track of all that yourself.
  • Most “scripting languages” or “virtual machines” have some sort of automated memory management.

Many ways to do “Garbage Collection”

Reference Counting

How memory is managed is not part of the Python language spec:
  • Jython uses the JVM
  • IronPython uses the CLR (both are garbage collected)
  • PyPy uses Minimark
The CPython interpreter uses a reference counting scheme:
  • Every time there is a new reference to a Python object, its reference count is increased
  • Every time a reference is removed, its reference count is decreased
  • When the reference count reaches zero, the object is deleted and its memory freed

What makes a reference?

  • Binding to a name:

    x = an_object
    
  • Putting it in a container:

    l.append(an_object)
    
  • Passing it to a function:

    func(an_object)
    

Most of the time, you don’t need to think about this at all.

How do I see what’s going on?

import sys
sys.getrefcount(object)

Note

This will always return one more than you’d expect, as passing the object to the function increases its refcount by one:

In [5]: a = []

In [6]: sys.getrefcount(a)
Out[6]: 2
The Heisenberg Uncertainty Principle:
  • you can’t observe it without altering it

Playing with References

In [7]: a = []

In [8]: sys.getrefcount(a)
Out[8]: 2

In [9]: b = a

In [10]: sys.getrefcount(a)
Out[10]: 3

In [11]: l = [1,2,3,a]

In [12]: sys.getrefcount(a)
Out[12]: 4
In [13]: del b

In [14]: sys.getrefcount(a)
Out[14]: 3


In [15]: del l

In [16]: sys.getrefcount(a)
Out[16]: 2
Function local variables:

    In [17]: def test(x):
       ....:     print "x has a refcount of:", sys.getrefcount(x)
       ....:

    In [18]: sys.getrefcount(a)
    Out[18]: 2

    In [19]: test(a)
    x has a refcount of: 4

    In [20]: sys.getrefcount(a)
    Out[20]: 2

Example tricky_refcount.py

In [21]: x = 3

In [22]: sys.getrefcount(x)
Out[22]: 428

WHOA!!

(hint: interning....)
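
What’s happening: CPython caches (“interns”) small integers, so every use of 3 anywhere in the interpreter is another reference to the one shared int object. A small sketch of how to see this; the exact counts are a CPython implementation detail and vary by version:

    import sys

    a = 3
    b = 3
    print(a is b)              # True in CPython: both names refer to the single interned 3

    # A small int's refcount counts uses all over the interpreter, so it is
    # large (or, in very recent CPython versions, a huge fixed sentinel value):
    print(sys.getrefcount(3))

    # A big integer is created fresh, so its count is small:
    x = int("12345678987654321")
    print(sys.getrefcount(x))  # typically 2: the name x plus the call's own argument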

The Power of Reference Counting

  • You don’t need to think about it most of the time.
  • Code that creates objects doesn’t need to delete them
  • Objects get deleted right away
  • Performance is predictable

The Limits of Reference Counting

  • Performance overhead on all operations.

But the big one:

Circular references

If a Python object references itself, either directly or indirectly (it references another object that in turn references the first object), its reference count can never drop to zero, so reference counting alone will never free it:

You have a circular reference!

In the Python docs this is called a “reference cycle”.

Circular References

In [8]: l1 = [1,] ; l2 = [2,]

In [9]: l1.append(l2); l2.append(l1)

In [10]: l1
Out[10]: [1, [2, [...]]]

In [11]: l2
Out[11]: [2, [1, [...]]]

In [12]: l1[1]
Out[12]: [2, [1, [...]]]

In [13]: l2[1][1][1]
Out[13]: [1, [2, [...]]]

(demo) – simple_circular.py
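
The demo file isn’t reproduced here, but the gist is easy to sketch: once the names are gone, reference counting alone can never reclaim the two lists, and only the cycle collector (next section) frees them. A rough stand-in:

    import gc
    import sys

    l1 = [1]
    l2 = [2]
    l1.append(l2)
    l2.append(l1)                # each list now holds a reference to the other

    print(sys.getrefcount(l1))   # 3: the name l1, the entry inside l2, and the call itself

    del l1, l2                   # the names are gone, but each list still references the
                                 # other, so neither refcount can ever reach zero

    print(gc.collect())          # the cycle collector finds and frees them; it reports
                                 # the number of unreachable objects it found (at least 2)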

The Garbage Collector

As of Python 2.0, a garbage collector was added.

It can find and clean up “unreachable” references.

It is turned on by default:

In [1]: import gc
In [2]: gc.isenabled()
Out[2]: True

or you can force it:

In [4]: gc.collect()
Out[4]: 64

But it can be slow, and doesn’t always work!

How does the garbage collector work?

  • Not a full “mark and sweep” type.

It searches for reference cycles – then cleans those up.

  • It doesn’t have to bother checking non-container types (ints, strings, etc.)

Details here:

http://arctrix.com/nas/python/gc/ (or in the source!)
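
You can see the “non-container” exemption directly with gc.is_tracked() (available in Python 2.7 and later), which reports whether the cycle collector is watching a given object:

    import gc

    # Atomic objects can never be part of a cycle, so the collector ignores them:
    print(gc.is_tracked(42))       # False
    print(gc.is_tracked("hello"))  # False

    # Containers (and most class instances) can participate in cycles, so they are tracked:
    print(gc.is_tracked([]))       # True
    print(gc.is_tracked([1, 2]))   # True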

Big issue: objects in a reference cycle whose class defines a __del__ method are not cleaned up.

  • __del__ methods often act on references that may not be there if objects are cleaned up in the wrong order, so the collector refuses to guess an order; it leaves the cycle alone and parks the objects in gc.garbage.

Examples ( ref counting vs garbage collection )

Run the example in Examples/week-02-ref_counting/simple_circular_classes.py

In [1]: from simple_circular_classes import *

In [2]: x = PyObjWithDel()

In [3]: x = None
deleting PyObjWithDel object at 140459942915664

In [4]: x = PyObjWithDel()

In [5]: del x
deleting PyObjWithDel object at 140459942915600
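
The interesting case is when such an object sits in a reference cycle. The real class lives in simple_circular_classes.py; the stand-in below is a hypothetical reconstruction, not the actual file, and it shows the Python 2 behavior: __del__ never runs and the uncollectable objects are parked in gc.garbage. (Python 3.4 and later changed this: finalizers in cycles are now run and the cycle is collected.)

    import gc

    class PyObjWithDel(object):            # hypothetical stand-in for the class in the example file
        def __del__(self):
            print("deleting PyObjWithDel object at %d" % id(self))

    a = PyObjWithDel()
    b = PyObjWithDel()
    a.other = b                            # build a reference cycle
    b.other = a

    del a, b                               # no "deleting" message: the refcounts never hit zero
    print(gc.collect())                    # the collector finds the cycle...
    print(gc.garbage)                      # ...but (on Python 2) refuses to free it; the
                                           # uncollectable objects are parked here instead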

Tools

If these objects are no longer “reachable” – how do you find out what’s going on?

We saw sys.getrefcount() – but you need a reference to the object to use it.

You can see what the refcount is before you delete the last reference, but that isn’t always easy.
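
The gc module can help here: it can hand you every container object it is currently tracking, and tell you what refers to a given object. A sketch (filtering for dicts is just an example of something to count):

    import gc

    # Everything the cycle collector is currently tracking:
    all_objects = gc.get_objects()
    dicts = [obj for obj in all_objects if isinstance(obj, dict)]
    print("%d tracked objects, %d of them dicts" % (len(all_objects), len(dicts)))

    # Given an object you suspect is leaking, see what still refers to it:
    suspect = dicts[0]
    for referrer in gc.get_referrers(suspect):
        print(type(referrer))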

Process Memory Use

A really coarse way to find a memory leak is to see if the process memory is growing.

It can be subtle: Python (and the OS) do various tricks to re-use memory, etc.

But if you have a “real” leak – you’ll see it. (Example to follow)

mem_check.py

It provides functions that report the memory use of the currently running process.

(*nix and Windows code)
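
mem_check.py itself isn’t shown here; a rough Unix-only sketch of the same idea, using the standard library resource module (the real file reportedly handles Windows as well):

    import resource

    def process_memory():
        # Peak resident set size of this process: kilobytes on Linux, bytes on macOS.
        return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

    before = process_memory()
    data = [list(range(1000)) for i in range(10000)]   # allocate something sizable
    after = process_memory()
    print("memory grew by roughly %d units" % (after - before))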

Weak References

For times when you want to refer to an object without keeping it alive, Python provides “weak references”.

(https://docs.python.org/2/library/weakref.html)

  1. The built-in containers:
  • WeakKeyDictionary
  • WeakValueDictionary
  • WeakSet
  2. Proxy objects:
  • act much like regular references – client code doesn’t know the difference
  3. weakref.ref objects:
  • for when you want to control what happens when the referenced object is gone.
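
A small sketch showing all three, assuming an arbitrary class (built-in lists and dicts can’t be weakly referenced directly, but ordinary class instances can):

    import weakref

    class BigThing(object):
        pass

    obj = BigThing()

    # 1. A weak container: its entry vanishes when the object dies
    cache = weakref.WeakValueDictionary()
    cache["thing"] = obj

    # 2. A proxy: client code uses it just like the object itself
    p = weakref.proxy(obj)

    # 3. A plain weak reference, with an optional callback fired at death
    def on_death(ref):
        print("the referenced object is gone")

    r = weakref.ref(obj, on_death)
    print(r() is obj)          # True: calling the ref returns the object

    del obj                    # drop the only strong reference; "on_death" fires
    print(r())                 # None: the object has been collected
    print("thing" in cache)    # False: the cache entry disappeared too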

Example

Run memcount.py with the proxy line toggled on and off, and compare the memory use.

Exercise

Build a “weak cache”:

For large objects that are expensive to create:

  • Use a WeakValueDictionary to hold references to the (probably large) objects.
  • When the client requests an object that doesn’t exist, one is created, returned, and cached (weakly).
  • If the object is already in the cache, it is returned directly.
  • When no other references to the object exist, it is NOT retained by the cache.
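
One possible skeleton to get started (the names WeakCache, factory, and get are illustrative, not part of the exercise materials); note that the cached objects must be weak-referenceable, so ordinary class instances work but bare lists and dicts do not:

    import weakref

    class WeakCache(object):
        def __init__(self, factory):
            self._factory = factory                       # callable that builds the expensive object
            self._cache = weakref.WeakValueDictionary()   # holds the objects only weakly

        def get(self, key):
            try:
                return self._cache[key]                   # cache hit: the object is still alive elsewhere
            except KeyError:
                obj = self._factory(key)                  # expensive creation
                self._cache[key] = obj                    # cache it weakly
                return obj

As soon as the last outside reference to a cached object goes away, the WeakValueDictionary drops its entry automatically, which is exactly the behavior the exercise asks for.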