Extending Python with Complied Code

  • Chris Barker

Topics

  • Motivation
  • The C API
  • ctypes
  • Cython
  • Auto-generating wrappers
  • Others to consider

Motivation

Motivations for exiting pure Python

  • Performance
  • Integration with existing C libraries
  • Working as a glue language
  • Implement new builtin types

What is an extension module?

  • written in C (C API)
  • compiled code
  • lets you work directly with the CPython engine

Further reading:

http://docs.python.org/2/extending/extending.html

Example Case

To focus on the integration techniques, rather than complex C code, we’ll work with the following function we want to integrate:

#include <stdio.h>

int add(int x, int y) {
    return x+y;
}
int main(void) {
    int w = 3;
    int q = 2;
    printf("%d + %d = %d\n\n", w, q, add(w,q));
}

This is, of course, trivial and built in to Python, but the techniques are the same.

(Examples/week-08-extensions/pure-c/add.c)

Build it with the Makefile (Linux and OS-X):

all: add; gcc -o add add.c
$ make
gcc -o add add.c

and run it:

$ ./add
3 + 2 = 5

So are simple function works – but how to call it from Python?

The C API

Write your function in pure C using the Python API and import it into Python

Good for integrating with C library functions and system calls

The API isn’t trivial to learn

Lots of opportunity for error – you must do manual reference counting:

(http://docs.python.org/2/c-api/refcounting.html)

Further reading:

Intro to the C API

You’ll need the Python dev package installed on your system

Pull in the Python API to your C code via:

#include <Python.h>
/*
Note: Since Python may define some pre-processor definitions which
affect the standard headers on some systems, you must include
Python.h before any standard headers are included.

stdio.h, string.h, errno.h, and stdlib.h are included for you.
*/

Passing Data in and out of your function

Function arguments must be parsed on the way in and the way out

On the way in, we can call PyArg_ParseTuple:

if (!PyArg_ParseTuple(args, "s", &var1, ...))
    return NULL;

http://docs.python.org/2/c-api/arg.html#PyArg_ParseTuple


On the way out, we can call Py_BuildValue:

PyObject* Py_BuildValue(const char *format, ...)

http://docs.python.org/2/c-api/arg.html#Py_BuildValue

Registering your functions

First, register the name and address of your function in the method table:

// Module's method table and initialization function
static PyMethodDef AddMethods[] = {
    {"add", add, METH_VARARGS, "add two numbers"},
    {NULL, NULL, 0, NULL} // sentinel
};

https://docs.python.org/2/extending/extending.html#the-module-s-method-table-and-initialization-function

Initializing the module

Define an initialization function:

PyMODINIT_FUNC // does the right thing on Windows, Linux, etc.
initadd(void) {
    // Module's initialization function
    // Will be called again if you use Python's reload()
    (void) Py_InitModule("add", AddMethods);
}

It must be called initthe_module_name

https://docs.python.org/2/extending/extending.html#the-module-s-method-table-and-initialization-function

The whole thing:

#include <Python.h>

static PyObject *
add(PyObject *self, PyObject *args)
{
    int x, y, sts;

    if (!PyArg_ParseTuple(args, "ii", &x, &y))
        return NULL;
    sts = x+y;
    return Py_BuildValue("i", sts);
}
static PyMethodDef AddMethods[] = {
    {"add", add, METH_VARARGS, "add two numbers"},
    {NULL, NULL, 0, NULL} // sentinel
};
PyMODINIT_FUNC initadd(void) {
    (void) Py_InitModule("add", AddMethods);
}

Building your extension

setuptools provides features for automatically building extensions:

from setuptools import setup, Extension
setup(
    name='Cadd',
    version='1.0',
    description='simple c extension for an example',
    ext_modules=[Extension('add', sources=['add.c'])],
)

(distutils does too – but setuptools is getting updated to better support new stuff)

Run the setup.py:

python setup.py build_ext --inplace

(you can also just do install or develop if you want it properly installed)

Run the tests

test_add.py:

import pytest

import add

def test_basic():
    assert add.add(3,4) == 7

def test_negative():
    assert add.add(-12, 5) == -7

def test_float():
    with pytest.raises(TypeError):
        add.add(3, 4.0)

$ py.test

Subtleties we avoided:

There are a LOT of things you need to get right with a hand-written C Extension.

Exception handling

Works somewhat like the Unix errno variable:

  • Global indicator (per thread) of the last error that occurred.
  • Most functions don’t clear this on success, but will set it to indicate the cause of the error on failure.
  • Most functions also return an error indicator:
    • NULL if they are supposed to return a pointer,
    • -1 if they return an integer
    • The PyArg_*() functions return 1 for success and 0 for failure (and they set the Exception for you)

The easy way to set this indicator is with PyErr_SetString

http://docs.python.org/2/c-api/exceptions.html

(you can completely control the Exception handling if you need to)

ReferenceCounting

Whenever you create or no longer need a Py_Object, you need to increment or decrement the reference count:

Py_INCREF(x) and Py_DECREF(x)

PyArg_ParseTuple and Py_BuildValue

Handle this for you.

But if you’re creating new objects inside your function, you need to keep track.

And what it the function raises an exception in the middle and can’t finish?

This gets really ugly and error-prone (and hard to debug!)

LAB

LAB 1:

  • Add another function to the add.c file that multiplies two numbers instead.
  • Write some test code and make sure it works.

LAB 2:

  • Find the divide module in the examples/c-api directory
  • What happens when you call divide.divide(1/0)?
  • This is a different result than a pure Python 1/0, which throws an exception

Advanced:

  • Change the divide method to throw an appropriate exception in the divide-by-zero case

ctypes

Isn’t there an easier way to just call some C code?

What is ctypes?

A foreign function interface in Python

Binds functions in shared libraries to Python functions

Benefits:
  • Ships with Python, since 2.5
  • No new language to learn, it’s all Python
Drawbacks:
  • Performance hit for on the fly type translation
  • “thicker” interface in python

Example:

from ctypes import *
add = cdll.LoadLibrary("add.so")
print add.add(3,4)

Further reading:

http://docs.python.org/2/library/ctypes.html

Calling functions with ctypes

The shared lib must be loaded:

add = ctypes.cdll.LoadLibrary("add.so")

An already loaded lib can be found with:

libc = ctypes.CDLL("/usr/lib/libc.dylib")

ctypes comes with a utility to help find libs:

ctypes.util.find_library(name)

(good for system libs)

Once loaded, a ctypes wrapper around a c function can be called directly:

print add.add(3,4)

But....

C is statically typed – once compiled, the function must be called with the correct types.

ctypes Data Types

ctypes will auto-translate these native types:

  • None
  • int
  • byte strings (bytes(), str())
  • unicode (careful! unicode is ugly in C!)

These can be directly used as parameters when calling C functions.

Most types must be wrapped in a ctypes data type:

printf("An int %d, a double %f\n", 1234, c_double(3.14))

There are ctypes wrappers for all the “standard” C types

http://docs.python.org/2/library/ctypes.html#fundamental-data-types

You can also do pointers to types:

a_lib.a_function( ctypes.byref(c_float(x)))

http://docs.python.org/2/library/ctypes.html#passing-pointers-or-passing-parameters-by-reference

You can define C structs:

>>> class POINT(ctypes.Structure):
...     _fields_ = [("x", ctypes.c_int),
...                 ("y", ctypes.c_int)]
...
>>> point = POINT(10, 20)
>>> print point.x, point.y
10 20
>>> point = POINT(y=5)
>>> print point.x, point.y
0 5

You can define how to pass data from your custom classes to ctypes:

Define an _as_parameter_ attribute (or property):

class MyObject(object):
    def __init__(self, number):
        self._as_parameter_ = number

obj = MyObject(32)
printf("object value: %d\n", obj)

https://docs.python.org/2/library/ctypes.html#calling-functions-with-your-own-custom-data-types

(careful with types here!)

To define the return type, define the restype attribute.

Pre-defining the entire function signature:

libm.pow.restype = ctypes.c_double
libm.pow.argtypes = [ctypes.c_double, ctypes.c_double]

And you can just call it like a regular python function – ctypes will type check/convert at run time:

In [10]: libm.pow('a string', 4)
---------------------------------------------------------------------------
ArgumentError                             Traceback (most recent call last)
<ipython-input-10-01be690a307b> in <module>()
----> 1 libm.pow('a string', 4)

ArgumentError: argument 1: <type 'exceptions.TypeError'>: wrong type

Some more features

Defining callbacks into Python code from C:

ctypes.CFUNCTYPE(restype, *argtypes, use_errno=False, use_last_error=False)

http://docs.python.org/2/library/ctypes.html#ctypes.CFUNCTYPE


Numpy provides utilities for numpy arrays:

http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.ctypes.html

(works well for C code that takes “classic” C arrays)

Summary:

ctypes allows you to call shared libraries:
  • Your own custom libs
  • System libs
  • Proprietary libs
Supports almost all of C:
  • Custom data types
    • structs
    • unions
    • pointers
  • callbacks
  • Upside:
    • You can call system libs with little code
    • You don’t need to compile anything
      • at least for system and pre-compiled libs
  • Downsides:
    • You need to specify the interface
      • and it is NOT checked for you!
    • Translation is done on the fly at run time
      • performance considerations

LAB

In Examples/week-08-extensions/ctypes you’ll find add.c

You can build a shared lib with it with make (make.bat) on Windows.

test_ctypes.py will call that dll, and a few system dlls.

  • Take a look at what’s there, and how it works.
  • add another function to add.c, that takes different types (maybe divide?)
  • rebuild, and figure out how to call it with ctypes.
  • Try calling other system functions with ctypes.

Cython

A Python like language with static types which compiles down to C code for Python extensions.

Cython

  • Can write pure python
    • Fully understands the python types
  • With careful typing – you get pure C (and pure C speed)
  • Can also call other C code: libraries or compiled in.
  • Used for custom Python extensions and/or call C and C++ code.

Further reading:

Web site:

http://www.cython.org/

Documentation:

http://docs.cython.org/

Wiki:

https://github.com/cython/cython/wiki

Developing with Cython

First, install cython with:

pip install cython

Cython files end in the .pyx extension. An example add.pyx:

def add(x, y):
    cdef int result=0
    result = x + y
    return result

(looks a lot like Python, eh?)

To build a cython module: write a setup.py that defines the extension:

from setuptools import setup
from Cython.Build import cythonize

setup(name = "cython_example",
      ext_modules = cythonize(['cy_add1.pyx',])
   )

cythonize is a utility that sets up extension module builds for you in a cython-aware way.

Building a module

For testing, it’s helpful to do:

python setup.py build_ext --inplace

which builds the extensions, and puts the resulting modules right in with the code.

If you have your setup.py set up for a proper package, you can do:

python setup.py develop
 or
python setup.py install

Just like for pure-python packages.

You can also do only the Cython step by hand at the command line:

cython a_file.pyx

Produces: a_file.c file that you can examine, or compile.

For easier reading, you can generate an annotated html version:

cython -a a_file.pyx

Generates``a_file.html`` html file that is easier to read and gives additional information that is helpful for debugging and performance tuning.

More on this later.

Basic Cython

Cython functions can be declared three ways:

def foo # callable from Python

cdef foo # only callable from Cython/C

cpdef foo # callable from both Cython and Python

Inside those functions, you can write virtually any python code.

But the real magic is with the optional type declarations: the cdef lines. Well see this as we go...

Calling a C function from Cython

You need to tell Cython about extenal functions you want to call with cdef extern.

The Cython code:

# distutils: sources = add.c
# This tells cythonize that you need that c file.

# telling cython what the function we want to call looks like.
cdef extern from "add.h":
    # pull in C add function, renaming to c_add for Cython
    int c_add "add" (int x, int y)

def add(x, y):
    # now that cython knows about it -- we can just call it.
    return c_add(x, y)

and the setup.py:

from setuptools import setup
from Cython.Build import cythonize

setup(name = "cython_example",
      ext_modules = cythonize(['cy_add_c.pyx']  )
      )

To build it:

$ python setup.py build_ext --inplace

and test it:

Chris$ python test_cy_add_c.py

if you didn't get an assertion, it worked

A pure Cython solution

Here it is as python code:

def add(x, y):
    result = x + y
    return result

Which we can put in a pyx file and compile with the setup.py:

#!/usr/bin/env python

from setuptools import setup
from Cython.Build import cythonize

setup(name = "cython_example",
      ext_modules = cythonize(['cy_add1.pyx',
                               ])
      )

and build:

python setup.py build_ext --inplace

and test:

Chris$ python test_cy_add1.py

if you didn't get an assertion, it worked

But this is still essentially Python. So let’s type define it:

def add(int x, int y):

    cdef int result=0
    result = x + y

    return result

now Cython knows that x, y, and result are ints, and can use raw C for that.

Build and test again:

Chris$ python setup.py build_ext --inplace

Chris$ python test_cy_add2.py

If you didn’t get an assertion, it worked

A real Example: the Cython process

Consider a more expensive function:

def f(x):
    return x**2-x

def integrate_f(a, b, N):
    s = 0
    dx = (b-a)/N
    for i in range(N):
        s += f(a+i*dx)
    return s * dx

This is a good candidate for Cython – an essentially static function called a lot.

Cython from pure Python to C

Let’s go through the steps one by one. In the Examples/week-08-extensions/cython/integrate directory:

cy_integrate1.pyx
cy_integrate2.pyx
cy_integrate3.pyx
cy_integrate4.pyx
cy_integrate5.pyx
cy_integrate6.pyx
cy_integrate7.pyx

At each step, we’ll time and look at the output from:

$cython -a cy_integrate1.pyx

AGC Example

Another useful example of doing something useful, and using a numpy array is in:

Examples/week-08-extensions/AGC_example

This one impliments an Automatic Gain Control Signal processing filter.

It turns out that you can use some advanced numpy tricks to get pretty good performancew with this filter, but you can’t get full-on speed without some compiled code.

This example uses all of:
  • Pure Cython
  • C called from Cython
  • f2py and Fortran

Auto-generated wrappers

There are few ways to auto-generate wrapper for C/C++ code:

SWIG

SIP

XDress

[also Boost-Python – not really a wrapper generator]

f2py – for Fortran

SWIG

Simple Wrapper Interface Generator

A language agnostic tool for integrating C/C++ code with high level languages

Advantages:

  • Code generation for other environments than Python.
  • Doesn’t require modification to your C source.

Disadvantages:

  • For anything non-trivial, requires substantial effort to develop the interface.
  • Awkward when you want to mix python and C in the interface.
  • Inefficient passing of “Swigified Pointers”

Language interfaces:

  • Python
  • Tcl
  • Perl
  • Guile (Scheme/Lisp)
  • Java
  • Ruby

And a bunch of others:

http://www.swig.org/compat.html#SupportedLanguages

Further reading:

http://www.swig.org/Doc1.3/Python.html

SWIGifying add()

SWIG doesn’t require modification to your C source code

The language interface is defined by an “interface file”, usually with a suffix of .i

From there, SWIG can generate interfaces for the languages it supports

The interface file contains ANSI C prototypes and variable declarations

The %module directive defines the name of the module that will be created by SWIG

Creating a wrapper:

(Examples/week-08-extensions/swig)

Create add.i:

%module add
%{
%}
extern int add(int x, int y);

Create setup.py:

from setuptools import setup, Extension

setup(
    name='add',
    py_modules=['add'],
    ext_modules=[
        Extension('_add', sources=['add.c', 'add.i'])
    ]
)

And build it:

python setup.py build_ext --inplace

NOTE: distutils (and thus setuptools) “knows” about SWIG, so it does the swig step for you when you give it a *.i file.

Notice what gets created:

  • an add_wrap.c file – the wrapper code.
  • an add.py file – python code that calls the C function
  • an _add.so (or _add.pyd) file – the compiled extension

You can then run the code:

python -c 'import add; print add.add(4,5)'

http://www.swig.org/Doc2.0/SWIGDocumentation.html#Introduction_nn5

Installing SWIG

On the SWIG download page, there is a source tarball for *nix, and Windows binaries:

http://www.swig.org/download.html

For Linux:

You may have it in your package repository:

apt-get install swig

If not, download the tarball, unpack it, and:

./configure
make
sudo make install

should do it.

For OS-X: the same thing, except you also need the “pcre” package. Which you can get from:

http://sourceforge.net/projects/pcre/files/pcre/8.37/pcre-8.37.tar.gz/download

Put it in the dir created when the SWIG source was unpacked.

Unpack it, then run this to set it up for use with SWIG:

Tools/pcre-build.sh

Then you can do the standard:

./configure
make
make install

Decisions, Decisions...

So what to use???

My decision tree

Are you calling a few system library calls?

  • Use ctypes

Do you have a really big library to wrap? – use a wrapper generator:

  • SWIG (other languages?)
  • SIP
  • XDress

Are you writing extensions from scratch?

  • Cython
  • Do you love C++ ?
    • Boost Python

Do you want a “thick” wrapper around a C/C++ lib:

  • Cython