Introduction
This is the first part of a two-part series on Python. This article will cover the basic techniques for extending Python using C/C++. The second article will cover the techniques for embedding the Python interpreter in your C/C++ application. While I don't often use Python extensions directly, the information covered here is an absolute requirement for embedded use.
This article is not meant to be an introduction to Python. It assumes a working knowledge of the language.
You will need a Python distribution to build and use the sample. There are two major distributors of Python: Python.org and ActiveState.com. Either one will suffice, but my preference these days is the ActiveState one. They've compiled the help files into a Windows HTML help file that I find easier to navigate than the one in the basic distribution. Plus, it comes with all of the Windows extension libraries.
They both come with include and libs directories, so you will not need to download the source code. The source will be necessary, however, in the second part of this series.
Extensions - What are they?
Python's motto is "Batteries included". By this they mean that, out of the box, Python does a lot. It comes with many extra modules that give the user access to such features as sockets, CGI, URL parsing and HTTP support, XML processing, MIME, threads, and even XML-based RPC. Despite all of these extras, we will always need or want to customize.
Extension modules come in two forms: native Python, and C/C++ dynamically linked libraries (DLLs). Native Python modules are simply Python scripts that are available to be imported by user scripts. Creating native Python modules is as simple as writing a Python script.
C/C++ extension modules are compiled DLLs with a standard exported function that handles module initialization and registration. This article covers these DLL-based extension modules.
The API
Python is written in C, and the authors have been kind enough to expose and document most of the interpreter internals. It is through this API that we gain the access we need to extend the language.
Python objects
Every object, nay every value, in Python is represented internally as a PyObject. PyObject is a structure that holds a reference count and a pointer to the object's type information, which in turn defines all of the handler entry points. One of the fundamentals of Python extension programming is that, whenever you manipulate any Python object in C/C++, you will be manipulating a PyObject.
That being said, you will rarely use the PyObject API routines. Instead, you will use the API routines that apply to the specific Python type being manipulated. Please see the Python C/C++ API documentation for specifics.
Reference counting
Python handles basic memory management with a reference counting mechanism. Each object has a reference count that gets incremented when a new reference to the object is created, and decremented when a reference is dropped. When the count reaches zero, the object is deleted.
class doodle:
    pass  # etc.
d = doodle()  # reference count = 1
e = d         # reference count = 2
del d         # reference count = 1
del e         # reference count = 0, object deleted
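The counts in the comments above can be watched from a script. The following is a minimal sketch that assumes the CPython interpreter, where sys.getrefcount() is available; the class and function names are illustrative only. Because the reported number includes temporary references held by the call itself, the sketch compares counts against a baseline rather than relying on absolute values.

```python
import sys


class Doodle:
    pass


def observe_counts():
    d = Doodle()
    base = sys.getrefcount(d)        # baseline; includes temporaries used by the call
    e = d                            # a second name now refers to the same object
    with_alias = sys.getrefcount(d)  # one higher than the baseline
    del e                            # drop the second reference
    after_del = sys.getrefcount(d)   # back to the baseline
    return base, with_alias, after_del


base, with_alias, after_del = observe_counts()
print(with_alias - base, after_del - base)
```

Only the differences matter here: creating the alias adds one reference, and deleting it removes that reference again.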
When manipulating Python objects in C/C++, one must be conscious of the reference counts. Certain functions (marked in the Python C API documents) return new references, which you own and must release; others return borrowed references, which you must not release. Use the Py_INCREF() and Py_DECREF() macros to assist you. Also, I suggest that you mark each questionable API call with a comment noting the return type.
PyObject *pList = PyList_New(5);              // new reference
...
PyObject *pItem = PyList_GetItem(pList, 2);   // borrowed reference, do not Py_DECREF()
...
Py_DECREF(pList);
Reference counts are important! I have spent many hours trying to track down memory leaks only to find that I didn't Py_DECREF() an object when I should have.
Python types
There are six major native data types in Python: integers, floats, strings, tuples, lists and dictionaries. Python supports a variety of other native types (complex numbers, long integers, etc.), the use of which will be left as the proverbial exercise for the reader.
Integers, floats and strings
These are just what you would expect. The only thing you need to know is how to build and manipulate them.
PyObject *pInt = Py_BuildValue("i", 147);
assert(PyInt_Check(pInt));
long i = PyInt_AsLong(pInt);
Py_DECREF(pInt);
PyObject *pFloat =
    Py_BuildValue("f", 3.14159f);
assert(PyFloat_Check(pFloat));
double f = PyFloat_AsDouble(pFloat);
Py_DECREF(pFloat);
PyObject *pString =
    Py_BuildValue("s", "yabbadabbadoo");
assert(PyString_Check(pString));
int nLen = PyString_Size(pString);
// s points into pString's internal buffer; use it before the Py_DECREF()
char *s = PyString_AsString(pString);
Py_DECREF(pString);
Tuples
Tuples are fixed-length immutable arrays. When a Python script calls a C/C++ extension method, all non-keyword arguments are passed in a tuple. Needless to say, parsing this tuple tends to be the first thing done in your methods.
Here is a mish-mash of tuple use:
PyObject *pTuple = PyTuple_New(3);
assert(PyTuple_Check(pTuple));
assert(PyTuple_Size(pTuple) == 3);
PyTuple_SetItem(pTuple, 0, Py_BuildValue("i", 1));
PyTuple_SetItem(pTuple, 1, Py_BuildValue("f", 2.0f));
PyTuple_SetItem(pTuple, 2, Py_BuildValue("s", "three"));
int i;
float f;
char *s;
if(!PyArg_ParseTuple(pTuple, "ifs", &i, &f, &s))
    PyErr_SetString(PyExc_TypeError, "invalid parameter");
Py_DECREF(pTuple);
PyArg_ParseTuple() is probably one of the most commonly used API functions. The second parameter is a string that dictates the types of objects expected in the tuple. The string ifs means: integer, float, string. Please see the API documentation for a detailed explanation, and a list of the other type characters.
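To make the idea of a format string concrete, here is a toy, pure-Python imitation of the type check it performs. This is only a sketch: parse_tuple and _CONVERTERS are hypothetical names invented for this illustration, not part of the Python C API, and the real function supports many more type characters and writes its results through C pointers.

```python
# Toy model of a format string such as "ifs".
# parse_tuple and _CONVERTERS are hypothetical, for illustration only.
_CONVERTERS = {
    "i": (int, int),             # integer
    "f": ((int, float), float),  # float (an integer is accepted and converted)
    "s": (str, str),             # string
}


def parse_tuple(args, fmt):
    """Check each argument against its format character and convert it."""
    if len(args) != len(fmt):
        raise TypeError("expected %d arguments, got %d" % (len(fmt), len(args)))
    values = []
    for code, arg in zip(fmt, args):
        accepted, convert = _CONVERTERS[code]
        if not isinstance(arg, accepted):
            raise TypeError("argument does not match format code %r" % code)
        values.append(convert(arg))
    return tuple(values)


i, f, s = parse_tuple((1, 2.0, "three"), "ifs")
```

Passing a string where the format expects an integer raises TypeError, which is the same kind of failure the C example above reports with PyErr_SetString().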
Lists
Lists are like STL vectors. They allow random access and iteration over stored objects. Here is an example of typical list use:
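PyObject *pList = PyList_New(5);                     // new reference
assert(PyList_Check(pList));
for(int i = 0; i < 5; ++i)
    PyList_SetItem(pList, i, Py_BuildValue("i", i)); // steals the reference
// PyList_Insert() and PyList_Append() do not steal references,
// so release the temporary after each call.
PyObject *pTemp = Py_BuildValue("s", "inserted");
PyList_Insert(pList, 3, pTemp);
Py_DECREF(pTemp);
pTemp = Py_BuildValue("s", "appended");
PyList_Append(pList, pTemp);
Py_DECREF(pTemp);
PyList_Sort(pList);
PyList_Reverse(pList);
PyObject *pSlice =
    PyList_GetSlice(pList, 2, 4);                    // new reference
for(int j = 0; j < PyList_Size(pSlice); ++j) {
    PyObject *pValue = PyList_GetItem(pSlice, j);    // borrowed reference
    assert(pValue);
}
Py_DECREF(pSlice);
Py_DECREF(pList);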
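For orientation, here is a sketch of the script-level operations that the C calls above correspond to. One assumption differs from the C example: the inserted and appended values are integers rather than strings, so that the sort also works on interpreters that refuse to compare integers with strings.

```python
values = [None] * 5        # PyList_New(5): a list with five empty slots
for i in range(5):
    values[i] = i          # PyList_SetItem
values.insert(3, 30)       # PyList_Insert at index 3
values.append(40)          # PyList_Append
values.sort()              # PyList_Sort
values.reverse()           # PyList_Reverse
piece = values[2:4]        # PyList_GetSlice(pList, 2, 4)
for item in piece:         # PyList_Size plus PyList_GetItem
    assert item is not None
```

Seeing the two side by side makes it easier to pick the right API routine: most list methods have a directly named C counterpart.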
Dictionaries
Dictionaries are the equivalent of STL maps. They map keys to values. Here is an example of typical dictionary use:
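PyObject *pDict = PyDict_New();                   // new reference
assert(PyDict_Check(pDict));
// PyDict_SetItemString() does not steal references,
// so release each temporary after storing it.
PyObject *pTemp = Py_BuildValue("i", 1);
PyDict_SetItemString(pDict, "first", pTemp);
Py_DECREF(pTemp);
pTemp = Py_BuildValue("f", 2.0f);
PyDict_SetItemString(pDict, "second", pTemp);
Py_DECREF(pTemp);
PyObject *pKeys = PyDict_Keys(pDict);             // new reference
for(int i = 0; i < PyList_Size(pKeys); ++i) {
    PyObject *pKey =
        PyList_GetItem(pKeys, i);                 // borrowed reference
    PyObject *pValue =
        PyDict_GetItem(pDict, pKey);              // borrowed reference
    assert(pValue);
}
Py_DECREF(pKeys);
PyDict_DelItemString(pDict, "second");
Py_DECREF(pDict);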
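The same sequence, written as a script, looks roughly like this. It is a sketch meant only to show which dictionary operation each C call stands for; the variable names are illustrative.

```python
table = {}                   # PyDict_New
table["first"] = 1           # PyDict_SetItemString
table["second"] = 2.0
keys = list(table.keys())    # PyDict_Keys: a new list holding the keys
for key in keys:
    value = table[key]       # PyDict_GetItem
    assert value is not None
del table["second"]          # PyDict_DelItemString
```

Note that the key list is a separate object from the dictionary, which is why the C version has to release it on its own.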
Extension concepts
An extension module typically consists of three parts: the actual exported functions, the method table and the initialization function.
First, we will look at a simple example of an extension module. Then we will examine each part individually.
A typical extension module will look like this:
#include <Python.h>
#include <string.h>

static PyObject *wanklib_yinkle(PyObject *pSelf,
                                PyObject *pArgs)
{
    char *szString;
    int nInt;
    float fFloat;
    PyObject *pList;
    // "O" (capital) accepts any object; it arrives as a borrowed reference
    if(!PyArg_ParseTuple(pArgs, "sifO", &szString, &nInt,
                         &fFloat, &pList))
    {
        PyErr_SetString(PyExc_TypeError,
                        "yinkle() invalid parameter");
        return NULL;
    }
    if(!PyList_Check(pList)) {
        PyErr_SetString(PyExc_TypeError,
                        "yinkle() fourth parameter must be a list");
        return NULL;
    }
    // PyList_Append() does not steal the reference, so release it afterwards
    PyObject *pValue = Py_BuildValue("f",
        strlen(szString) * nInt / fFloat);
    PyList_Append(pList, pValue);
    Py_DECREF(pValue);
    Py_INCREF(Py_None);
    return Py_None;
}
static PyMethodDef WankLibMethods[] = {
    {"yinkle", wanklib_yinkle,
     METH_VARARGS, "Do a bit of stuff."},
    {NULL, NULL, 0, NULL}
};
void initwanklib(void) {
    PyObject *pModule =
        Py_InitModule("wanklib", WankLibMethods);
    PyObject *pDict = PyModule_GetDict(pModule);
    // PyDict_SetItemString() does not steal references either
    PyObject *pValue = Py_BuildValue("i", 147);
    PyDict_SetItemString(pDict, "eleven", pValue);
    Py_DECREF(pValue);
    pValue = Py_BuildValue("s", "kay");
    PyDict_SetItemString(pDict, "doubleyew", pValue);
    Py_DECREF(pValue);
}
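To show what a script would see, here is a pure-Python stand-in that mirrors the behavior of the module above. It is an illustration only, not the compiled extension: with the real DLL in place, the script would instead import wanklib and call wanklib.yinkle() with the same arguments.

```python
# Pure-Python stand-in for the compiled module, for illustration only.
def yinkle(text, count, divisor, results):
    """Append len(text) * count / divisor to the list, as the C function does."""
    if not isinstance(results, list):
        raise TypeError("yinkle() fourth parameter must be a list")
    results.append(len(text) * count / float(divisor))
    return None  # the C version returns Py_None


# The two module-level variables registered by the initialization function.
eleven = 147
doubleyew = "kay"

results = []
yinkle("abc", 2, 4.0, results)  # appends 3 * 2 / 4.0
```

The caller's list is modified in place, because the extension receives a reference to the very same list object rather than a copy.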
Initialization function
Every extension module must export a function called init followed by the module name; for the wanklib module above, that is initwanklib. When a Python script requests that the module be imported, Python queries the library for that exact named function and calls it. The initialization function is responsible for telling Python about the functions, variables and classes that it exports.
Method table
The initialization function will call the Python routine Py_InitModule() to register the module methods. It will pass the name by which the new module will be known, and a table describing the exported methods. Each table entry consists of four parts: the callable name string, the function itself, a parameter describing how parameters will be passed, and a documentation string. The last entry in the table needs to be a sentinel with NULL entries.
The name string is the name by which the method will be callable from Python. The parameter type marker can be METH_VARARGS or METH_KEYWORDS. METH_VARARGS is the standard way of passing parameters; they arrive packaged in a tuple. Specifying METH_KEYWORDS requests that named parameters be passed in a dictionary; the method then receives that dictionary as a third PyObject pointer argument.
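The tuple and dictionary packaging has a direct counterpart in ordinary Python functions. The sketch below uses a hypothetical function, show_call, to display what each container holds for a given call.

```python
def show_call(*args, **kwargs):
    # args is a tuple of the positional parameters;
    # kwargs is a dictionary of the named parameters.
    return args, kwargs


args, kwargs = show_call(1, 2.0, "three", verbose=True)
print(type(args).__name__, type(kwargs).__name__)
```

An extension method sees its parameters packaged in the same way: positional values in order in the tuple, named values keyed by name in the dictionary.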
Methods
All extension methods have the same prototype (given that they are marked as METH_VARARGS):
PyObject *method(PyObject *pSelf, PyObject *pArgs);
All extension methods must return a PyObject pointer. If the function has no real return value, you must return a pointer to the global "None" object, after incrementing its reference count:
PyObject *method(PyObject *pSelf, PyObject *pArgs) {
    Py_INCREF(Py_None);
    return Py_None;
}
To signify that an error has occurred and to throw a Python exception, you must set the error string and return NULL:
PyObject *method(PyObject *pSelf, PyObject *pArgs) {
    PyErr_SetString(PyExc_StandardError,
                    "something bad happened");
    return NULL;
}
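From the calling script's point of view, a NULL return with an error set surfaces as an ordinary exception. The sketch below uses math.sqrt from the standard library as a stand-in for an extension method, on the assumption that the interpreter is CPython, where the math module is implemented in C. The helper call_safely is a hypothetical name used only for this illustration.

```python
import math


def call_safely(func, *args):
    """Call func and report either its result or the TypeError it raised."""
    try:
        return ("ok", func(*args))
    except TypeError as exc:
        return ("error", str(exc))


good = call_safely(math.sqrt, 4.0)     # the C function returns a value
bad = call_safely(math.sqrt, "four")   # the C function signals an error instead
```

The script needs no special handling for exceptions that originate in C code; try and except work exactly as they do for exceptions raised in Python.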
The first argument to extension methods is a "self" pointer and is really only valid when you are building custom classes. These will be detailed in the next article.
The second argument is a tuple containing each parameter in order. As mentioned above, parsing this tuple is usually the first thing that happens.
Variables
Each Python module has a dictionary of local objects. In order to export variables from your module, all you need to do is add them to this dictionary. Py_InitModule() returns a pointer to the initialized module. PyModule_GetDict() retrieves the local object dictionary.
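PyObject *pModule =
    Py_InitModule("wanklib", WankLibMethods);
PyObject *pDict = PyModule_GetDict(pModule);     // borrowed reference
PyObject *pValue = Py_BuildValue("i", 147);      // new reference
PyDict_SetItemString(pDict, "someVar", pValue);  // does not steal the reference
Py_DECREF(pValue);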
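The relationship between a module and its dictionary can be checked from a script. This sketch builds an empty module object with the standard types module; the module name wanklib_demo is hypothetical and chosen only for the illustration.

```python
import types

module = types.ModuleType("wanklib_demo")  # hypothetical module name
module.__dict__["someVar"] = 147           # what PyDict_SetItemString does to the module dictionary
value = module.someVar                     # the entry is now visible as a module attribute
```

Anything placed in the module dictionary becomes an attribute of the module, which is why adding an entry is all it takes to export a variable.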
Implementation
In Windows, Python extensions are simply DLL files with a known exported symbol (the initialization function). In order to build an extension, you must create a Win32 Dynamic-Link Library project in Visual Studio. Choose the "A DLL that exports some symbols" option, so you have a bit of a template from which to work. I'm sure you could build an extension using the MFC AppWizard, but I've never tried it and don't intend to.
Simple extensions can be built in a single file, and will follow the layout shown above in the example.
All Python API declarations are accessed by including one file: Python.h. It is located in the include subdirectory below your Python installation. Rather than hard coding the path, it's much more desirable to add the directory to your Tools/Options/Directories list. While you're at it, add the libs subdirectory to the list of Library Files search paths as well.
There is no need to explicitly link the Python library. The Python.h include file uses a pragma to force the proper linkage.
NOTE: In a debug build, the pragma in Python.h forces the linkage of the debug build of Python: Python22_d.lib (the version number depends on the version you have installed). If you haven't downloaded the Python source code, you likely don't have this library. Your choices are to download the source and build the debug version, or to build your extensions in release mode.
In order to remove the C++ name mangling, you need to define your initialization function as extern "C".
And lastly, once compiled, place your DLL file in the DLLs subdirectory of your Python installation. It will get picked up automatically when you try to import it.
Beyond that, you should be ready to go.
Example
The example that I have included simply wraps the Mersenne Twister pseudo-random number generator. I have used what appears to be original code by the "inventors", Makoto Matsumoto and Takuji Nishimura. The mtprng module provides two methods: sgenrand() to seed the generator, and genrand() to generate a number on [0,1]. The code compiles with VS6 and SP5, with Python 2.2 installed, although any Python version beyond 1.5.3 should be fine.
Good luck!
History
- November 21 2002 - Created.