The Python API is a C/C++ programming toolset that enables developers to write code that integrates Python and C++. The first part of the article shows how the Python API makes it possible to access Python code in a C/C++ application. The second part explains how to code extension modules in C++ that can be imported into regular Python code.
1. Introduction
According to StackOverflow, two of the most popular programming languages for desktop development are C++ and Python. Many applications are coded in C++ for performance, and provide a Python interface that enables configuration and scripting. Unfortunately, the two languages are so different that many developers don't know how to access Python code in C++ or call C++ functions from Python.
Thankfully, some implementations of Python (such as the reference implementation at python.org) provide C/C++ headers and libraries that simplify the process of interfacing C++ and Python. These headers and libraries form the Python API, and the goal of this article is to explain how to use it.
To be specific, this article focuses on two ways of using the Python API. The first part of the article explains how to access Python inside C++ code. This is called embedding Python. The second part explains how to code a C++ library that can be accessed as a Python module. This is called an extension module.
But before I discuss either topic, I need to explain how to download Python and install it on your system. If you're already up to speed on Python, feel free to skip the next section.
2. Installing Python
A Python implementation is a toolset that provides a Python interpreter, basic Python modules, and other utilities such as pip. Based on my experience, there are four main implementations of Python:
- CPython - oldest and most popular, written in C
- PyPy - similar to CPython, but uses just-in-time compilation to improve performance
- Jython - written in Java, compiles Python into Java bytecode
- IronPython - written in C#, enables Python to access C# and .NET features
CPython is the reference implementation and the one that defines the C/C++ interface, so this article focuses on CPython. If you're running Linux, you can install it by using your package manager (apt-get install python3.11 python3-dev on Ubuntu, yum install python3 python3-devel on RHEL and CentOS).
If you're running Windows or macOS, you can download an installer from the Python download site. Select your operating system and click the link for the latest version of Python, and your browser will download the executable. When you run the executable, a dialog will ask for configuration settings. The following image shows what this looks like on Windows for version 3.11.4.
At the bottom of the dialog, check the box for Add python.exe to PATH. This ensures that you'll be able to launch the Python interpreter from the command line.
If you click Install Now on Windows, Python will be installed in the AppData\Local\Programs folder. You can customize this by clicking the Customize installation link and selecting a different folder. When the installation is finished, click the dialog's Close button.
For this article, it isn't important where Python is installed, but it is important that you know the installation directory. If you look in the top-level include directory, you'll find the header files for the Python API. The libs directory contains the library required for linking C/C++ applications. On my Windows system, the required library file is named python311.lib. On my Linux system, its name is libpython3.11.so.
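If you aren't sure where these directories are, the interpreter itself can report them. The following is a minimal sketch, assuming a CPython interpreter is available; it relies on the standard sysconfig module, and the variable names are only illustrative. The LDLIBRARY setting is typically defined on Linux and macOS builds and may be empty on Windows.

```python
import os
import sysconfig

# Directory that should contain Python.h (the header is present only
# if the development files are installed)
include_dir = sysconfig.get_paths()["include"]
has_header = os.path.isfile(os.path.join(include_dir, "Python.h"))

# Library name recorded when the interpreter was built; may be None on Windows
library_name = sysconfig.get_config_var("LDLIBRARY")

print("Include directory:", include_dir)
print("Python.h found:", has_header)
print("Library file:", library_name)
```

If Python.h isn't found, the development package (python3-dev or python3-devel on Linux) is probably missing.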
3. Embedding Python in C/C++
The Python API provides several header files that declare C/C++ functions capable of accessing Python modules and code. The technical term for accessing Python from external code is embedding, and the central header file is Python.h.
The goal of this section is to look at the functions in Python.h that make embedding possible. They can be frustrating to use because all Python data structures are represented by instances of the PyObject data type. Modules are represented by PyObjects, functions and methods are represented by PyObjects, and variables are represented by PyObjects.
This section won't cover all of the functions in the Python API, or even most of them. Instead, we'll divide the functions we need into two categories:
- Fundamental functions - Functions that access modules, methods, and properties
- Object creation and conversion - Functions that create PyObjects and convert them to other types
After exploring these functions, this section presents C++ code that reads a function from a simple Python module, sets its parameters, and then executes the Python function.
3.1 Fundamental Functions
To embed Python processing in a C++ application, a developer should be familiar with a central set of functions. Table 1 lists them and provides a description of each.
Table 1: Fundamental Functions of the Python API
Function Signature | Description |
Py_Initialize() | Initializes the interpreter and modules |
Py_Finalize() | Shuts down the interpreter and frees its resources |
PyImport_ImportModule(const char*) | Imports the given module |
PyObject_HasAttrString(PyObject*, const char*) | Checks if the attribute is present |
PyObject_GetAttrString(PyObject*, const char*) | Accesses the given attribute |
PyCallable_Check(PyObject*) | Checks if the object can be called |
PyObject_Repr(PyObject*) | Returns a string PyObject holding the object's repr() representation |
PyObject_Str(PyObject*) | Returns a string PyObject holding the object's str() representation |
PyObject_CallObject(PyObject*, PyObject*) | Calls the object with a tuple of arguments |
Py_INCREF(PyObject*) | Increments the reference count (pointer can't be null) |
Py_XINCREF(PyObject*) | Increments the reference count (pointer can be null) |
Py_DECREF(PyObject*) | Decrements the reference count (pointer can't be null) |
Py_XDECREF(PyObject*) | Decrements the reference count (pointer can be null) |
The first function, Py_Initialize, is particularly important because it performs the tasks needed to make Python available in C/C++. This must be called before the application can access Python modules and features.
After initializing the environment, an application can access Python modules by calling PyImport_ImportModule. This accepts the name of the module and returns a PyObject pointer that represents the module, or NULL if the import fails. If the module is contained in a Python file, the .py suffix should be omitted.
For example, the following function call accesses the code in simple.py:
PyObject *mod = PyImport_ImportModule("simple");
Once an application has accessed a module or data structure, it can examine its attributes. The PyObject_HasAttrString function identifies if an attribute is present. If the attribute is present, PyObject_GetAttrString returns a PyObject pointer representing the attribute.
For example, the following code accesses an attribute named plus from simple.py.
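PyObject *mod, *attr = nullptr;
// Import simple.py; the result is NULL if the import fails
mod = PyImport_ImportModule("simple");
if (mod != nullptr) {
    // Only fetch the attribute if the module defines it
    if (PyObject_HasAttrString(mod, "plus") == 1) {
        attr = PyObject_GetAttrString(mod, "plus");
    }
}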
Properties and functions are both accessed as attributes, but functions can be invoked and properties can't. To check if an attribute can be invoked, an application needs to call PyCallable_Check, which returns 1 if the attribute can be called and 0 if it can't.
If an attribute can be invoked, PyObject_CallObject tells the interpreter to execute it. The first argument is the PyObject pointer representing the attribute, and the second is a PyObject pointer that represents a tuple containing the function's arguments (or NULL if there are none).
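The import, lookup, and call sequence can be tried out without compiling anything. The following is a sketch, not the article's C++ code: it calls the same C API functions from Python through the standard ctypes.pythonapi handle. The operator module and its add function stand in for simple.py and plus, and the ctypes declarations are only needed because the calls are made from Python.

```python
import ctypes

api = ctypes.pythonapi  # handle to the C API of the running interpreter

# Declare the C signatures so ctypes converts arguments and results correctly
api.PyImport_ImportModule.argtypes = [ctypes.c_char_p]
api.PyImport_ImportModule.restype = ctypes.py_object
api.PyObject_HasAttrString.argtypes = [ctypes.py_object, ctypes.c_char_p]
api.PyObject_HasAttrString.restype = ctypes.c_int
api.PyObject_GetAttrString.argtypes = [ctypes.py_object, ctypes.c_char_p]
api.PyObject_GetAttrString.restype = ctypes.py_object
api.PyCallable_Check.argtypes = [ctypes.py_object]
api.PyCallable_Check.restype = ctypes.c_int
api.PyObject_CallObject.argtypes = [ctypes.py_object, ctypes.py_object]
api.PyObject_CallObject.restype = ctypes.py_object

# Import the standard operator module (a stand-in for simple.py)
mod = api.PyImport_ImportModule(b"operator")

result = None
if api.PyObject_HasAttrString(mod, b"add") == 1:
    func = api.PyObject_GetAttrString(mod, b"add")
    if api.PyCallable_Check(func) == 1:
        # The second argument is a tuple holding the call's arguments
        result = api.PyObject_CallObject(func, (4, 7))

print("Result:", result)
```

In C++ the same five functions are called directly after including Python.h, and the application is responsible for checking for NULL and releasing references itself.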
Every PyObject has a reference count that tracks how many references to it are currently held. When it's created, the count is set to 1. Applications can increment the reference count by calling Py_INCREF or Py_XINCREF. Both accept a pointer to a PyObject, and the first should only be called if the pointer isn't NULL. The second, Py_XINCREF, can be called even if the pointer is NULL, in which case it does nothing.
When a PyObject is no longer needed, the application should call Py_DECREF or Py_XDECREF to decrement the object's reference count. Once the count reaches 0, the object will be deallocated. Both accept a pointer to a PyObject, and Py_DECREF should only be called if the pointer isn't NULL.
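To watch a reference count change, the following sketch calls Py_IncRef and Py_DecRef, which are the exported function versions of Py_XINCREF and Py_XDECREF, through ctypes.pythonapi. It assumes a standard CPython build. The absolute numbers reported by sys.getrefcount include a temporary reference held during the call, so only the differences are meaningful.

```python
import ctypes
import sys

api = ctypes.pythonapi
api.Py_IncRef.argtypes = [ctypes.py_object]
api.Py_IncRef.restype = None
api.Py_DecRef.argtypes = [ctypes.py_object]
api.Py_DecRef.restype = None

# Use a fresh object; small integers and None may have fixed counts
obj = object()

before = sys.getrefcount(obj)
api.Py_IncRef(obj)                 # take an extra reference
after_inc = sys.getrefcount(obj)
api.Py_DecRef(obj)                 # give the extra reference back
after_dec = sys.getrefcount(obj)

print(before, after_inc, after_dec)
```

Every increment is paired with a decrement here. In C++ code, forgetting the decrement leaks the object, and an extra decrement can free an object that is still in use.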
3.2 Object Creation and Conversion
In many cases, an application will need to create a PyObject from regular data or extract regular data from a PyObject. The application may also need to create Python-specific structures like lists or tuples. Table 2 lists the functions that perform these tasks.
Table 2: Object Creation/Conversion Functions
Function Signature | Description |
PyLong_FromLong(long) | Create a PyObject from a long integer |
PyLong_AsLong(PyObject*) | Return the PyObject's value as a long integer |
PyFloat_FromDouble(double) | Create a PyObject from a double |
PyFloat_AsDouble(PyObject*) | Return the PyObject's value as a double |
PyUnicode_FromString(const char*) | Create a Unicode PyObject from a UTF-8 encoded C string |
PyUnicode_AsEncodedString(PyObject*, const char*, const char*) | Encode a Unicode PyObject and return the result as a bytes PyObject |
PyBytes_FromString(const char*) | Create a bytes PyObject from a C string |
PyBytes_AsString(PyObject*) | Return a pointer to the bytes PyObject's C string |
PyTuple_New(Py_ssize_t) | Create a PyObject representing a tuple |
PyTuple_GetItem(PyObject*, Py_ssize_t) | Return the given element of a tuple |
PyTuple_SetItem(PyObject*, Py_ssize_t, PyObject*) | Set the given element of a tuple |
PyList_New(Py_ssize_t) | Create a PyObject representing a list |
PyList_GetItem(PyObject*, Py_ssize_t) | Return the given element of a list |
PyList_SetItem(PyObject*, Py_ssize_t, PyObject*) | Set the given element of a list |
These functions become important when an application needs to read or set an attribute's value. For example, if float_attr is a Python attribute containing a floating-point value, PyFloat_AsDouble will return a double that can be processed in C/C++.
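The numeric conversions can be exercised the same way. This sketch, again using ctypes.pythonapi rather than C++, converts a C long and a C double to Python objects and back. The values 42 and 2.5 are arbitrary.

```python
import ctypes

api = ctypes.pythonapi

# C signatures of the conversion functions from Table 2
api.PyLong_FromLong.argtypes = [ctypes.c_long]
api.PyLong_FromLong.restype = ctypes.py_object
api.PyLong_AsLong.argtypes = [ctypes.py_object]
api.PyLong_AsLong.restype = ctypes.c_long
api.PyFloat_FromDouble.argtypes = [ctypes.c_double]
api.PyFloat_FromDouble.restype = ctypes.py_object
api.PyFloat_AsDouble.argtypes = [ctypes.py_object]
api.PyFloat_AsDouble.restype = ctypes.c_double

# C value -> Python object -> C value
py_int = api.PyLong_FromLong(42)
c_long_value = api.PyLong_AsLong(py_int)

py_float = api.PyFloat_FromDouble(2.5)
c_double_value = api.PyFloat_AsDouble(py_float)

print(type(py_int).__name__, c_long_value)
print(type(py_float).__name__, c_double_value)
```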
Dealing with text is complicated. The Python API makes it possible to create a unicode PyObject from a string by calling PyUnicode_FromString. You can also create a bytes PyObject from a string by calling PyBytes_FromString.
Displaying an attribute's string is also complicated. PyObject_Str returns a PyObject containing an object's string representation, and PyUnicode_AsEncodedString converts this to a bytes PyObject containing the encoded representation of the string. Then PyBytes_AsString returns the C/C++ string corresponding to the encoded string.
For example, the following code obtains the string representation of the my_attr attribute, encodes it using UTF-8, and prints the corresponding C/C++ string.
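PyObject* attr = PyObject_GetAttrString(mod, "my_attr");
PyObject* str = PyObject_Str(attr);
PyObject* ucode = PyUnicode_AsEncodedString(str, "utf-8", NULL);
// The returned pointer refers to memory owned by ucode
const char* bytes = PyBytes_AsString(ucode);
std::cout << bytes << std::endl;
// Release the objects once the C string is no longer needed
Py_DECREF(ucode);
Py_DECREF(str);
Py_DECREF(attr);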
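The three-step text conversion can also be checked from Python. The sketch below runs PyObject_Str, PyUnicode_AsEncodedString, and PyBytes_AsString through ctypes.pythonapi on the value 3.5, which stands in for the my_attr attribute. At the Python level the whole chain corresponds to str(value).encode("utf-8").

```python
import ctypes

api = ctypes.pythonapi

api.PyObject_Str.argtypes = [ctypes.py_object]
api.PyObject_Str.restype = ctypes.py_object
api.PyUnicode_AsEncodedString.argtypes = [
    ctypes.py_object, ctypes.c_char_p, ctypes.c_char_p]
api.PyUnicode_AsEncodedString.restype = ctypes.py_object
api.PyBytes_AsString.argtypes = [ctypes.py_object]
api.PyBytes_AsString.restype = ctypes.c_char_p

value = 3.5  # stand-in for the my_attr attribute

text = api.PyObject_Str(value)                               # str object
encoded = api.PyUnicode_AsEncodedString(text, b"utf-8", None)  # bytes object
c_string = api.PyBytes_AsString(encoded)                     # copy of the C string

print(text, encoded, c_string)
```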
To pass arguments to a Python method, an application needs to create a tuple and insert an element for each argument to be passed. In code, the tuple can be created by calling PyTuple_New and elements can be set by calling PyTuple_SetItem. Note that PyTuple_SetItem takes ownership of the element it's given, so the application shouldn't decrement that element's reference count afterward. Similarly, an application can create a Python list by calling PyList_New and set its elements by calling PyList_SetItem.
3.3 Simple Embedding Example
To demonstrate how embedding works, the source code for this article contains two source files:
- simple.py - a simple Python file that defines a function named plus, which returns the sum of its two arguments
- embedding.cpp - a C++ application that uses the Python API to access the code in simple.py
The following code presents the content of embedding.cpp. This accesses simple.py, finds the plus attribute, sets its arguments, and executes the function.
#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include <iostream>
int main() {
    PyObject *module, *func, *args, *ret;
    // Start the interpreter before calling any other API function
    Py_Initialize();
    // Import simple.py (the .py suffix is omitted)
    module = PyImport_ImportModule("simple");
    if (module) {
        // Look up the plus attribute and make sure it can be called
        func = PyObject_GetAttrString(module, "plus");
        if (func && PyCallable_Check(func)) {
            // Build the argument tuple; PyTuple_SetItem takes ownership of each value
            args = PyTuple_New(2);
            PyTuple_SetItem(args, 0, PyLong_FromLong(4));
            PyTuple_SetItem(args, 1, PyLong_FromLong(7));
            // Call plus(4, 7) and release the objects that are no longer needed
            ret = PyObject_CallObject(func, args);
            Py_DECREF(args);
            Py_DECREF(func);
            Py_DECREF(module);
            if (ret) {
                long retVal = PyLong_AsLong(ret);
                Py_DECREF(ret);
                std::cout << "Result: " << retVal << std::endl;
            }
            else {
                PyErr_Print();
                std::cerr << "Couldn't access return value" << std::endl;
                Py_Finalize();
                return 1;
            }
        }
        else {
            if (PyErr_Occurred())
                PyErr_Print();
            std::cerr << "Couldn't execute function" << std::endl;
            // func may be NULL here, so use the null-safe macro
            Py_XDECREF(func);
            Py_DECREF(module);
            Py_Finalize();
            return 1;
        }
    }
    else {
        PyErr_Print();
        std::cerr << "Couldn't access module" << std::endl;
        Py_Finalize();
        return 1;
    }
    Py_Finalize();
    return 0;
}
After accessing the module, the application invokes the module's plus function by calling PyObject_CallObject. It passes two arguments: the attribute representing the function and a tuple containing the two values to be passed to the plus function. The following code shows what the plus function in simple.py looks like.
def plus(a, b):
    return a + b
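To build an application from embedding.cpp, you'll need to tell the compiler about the header files in the include folder of the Python installation directory and the library file in the libs folder. If the import fails at run time with a ModuleNotFoundError, make sure the directory containing simple.py is on the module search path, for example by setting the PYTHONPATH environment variable. When I run the application on my system, the output is: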
Result: 11
4. Creating a Python Extension
A Python interpreter can import standard modules such as os, datetime, and string. Using the Python API, we can add new importable modules called extension modules. Unlike regular modules, extension modules are dynamic libraries coded in C or C++.
This discussion walks through the development of an extension module named plustwo. This contains a function named addtwo, which accepts a number and returns the sum of the number and 2.
python
>>> import plustwo
>>> x = plustwo.addtwo(5)
>>> x
7
Programming extension modules is hard because the functions need to have special names and they have to provide special data structures. To understand the process, you need to be aware of three points:
- If the desired module is modname, the code must define a function named PyInit_modname that doesn't accept any parameters. For the example, the module is named plustwo, so the code defines a function named PyInit_plustwo.
- To describe the module, the code must create a PyModuleDef structure and set its fields. These fields identify the module's name, its documentation, and its methods.
- For each function in the module, the code must create a PyMethodDef structure and set its fields. These fields identify the method's name, arguments, and documentation. It must also identify the function that will be called when the method is invoked.
This section discusses these points in detail and then shows how the plustwo extension module can be implemented in code.
4.1 The PyInit Function
When a Python interpreter imports the extension module modname for the first time, it will call the PyInit_modname function. This must be coded properly to ensure that the interpreter can execute it, and there are five rules to follow:
- It must be preceded by the PyMODINIT_FUNC macro, and it must be the only function preceded by this macro.
- It must not be static, and it must be the only non-static item in the code.
- It can't accept any parameters.
- It must call PyModule_Create with a reference to the PyModuleDef that describes the module.
- Its return value must be set to the return value of PyModule_Create.
The best way to understand these rules is to look at an example. If the module name is plustwo and the PyModuleDef structure that describes the module is moduleDef, the following code shows how PyInit_plustwo can be coded:
PyMODINIT_FUNC PyInit_plustwo() {
    return PyModule_Create(&moduleDef);
}
At minimum, the function needs to call PyModule_Create with a PyModuleDef reference and return the result. But the function can be coded to do more than call PyModule_Create.
4.2 The PyModuleDef Structure
The extension module needs to create a PyModuleDef structure to tell the Python interpreter how the module should be processed. Table 3 lists each of the fields of this structure and their data types.
Table 3: Fields of the PyModuleDef Structure
Field Name | Data Type | Description |
m_base | PyModuleDef_Base | Always set to PyModuleDef_HEAD_INIT |
m_name | const char* | The module's name |
m_doc | const char* | The module's description |
m_size | Py_ssize_t | Size of memory to store module state |
m_methods | PyMethodDef* | Array of method descriptors |
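m_slots | PyModuleDef_Slot* | Array of slot definitions for multi-phase initialization |
m_traverse | traverseproc | Traversal function called during garbage collection |
m_clear | inquiry | Clear function called during garbage collection |
m_free | freefunc | Function that frees the module's resources |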
This article focuses on the first five fields, and the first should always be set to PyModuleDef_HEAD_INIT. The second should be set to the module's name and the third should be set to the module's docstring, which is displayed when the help function is called.
The fourth field is important for modules that require multi-phase initialization and sub-interpreters. This field identifies how much memory should be set aside to store the module's state data. Most simple extension modules don't need per-module state, so m_size can be set to -1, which indicates that the module keeps its state in global variables and doesn't support sub-interpreters. This is shown in the following code:
static struct PyModuleDef moduleDef = {
    PyModuleDef_HEAD_INIT,
    "plustwo",
    "This module contains a function (addtwo) that adds two to a number\n",
    -1,
    funcs
};
The m_methods field must be set to an array containing a PyMethodDef structure for each function contained in the module. In the example, the plustwo module has one function named addtwo. Therefore, the funcs array in the example code contains one PyMethodDef structure.
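From the Python side, the m_name and m_doc fields typically surface as the module's __name__ and __doc__ attributes. The following sketch illustrates this with the standard math module, which CPython implements in C. It is used here only as a readily available example of an extension-style module, since plustwo hasn't been built yet.

```python
import math

# Values that originate from the module definition in the C source
module_name = math.__name__
module_doc = math.__doc__

print("Name:", module_name)
print("Docstring:", module_doc)

# help(math) displays the same docstring along with the module's functions
```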
4.3 The PyMethodDef Structure
An extension module identifies its functions by providing an array of PyMethodDef structures. The fields of a PyMethodDef identify the function's name, arguments, and docstring. Table 4 lists these fields and their data types.
Table 4: Fields of the PyMethodDef Structure
Field Name | Data Type | Description |
ml_name | const char* | The function's name |
ml_meth | PyCFunction | The C function that provides the code |
ml_flags | int | Flags that identify the function's arguments |
ml_doc | const char* | The function's docstring |
The second field, ml_meth, must be set to a C function that provides the code to be executed when the module's function is called. When coding this function, there are four rules to keep in mind:
- If the module function has a different name than the module, the name of the C function should be set to modname_funcname, where modname is the name of the module and funcname is the name of the function.
- If the module function has the same name as its surrounding module, the name of the C function should be set to modname.
- The number of arguments accepted by the function is determined by the ml_flags field of the PyMethodDef.
- The C function must be declared static and it must return a PyObject pointer. If the function doesn't return a value, it should use Py_RETURN_NONE to return Python's None object.
The third field, ml_flags, identifies the nature of the arguments accepted by the function. This is usually set to one of three values:
- METH_NOARGS - The function takes no Python arguments. The C function still receives two PyObject* parameters: one representing the module and a second that is always NULL.
- METH_O - The function accepts two arguments: a PyObject* representing the module and a PyObject* that represents the single argument.
- METH_VARARGS - The function accepts two arguments: a PyObject* representing the module and a PyObject* that represents a tuple containing the function's parameters.
For this article's example, the addtwo function accepts a numeric argument and returns the sum of the argument and 2. Because there's only one argument, ml_flags should be set to METH_O.
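One practical consequence of METH_O is that the interpreter checks the argument count before the C function runs. The sketch below illustrates this with the built-in len function, which, to the best of my knowledge, current CPython releases register with METH_O. The helper call_and_report is hypothetical and exists only to capture the error message.

```python
def call_and_report(func, *args):
    """Call func with args and return the result or the TypeError message."""
    try:
        return func(*args)
    except TypeError as exc:
        return f"TypeError: {exc}"

# Exactly one argument: the call reaches the C function
one_arg = call_and_report(len, "abc")

# Zero or two arguments: rejected with a TypeError
no_args = call_and_report(len)
two_args = call_and_report(len, "a", "b")

print(one_arg)
print(no_args)
print(two_args)
```

A function registered with METH_O, such as addtwo in this article, should behave the same way when it's called with the wrong number of arguments.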
4.4 The plustwo Extension Module
At this point, you should have a basic grasp of the functions and data structures that must be created in an extension module. The following listing presents the code in plustwo.cpp, which is part of this article's source code. This file defines an extension module named plustwo containing a function named addtwo:
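#define PY_SSIZE_T_CLEAN
#include <Python.h>
// Implements plustwo.addtwo: self is the module, arg is the single argument
static PyObject* plustwo_addtwo(PyObject* self, PyObject* arg) {
    long value = PyLong_AsLong(arg);
    // PyLong_AsLong returns -1 and sets an exception if arg isn't an integer
    if (value == -1 && PyErr_Occurred()) {
        return NULL;
    }
    return PyLong_FromLong(value + 2);
}
// Method table; the all-NULL entry marks the end of the array
static PyMethodDef funcs[] = {
    {"addtwo", (PyCFunction)plustwo_addtwo, METH_O, "This adds two to a number\n"},
    {NULL, NULL, 0, NULL}
};
// Module description passed to PyModule_Create
static struct PyModuleDef moduleDef = {
    PyModuleDef_HEAD_INIT,
    "plustwo",
    "This module contains a function (addtwo) that adds two to a number\n",
    -1,
    funcs
};
// Called by the interpreter the first time plustwo is imported
PyMODINIT_FUNC PyInit_plustwo() {
    return PyModule_Create(&moduleDef);
}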
As you look at this code, there are a few items to notice:
- All the functions and structures are static except for PyInit_plustwo at the end.
- The function corresponding to the addtwo module function is called plustwo_addtwo because the module function has a different name than the module.
- The module only has one function, but the array of PyMethodDefs has two elements. The second element, whose fields are all NULL or 0, is a sentinel that marks the end of the array. If it isn't present, the interpreter can't tell where the array ends and the code won't work.
- The third field of the PyMethodDef is METH_O, which specifies that the function only accepts one argument. But in code, plustwo_addtwo accepts two arguments: a PyObject representing the module and a PyObject representing the input argument.
To serve as an extension module, this code must be compiled as a dynamic library (plustwo.dll on Windows, plustwo.so on Linux and macOS). On Windows, plustwo.dll must be renamed to plustwo.pyd, which identifies the file as a Python dynamic module. On Linux, the *.so suffix can be left unchanged.
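The interpreter can list the file suffixes it accepts for extension modules, which is a quick way to check how the compiled library should be named on a given system. This is a minimal sketch using the standard importlib.machinery module; the candidate_names list is only illustrative.

```python
import importlib.machinery

# Suffixes this interpreter recognizes for extension modules,
# such as .so on Linux/macOS or .pyd on Windows
suffixes = importlib.machinery.EXTENSION_SUFFIXES

# File names under which the plustwo module could be found on import
candidate_names = ["plustwo" + suffix for suffix in suffixes]

for name in candidate_names:
    print(name)
```

The list usually includes version-tagged names alongside the plain plustwo.so or plustwo.pyd.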
Once the extension module is created, you can test it by opening a Python prompt. Then you can import the plustwo module and call the addtwo function with a session like the following:
python
>>> import plustwo
>>> x = plustwo.addtwo(5)
>>> x
7
5. History
- 2nd August, 2023: Initial submission
- 4th August, 2023: Fixed code labels
- 9th August, 2023: Added calls to Py_Finalize