Interfacing C++ and Python with the Python API

Matt Scarpino

2 Aug 2023

This article explains how the Python API makes it possible to embed Python in C++ and write extension modules in C++ that can be imported in Python.

The Python API is a C/C++ programming toolset that enables developers to write code that integrates Python and C++. The first part of the article shows how the Python API makes it possible to access Python code in a C/C++ application. The second part explains how to code extension modules in C++ that can be imported into regular Python code.

1. Introduction

According to StackOverflow, two of the most popular programming languages for desktop development are C++ and Python. Many applications are coded in C++ for performance, and provide a Python interface that enables configuration and scripting. Unfortunately, the two languages are so different that many developers don't know how to access Python code in C++ or call C++ functions from Python.

Thankfully, some implementations of Python (such as the reference implementation at python.org) provide C/C++ headers and libraries that simplify the process of interfacing C++ and Python. These headers and libraries form the Python API, and the goal of this article is to explain how to use it.

To be specific, this article focuses on two ways of using the Python API. The first part of the article explains how to access Python inside C++ code. This is called embedding Python. The second part explains how to code a C++ library that can be accessed as a Python module. This is called an extension module.

But before I discuss either topic, I need to explain how to download Python and install it on your system. If you're already up to speed on Python, feel free to skip the next section.

2. Installing Python

A Python implementation is a toolset that provides a Python interpreter, basic Python modules, and other utilities such as pip. Based on my experience, there are four main implementations of Python:

CPython - oldest and most popular, written in C
PyPy - similar to CPython, but uses just-in-time compilation to improve performance
Jython - written in Java, converts Python into Java bytecode
IronPython - written in C#, enables Python to access C# and .NET features

CPython is the reference implementation, and its C/C++ interface is the one this article describes, so the rest of the discussion focuses on CPython. If you're running Linux, you can install it by using your package manager (apt-get install python3.11 python3-dev on Ubuntu, yum install python3 python3-devel on RHEL and CentOS).

If you're running Windows or macOS, you can download an installer from the Python download site. Select your operating system and click the link for the latest version of Python, and your browser will download the executable. When you run the executable, a dialog will ask for configuration settings. The following image shows what this looks like on Windows for version 3.11.4.

At the bottom of the dialog, check the box for Add python.exe to PATH. This ensures that you'll be able to launch the Python interpreter from the command line.

If you click Install Now on Windows, Python will be installed in the AppData\Local\Programs folder. You can customize this by clicking the Customize installation link and selecting a different folder. When the installation is finished, click the dialog's Close button.

For this article, it isn't important where Python is installed, but it is important that you know the installation directory. If you look in the top-level include directory, you'll find the header files for the Python API. The libs directory contains the library required for linking C/C++ applications. On my Windows system, the required library file is named python311.lib. On my Linux system, its name is libpython3.11.so.

3. Embedding Python in C/C++

The Python API provides several header files that declare C/C++ functions capable of accessing Python modules and code. The technical term for accessing Python from external code is embedding, and the central header file is Python.h.

The goal of this section is to look at the functions in Python.h that can make embedding possible. They can be frustrating to use because Python data structures are all represented by instances of the PyObject data type. Modules are represented by PyObjects, functions and methods are represented by PyObjects, and variables are represented by PyObjects.

This section doesn't cover all of the functions in the Python API, or even most of them. Instead, we'll look at functions in two categories:

Fundamental functions - Functions that access modules, methods, and properties
Object creation and conversion - Functions that create PyObjects and convert them to other types

After exploring these functions, this section presents C++ code that reads a function from a simple Python module, sets its parameters, and then executes the Python function.

3.1 Fundamental Functions

To embed Python processing in a C++ application, a developer should be familiar with a central set of functions. Table 1 lists them and provides a description of each.

Table 1: Fundamental Functions of the Python API

Function Signature	Description
`Py_Initialize()`	Initializes the interpreter and modules
`Py_Finalize()`	Deallocates the interpreter and resources
`PyImport_ImportModule(const char*)`	Imports the given module
`PyObject_HasAttrString(PyObject*, const char*)`	Checks if the attribute is present
`PyObject_GetAttrString(PyObject*, const char*)`	Accesses the given attribute
`PyCallable_Check(PyObject*)`	Checks if the attribute can be executed
`PyObject_Repr(PyObject*)`	Returns a `PyObject` containing the object's printable representation (like `repr()`)
`PyObject_Str(PyObject*)`	Returns a `PyObject` containing the object's string representation (like `str()`)
`PyObject_CallObject(PyObject*, PyObject*)`	Executes the object with arguments
`Py_INCREF(PyObject*)`	Increments the reference count (pointer can't be `NULL`)
`Py_XINCREF(PyObject*)`	Increments the reference count (pointer can be `NULL`)
`Py_DECREF(PyObject*)`	Decrements the reference count (pointer can't be `NULL`)
`Py_XDECREF(PyObject*)`	Decrements the reference count (pointer can be `NULL`)

The first function, Py_Initialize, is particularly important because it performs the tasks needed to make Python available in C/C++. This must be called before the application can access Python modules and features.

After initializing the environment, an application can access Python modules by calling PyImport_ImportModule. This accepts the name of the module and returns a PyObject pointer that represents the module. If the module is contained in a Python file, the *.py suffix should be omitted.

For example, the following function call accesses the code in simple.py:

C++

PyObject *mod = PyImport_ImportModule("simple");

Once an application has accessed a module or data structure, it can examine its attributes. The PyObject_HasAttrString function identifies if an attribute is present. If the attribute is present, PyObject_GetAttrString returns a PyObject pointer representing the attribute.

For example, the following code accesses an attribute named plus from simple.py.

C++

PyObject *mod = nullptr, *attr = nullptr;
mod = PyImport_ImportModule("simple");
if (mod != nullptr) {
    if (PyObject_HasAttrString(mod, "plus") == 1) {
        // Returns a new reference, or nullptr on failure
        attr = PyObject_GetAttrString(mod, "plus");
    }
}

Properties and functions are both accessed as attributes, but functions can be invoked and properties can't. To check if an attribute can be invoked, an application needs to call PyCallable_Check, which returns 1 if the attribute can be called and 0 if it can't.

If an attribute can be invoked, PyObject_CallObject tells the interpreter to execute the attribute. The first argument is the PyObject pointer representing the attribute and the second is a PyObject pointer that represents a tuple containing the method's arguments (or NULL if the method takes no arguments).
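
One way to experiment with these calls without compiling anything is Python's ctypes.pythonapi object, which exposes the C API functions of the running interpreter. The following is only a sketch under a few assumptions: it runs inside Python (so the interpreter is already initialized), it uses the standard operator module and its add function as stand-ins for simple.py and plus, and it builds the argument tuple in Python instead of calling PyTuple_New.

```python
import ctypes

api = ctypes.pythonapi  # handle to the C API of the running interpreter

# Declare the C signatures so ctypes converts arguments and results correctly
api.PyImport_ImportModule.argtypes = [ctypes.c_char_p]
api.PyImport_ImportModule.restype = ctypes.py_object
api.PyObject_HasAttrString.argtypes = [ctypes.py_object, ctypes.c_char_p]
api.PyObject_HasAttrString.restype = ctypes.c_int
api.PyObject_GetAttrString.argtypes = [ctypes.py_object, ctypes.c_char_p]
api.PyObject_GetAttrString.restype = ctypes.py_object
api.PyCallable_Check.argtypes = [ctypes.py_object]
api.PyCallable_Check.restype = ctypes.c_int
api.PyObject_CallObject.argtypes = [ctypes.py_object, ctypes.py_object]
api.PyObject_CallObject.restype = ctypes.py_object

# "operator" and "add" are stand-ins for the article's simple.py and plus
mod = api.PyImport_ImportModule(b"operator")
has_add = api.PyObject_HasAttrString(mod, b"add")
func = api.PyObject_GetAttrString(mod, b"add")
is_callable = api.PyCallable_Check(func)

# The second argument is a tuple holding the function's arguments
result = api.PyObject_CallObject(func, (4, 7))
print(has_add, is_callable, result)  # prints: 1 1 11
```

In real C++ code the same sequence applies, but the application is responsible for checking each returned pointer and releasing its references.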

Every PyObject has a reference count that identifies how many references to it are currently held. When it's created, the count is set to 1. Applications can increment the reference count by calling Py_INCREF or Py_XINCREF. Both accept a pointer to a PyObject, and the first should only be called if the pointer isn't NULL. The second, Py_XINCREF, is safe to call with a NULL pointer, in which case it does nothing.

When a PyObject is no longer needed, the application should call Py_DECREF or Py_XDECREF to decrement the object's reference count. Once the count reaches 0, the object will be deallocated. Both accept a pointer to a PyObject, and Py_DECREF should only be called if the pointer isn't NULL.
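
To watch a reference count change, the following sketch calls Py_IncRef and Py_DecRef through ctypes.pythonapi. These are the exported function versions of the Py_XINCREF and Py_XDECREF macros (macros themselves can't be reached through ctypes). The use of ctypes and of sys.getrefcount is my own choice for illustration. Because sys.getrefcount counts its own temporary reference, the sketch compares differences rather than absolute values.

```python
import ctypes
import sys

api = ctypes.pythonapi
api.Py_IncRef.argtypes = [ctypes.py_object]
api.Py_IncRef.restype = None
api.Py_DecRef.argtypes = [ctypes.py_object]
api.Py_DecRef.restype = None

obj = object()                   # a fresh object to observe

before = sys.getrefcount(obj)    # baseline count
api.Py_IncRef(obj)               # same effect as Py_XINCREF in C
during = sys.getrefcount(obj)    # one higher than the baseline
api.Py_DecRef(obj)               # same effect as Py_XDECREF in C
after = sys.getrefcount(obj)     # back to the baseline

print(during - before, after - before)  # prints: 1 0
```

Every increment needs a matching decrement. If the call to Py_DecRef were omitted, the object would never be deallocated.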

3.2 Object Creation and Conversion

In many cases, an application will need to create a PyObject from regular data or extract regular data from a PyObject. The application may also need to create Python-specific structures like lists or tuples. Table 2 lists the functions that perform these tasks.

Table 2: Object Creation/Conversion Functions

Function Signature	Description
`PyLong_FromLong(long)`	Create a `PyObject` from a long integer
`PyLong_AsLong(PyObject*)`	Return the `PyObject`'s long integer
`PyFloat_FromDouble(double)`	Create a `PyObject` from a double
`PyFloat_AsDouble(PyObject*)`	Return the `PyObject`'s double
`PyUnicode_FromString(const char*)`	Create a `str` `PyObject` from a UTF-8 encoded C string
`PyUnicode_AsEncodedString(PyObject*, const char*, const char*)`	Encode a `str` `PyObject` and return the result as a `bytes` `PyObject`
`PyBytes_FromString(const char*)`	Create a `bytes` `PyObject` from a C string
`PyBytes_AsString(PyObject*)`	Return a pointer to the `bytes` object's C string
`PyTuple_New(Py_ssize_t)`	Create a `PyObject` representing a tuple
`PyTuple_GetItem(PyObject*, Py_ssize_t)`	Return the given element of a tuple
`PyTuple_SetItem(PyObject*, Py_ssize_t, PyObject*)`	Set the given element of a tuple
`PyList_New(Py_ssize_t)`	Create a `PyObject` representing a list
`PyList_GetItem(PyObject*, Py_ssize_t)`	Return the given element of a list
`PyList_SetItem(PyObject*, Py_ssize_t, PyObject*)`	Set the given element of a list

These functions become important when an application needs to read or set an attribute's value. For example, if float_attr is a Python attribute containing a floating-point value, PyFloat_AsDouble will return a double that can be processed in C/C++.
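
The numeric conversions can be tried directly from Python through ctypes.pythonapi. This is a sketch for experimentation, not the article's embedding code, and the values 42 and 2.5 are arbitrary. It shows a C long and a C double making a round trip through Python objects.

```python
import ctypes

api = ctypes.pythonapi

# C signatures of the four conversion functions
api.PyLong_FromLong.argtypes = [ctypes.c_long]
api.PyLong_FromLong.restype = ctypes.py_object
api.PyLong_AsLong.argtypes = [ctypes.py_object]
api.PyLong_AsLong.restype = ctypes.c_long
api.PyFloat_FromDouble.argtypes = [ctypes.c_double]
api.PyFloat_FromDouble.restype = ctypes.py_object
api.PyFloat_AsDouble.argtypes = [ctypes.py_object]
api.PyFloat_AsDouble.restype = ctypes.c_double

py_int = api.PyLong_FromLong(42)                 # C long -> Python int
c_long_value = api.PyLong_AsLong(py_int)         # Python int -> C long
py_float = api.PyFloat_FromDouble(2.5)           # C double -> Python float
c_double_value = api.PyFloat_AsDouble(py_float)  # Python float -> C double

print(py_int, c_long_value, py_float, c_double_value)  # prints: 42 42 2.5 2.5
```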

Dealing with text is complicated. The Python API makes it possible to create a Python str (Unicode) object from a UTF-8 encoded C string by calling PyUnicode_FromString. You can also create a bytes object from a C string by calling PyBytes_FromString.

Displaying an attribute's string is also complicated. PyObject_Str returns a PyObject containing an object's string and PyUnicode_AsEncodedString converts this to a bytes PyObject containing the encoded representation of the string. Then PyBytes_AsString returns the C/C++ string corresponding to the encoded string.

For example, the following code obtains the string representation of the my_attr attribute, encodes it using UTF-8, and prints the corresponding C/C++ string.

C++

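// Error checks are omitted for brevity
PyObject* attr = PyObject_GetAttrString(mod, "my_attr");
PyObject* str = PyObject_Str(attr);
PyObject* encoded = PyUnicode_AsEncodedString(str, "utf-8", NULL);
const char* text = PyBytes_AsString(encoded);
std::cout << text << std::endl;

// Release the references created above (text points into encoded)
Py_DECREF(encoded);
Py_DECREF(str);
Py_DECREF(attr);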
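
The same three-step chain can be traced from Python with ctypes.pythonapi, which makes the intermediate types easy to inspect. This is a sketch under assumptions: the value 3.5 is a hypothetical stand-in for my_attr, and ctypes copies the returned C string into a Python bytes object.

```python
import ctypes

api = ctypes.pythonapi
api.PyObject_Str.argtypes = [ctypes.py_object]
api.PyObject_Str.restype = ctypes.py_object
api.PyUnicode_AsEncodedString.argtypes = [
    ctypes.py_object, ctypes.c_char_p, ctypes.c_char_p]
api.PyUnicode_AsEncodedString.restype = ctypes.py_object
api.PyBytes_AsString.argtypes = [ctypes.py_object]
api.PyBytes_AsString.restype = ctypes.c_char_p

my_attr = 3.5  # hypothetical attribute value

text = api.PyObject_Str(my_attr)                               # str object
encoded = api.PyUnicode_AsEncodedString(text, b"utf-8", None)  # bytes object
c_string = api.PyBytes_AsString(encoded)                       # char* copied by ctypes

print(type(text).__name__, type(encoded).__name__, c_string)
# prints: str bytes b'3.5'
```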

To pass arguments to a Python method, an application needs to create a tuple and insert an element for each argument to be passed. In code, the tuple can be created by calling PyTuple_New and elements can be set by calling PyTuple_SetItem. Similarly, an application can create a Python list by calling PyList_New and set its elements by calling PyList_SetItem. Both PyTuple_SetItem and PyList_SetItem take over ("steal") the reference to the inserted item, so the application shouldn't call Py_DECREF on that item afterward.

3.3 Simple Embedding Example

To demonstrate how embedding works, the source code for this article contains two source files:

simple.py - a simple Python file that defines a function named plus, which returns the sum of its two arguments
embedding.cpp - a C++ application that uses the Python API to access the code in simple.py

The following code presents the content of embedding.cpp. This accesses simple.py, finds the plus attribute, sets its arguments, and executes the function.

C++

#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include <iostream>

int main() {

    PyObject *module, *func, *args, *ret;

    // Initialize python access
    Py_Initialize();

    // Import the simple.py code
    module = PyImport_ImportModule("simple");
    if (module) {

        // Access the attribute named plus
        func = PyObject_GetAttrString(module, "plus");

        // Make sure the attribute is callable
        if (func && PyCallable_Check(func)) {

            // Create a tuple to contain the function's args
            args = PyTuple_New(2);
            PyTuple_SetItem(args, 0, PyLong_FromLong(4));
            PyTuple_SetItem(args, 1, PyLong_FromLong(7));

            // Execute the plus function in simple.py
            ret = PyObject_CallObject(func, args);
            Py_DECREF(args);
            Py_DECREF(func);
            Py_DECREF(module);

            // Check the return value
            if (ret) {

                // Convert the value to long and print
                long retVal = PyLong_AsLong(ret);
                std::cout << "Result: " << retVal << std::endl;

                // Release the reference returned by PyObject_CallObject
                Py_DECREF(ret);
            }
            else {

                // Display error
                PyErr_Print();
                std::cerr << "Couldn't access return value" << std::endl;
                Py_Finalize();
                return 1;
            }
        }
        else {

            // Display error
            if (PyErr_Occurred())
                PyErr_Print();
            std::cerr << "Couldn't execute function" << std::endl;

            // Release the references that were acquired
            Py_XDECREF(func);
            Py_DECREF(module);
        }
    }
    else {

        // Display error
        PyErr_Print();
        std::cerr << "Couldn't access module" << std::endl;
        Py_Finalize();
        return 1;
    }

    // Finalize the Python embedding
    Py_Finalize();
    return 0;
}

After accessing the module, the application invokes the module's plus function by calling PyObject_CallObject. It passes two arguments: the attribute representing the function and a tuple containing the two values to be passed to the plus function. The following code shows what the plus function in simple.py looks like.

Python

def plus(a, b):
    return a + b
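
For readers who think in Python first, the calls in embedding.cpp map closely onto ordinary Python operations. The following sketch shows that mapping. It's my own illustration, and it builds a stand-in for simple.py in memory (using types.ModuleType) so that it runs without any file on disk.

```python
import importlib
import sys
import types

# Stand-in for simple.py, created in memory so the sketch is self-contained
simple = types.ModuleType("simple")
exec("def plus(a, b):\n    return a + b\n", simple.__dict__)
sys.modules["simple"] = simple

module = importlib.import_module("simple")  # PyImport_ImportModule("simple")
func = getattr(module, "plus", None)        # PyObject_GetAttrString(module, "plus")

if func is not None and callable(func):     # func && PyCallable_Check(func)
    args = (4, 7)                           # PyTuple_New + PyTuple_SetItem
    ret = func(*args)                       # PyObject_CallObject(func, args)
    print("Result:", ret)                   # prints: Result: 11
```

The Python version has no reference counting because the interpreter handles it automatically. In C++, each of these steps produces a reference that the application must release.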

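To build an application from embedding.cpp, you'll need to tell the compiler about the header files in the include folder of the Python installation directory and the library file in the libs folder. At run time, the interpreter must be able to find simple.py on its module search path. If the import fails, one option is to add the file's directory to the PYTHONPATH environment variable. When I run the application on my system, the output is: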

Result: 11
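
If you're not sure where the headers and libraries are on your system, Python can report the locations itself. The following sketch uses the standard sysconfig module. The exact paths depend on the installation, and LIBDIR is typically unset (None) on Windows, where the library sits in the libs folder described earlier.

```python
import sysconfig

# Directory that contains Python.h
include_dir = sysconfig.get_paths()["include"]

# Directory that contains the Python library (may be None on Windows)
lib_dir = sysconfig.get_config_var("LIBDIR")

# Major.minor version, which appears in the library's file name
version = sysconfig.get_python_version()

print("Headers (Python.h):", include_dir)
print("Library directory: ", lib_dir)
print("Python version:    ", version)
```

On a Linux system with Python 3.11, these values would translate into compiler options along the lines of -I followed by the include directory, -L followed by the library directory, and -lpython3.11.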

4. Creating a Python Extension

A Python interpreter can access standard modules such as os, datetime, and string. Using the Python API, we can add new modules called extension modules. Unlike regular modules, which are written in Python, extension modules are dynamic libraries coded in C or C++.

This discussion walks through the development of an extension module named plustwo. This contains a function named addtwo, which accepts a number and returns the sum of the number and 2.

Python

python
>>> import plustwo
>>> x = plustwo.addtwo(5)
>>> x
7

Programming extension modules is hard because the functions need to have special names and they have to provide special data structures. To understand the process, you need to be aware of three points:

If the desired module is modname, the code must define a function named PyInit_modname that doesn't accept any parameters. For the example, the module is named plustwo, so the code defines a function named PyInit_plustwo.
To describe the module, the code must create a PyModuleDef structure and set its fields. These fields identify the module's name, its documentation, and its methods.
For each function in the module, the code must create a PyMethodDef structure and set its fields. These fields identify the method's name, arguments, and documentation. It must also identify the function that will be called when the method is invoked.

This section discusses these points in detail and then shows how the plustwo extension module can be implemented in code.

4.1 The PyInit Function

When a Python interpreter imports the extension module modname for the first time, it will call the PyInit_modname function. This must be coded properly to ensure that the interpreter can execute it, and there are five rules to follow:

It must be preceded by the PyMODINIT_FUNC macro, and it must be the only function preceded by this macro.
It must not be static, and it must be the only non-static item in the code.
It can't accept any parameters.
It must call PyModule_Create with a pointer to the PyModuleDef that describes the module.
Its return value must be set to the return value of PyModule_Create.

The best way to understand these rules is to look at an example. If the module name is plustwo and the PyModuleDef structure that describes the module is moduleDef, the following code shows how PyInit_plustwo can be coded:

C++

PyMODINIT_FUNC PyInit_plustwo() {
    return PyModule_Create(&moduleDef);
}

At minimum, the function needs to call PyModule_Create with a PyModuleDef reference and return the result. But the function can be coded to do more than call PyModule_Create.

4.2 The PyModuleDef Structure

The extension module needs to create a PyModuleDef structure to tell the Python interpreter how the module should be processed. Table 3 lists each of the fields of this structure and their data types.

Table 3: Fields of the PyModuleDef Structure

Field Name	Data Type	Description
`m_base`	`PyModuleDef_Base`	Always set to `PyModuleDef_HEAD_INIT`
`m_name`	`const char*`	The module's name
`m_doc`	`const char*`	The module's description
`m_size`	`Py_ssize_t`	Size of memory to store module state
`m_methods`	`PyMethodDef*`	Array of method descriptors
`m_slots`	`PyModuleDef_Slot*`	Array of slot definitions for multi-phase initialization
`m_traverse`	`traverseproc`	Traversal function called during garbage collection
`m_clear`	`inquiry`	Clear function called during garbage collection
`m_free`	`freefunc`	Function that frees resources

This article focuses on the first five fields, and the first should always be set to PyModuleDef_HEAD_INIT. The second should be set to the module's name and the third should be set to the module's docstring, which is displayed when the help function is called.

The fourth field, m_size, identifies how much memory should be set aside to store the module's state data, which matters for modules that use multi-phase initialization or run in sub-interpreters. This isn't a concern for most extension modules, so m_size can be set to -1, which tells the interpreter that the module keeps its state in global variables and doesn't support sub-interpreters. This is shown in the following code:

C++

static struct PyModuleDef moduleDef = {
    PyModuleDef_HEAD_INIT, 
    "plustwo", 
    "This module contains a function (addtwo) that adds two to a number\n", 
    -1, 
    funcs  // Array containing a PyMethodDef for each module function
};

The m_methods field must be set to an array containing a PyMethodDef structure for each function contained in the module. In the example, the plustwo module has one function named addtwo. Therefore, the funcs array in the example code contains one PyMethodDef structure.

4.3 The PyMethodDef Structure

An extension module identifies its functions by providing an array of PyMethodDef structures. The fields of a PyMethodDef identify the function's name, arguments, and docstring. Table 4 lists these fields and their data types.

Table 4: Fields of the PyMethodDef Structure

Field Name	Data Type	Description
`ml_name`	`const char*`	The function's name
`ml_meth`	`PyCFunction`	The C function that provides the code
`ml_flags`	`int`	Flags that identify the function's arguments
`ml_doc`	`const char*`	The function's `docstring`

The second field, ml_meth, must be set to a C function that provides the code to be executed when the module's function is called. When coding this function, there are four rules to keep in mind:

If the module function has a different name than the module, the name of the C function should be set to modname_funcname, where modname is the name of the module and funcname is the name of the function.
If the module function has the same name as its surrounding module, the name of the C function should be set to modname.
The number of arguments accepted by the function is determined by the ml_flags argument of the PyMethodDef.
The C function must be declared static and it must return a PyObject pointer. If the function doesn't return a value, it should use Py_RETURN_NONE to return Python's None object.

The third field, ml_flags, identifies the nature of the arguments accepted by the function. This is usually set to one of three values:

METH_NOARGS - The function takes no arguments from Python. The C function still has two PyObject* parameters: the first represents the module and the second is unused (it's always NULL).
METH_O - The function accepts two arguments: a PyObject* representing the module and a PyObject* that represents the single argument.
METH_VARARGS - The function accepts two arguments: a PyObject* representing the module and a PyObject* that represents a tuple containing the function's parameters.

For this article's example, the addtwo function accepts a numeric argument and returns the sum of the argument and 2. Because there's only one argument, ml_flags should be set to METH_O.
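
The PyMethodDef fields are visible from the Python side of any existing extension function. The following sketch inspects math.sqrt from the standard library, which I'm using as a stand-in because it behaves like a METH_O function: it takes exactly one argument, and the interpreter rejects any other count before the C code runs. Which flag CPython actually uses for it internally is an assumption on my part.

```python
import math

func = math.sqrt  # stand-in for a function defined by a PyMethodDef

print(type(func).__name__)    # builtin_function_or_method
print(func.__name__)          # taken from ml_name
has_doc = bool(func.__doc__)  # taken from ml_doc

# Passing two arguments to a one-argument function fails with a TypeError
try:
    func(1.0, 2.0)
    rejected = False
except TypeError as exc:
    rejected = True
    print("TypeError:", exc)
```

The addtwo function developed in this article should behave the same way once plustwo is built: calling plustwo.addtwo(1, 2) raises a TypeError without plustwo_addtwo ever being invoked.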

4.4 The plustwo Extension Module

At this point, you should have a basic grasp of the functions and data structures that must be created in an extension module. The following listing presents the code in plustwo.cpp, which is part of this article's source code. This file defines an extension module named plustwo containing a function named addtwo:

C++

#define PY_SSIZE_T_CLEAN
#include <Python.h>

// The code to be executed when the module function is called
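static PyObject* plustwo_addtwo(PyObject* self, PyObject* arg) {
    long value = PyLong_AsLong(arg);

    // PyLong_AsLong returns -1 and sets an exception if the conversion fails
    if (value == -1 && PyErr_Occurred()) {
        return NULL;
    }
    return PyLong_FromLong(value + 2);
}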

// Array of PyMethodDef structures - describe the module's functions
static PyMethodDef funcs[] = {
    {"addtwo", (PyCFunction)plustwo_addtwo, METH_O, "This adds two to a number\n"},
    {NULL, NULL, 0, NULL}
};

// The PyModuleDef structure describes the module
static struct PyModuleDef moduleDef = {
    PyModuleDef_HEAD_INIT, 
    "plustwo", 
    "This module contains a function (addtwo) that adds two to a number\n", 
    -1, 
    funcs
};

// Called when the interpreter imports the module
PyMODINIT_FUNC PyInit_plustwo() {
    return PyModule_Create(&moduleDef);
}

As you look at this code, there are a few items to notice:

All the functions and structures are static except for PyInit_plustwo at the end.
The function corresponding to the addtwo module function is called plustwo_addtwo because the module function has a different name than the module.
The module only has one function, but the array of PyMethodDefs has two elements. The second element, whose fields are all NULL or 0, is a sentinel that marks the end of the array. If it isn't present, the interpreter can't tell where the array ends and the code won't work.
The third field of the PyMethodDef is METH_O, which specifies that the function only accepts one argument. But in code, plustwo_addtwo accepts two arguments: a PyObject representing the module and a PyObject representing the input argument.

To serve as an extension module, this code must be compiled as a dynamic library (plustwo.dll on Windows, plustwo.so on Linux and macOS). On Windows, plustwo.dll must be renamed to plustwo.pyd, which identifies the file as a Python dynamic module. On Linux, the *.so suffix can be left unchanged.
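
To confirm which file suffixes your interpreter accepts for extension modules, you can ask Python directly. The following sketch uses importlib.machinery.EXTENSION_SUFFIXES and the EXT_SUFFIX configuration variable. The exact values depend on the platform and Python version, so treat the output as something to check on your own system.

```python
import importlib.machinery
import sysconfig

# Every suffix the import system will try for an extension module
suffixes = importlib.machinery.EXTENSION_SUFFIXES

# The platform-specific suffix that build tools normally use
preferred = sysconfig.get_config_var("EXT_SUFFIX")

print("Accepted suffixes:", suffixes)
print("Suggested file name:", "plustwo" + (preferred or suffixes[-1]))
```

On Windows the list includes .pyd, and on Linux and macOS it includes .so, which matches the file names given above.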

Once the extension module is created, you can test it by opening a Python prompt. Then you can import the plustwo module and call the addtwo function with a session like the following:

Python

python
>>> import plustwo
>>> x = plustwo.addtwo(5)
>>> x
7

5. History

2nd August, 2023: Initial submission
4th August, 2023: Fixed code labels
9th August, 2023: Added calls to Py_Finalize

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)