Python Integration
This guide details the internal architecture of Neuralyzer’s Python integration, explaining how the embedded interpreter works, how data is marshaled between C++ and Python, and how to extend the bindings.
Architecture Overview
The Python integration is built on three main components:
- PythonWidget (Qt UI):
  - Located in src/WhiskerToolbox/Python_Widget.
  - Handles the visual interface: Console, Editor, and Properties panel.
  - Manages the PythonEngine instance.
  - Connects UI signals (such as the “Run” button) to the engine’s execute() method.
- PythonEngine (Embedded Interpreter):
  - Located in src/python_bindings/PythonEngine.cpp.
  - Wraps pybind11’s embedding API (pybind11/embed.h).
  - Manages the Python interpreter lifecycle (initialization, finalization).
  - Maintains the globals() dictionary to persist state across executions (REPL behavior).
  - Redirects sys.stdout and sys.stderr to C++ callbacks for display in the UI.
- PythonBridge (Data Marshalling):
  - Located in src/python_bindings/PythonBridge.cpp.
  - Acts as the bridge between the application’s DataManager and the Python environment.
  - Injects the dm proxy object into the Python namespace on startup.
  - Handles the conversion of C++ data types to Python objects and vice versa.
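The engine behaviors described above (one persistent namespace, an injected dm object, and output redirected to callbacks) can be illustrated with a pure-Python analogue. This is only a sketch of the idea, not the C++ implementation: MiniEngine, CallbackStream, and the dictionary standing in for dm are hypothetical names invented for this example.

```python
import contextlib


class CallbackStream:
    """Minimal file-like object that forwards every write to a callback."""

    def __init__(self, callback):
        self._callback = callback

    def write(self, text):
        if text:
            self._callback(text)
        return len(text)

    def flush(self):
        # Nothing is buffered, so there is nothing to flush.
        pass


class MiniEngine:
    """Hypothetical stand-in for PythonEngine, written in Python."""

    def __init__(self, on_stdout, on_stderr, injected=None):
        # A single dict is reused for every execute() call, which is
        # what gives the console its REPL-like persistent state.
        self._globals = {"__name__": "__main__"}
        self._globals.update(injected or {})
        self._out = CallbackStream(on_stdout)
        self._err = CallbackStream(on_stderr)

    def execute(self, source):
        # Route print() output and error output to the callbacks
        # for the duration of this one execution.
        with contextlib.redirect_stdout(self._out), \
                contextlib.redirect_stderr(self._err):
            exec(source, self._globals)


captured = []
# The dict passed as "dm" is a placeholder for the real proxy object.
engine = MiniEngine(captured.append, captured.append,
                    injected={"dm": {"keys": ["whisker_1"]}})
engine.execute("x = 21")             # defines x in the shared namespace
engine.execute("print(x * 2)")       # x is still visible here
engine.execute("print(dm['keys'])")  # injected object is available too
output = "".join(captured)
```

Reusing one dictionary for every call is the design choice that matters here: a fresh dictionary per execution would make each “Run” start from an empty namespace.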
Bindings (whiskertoolbox_python)
The core of the integration is the whiskertoolbox_python pybind11 module, which exposes C++ classes to Python. It is embedded directly into the application binary, so there is no separate .pyd or .so file to deploy.
Source location: src/python_bindings/
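Because the module exists only inside the application binary, a script that is also run in a plain Python interpreter may want to check for it before importing. The following is a minimal sketch using only the standard library; running_inside_app is a hypothetical helper, not part of the bindings.

```python
import importlib
import importlib.util


def running_inside_app(module_name="whiskertoolbox_python"):
    """Return True when the named module can be imported here."""
    return importlib.util.find_spec(module_name) is not None


if running_inside_app():
    # Inside the application the embedded module is importable.
    wt = importlib.import_module("whiskertoolbox_python")
else:
    # Outside the application, fall back to None so callers can branch.
    wt = None
```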
Key Binding Files
- bind_module.hpp: Forward declarations for binding functions.
- bindings.cpp: The main PYBIND11_EMBEDDED_MODULE definition. Calls individual init functions.
- numpy_utils.hpp: Helper functions for zero-copy NumPy array creation.
- bind_datamanager.cpp: Binds the DataManager class and its methods.
- bind_analog.cpp: Binds AnalogTimeSeries.
- bind_point.cpp: Binds PointData and Point2D.
- bind_tensor.cpp: Binds TensorData.
Data Types and Zero-Copy Strategy
To ensure high performance with large datasets (e.g., electrophysiology or high-speed video data), we prioritize zero-copy access where possible.
AnalogTimeSeries & TensorData
These types store data in contiguous memory blocks (std::vector or similar). We expose these buffers to Python as read-only NumPy arrays.
- C++ Side: std::span<float const>
- Python Side: numpy.ndarray (flags: WRITEABLE=False)
This allows users to use libraries like numpy and scipy directly on the data without the overhead of copying millions of samples.
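The read-only, zero-copy idea can be demonstrated with the standard library alone. In this sketch an array.array stands in for the C++-owned buffer and a read-only memoryview stands in for the NumPy array; both are assumptions made so the example runs without the application.

```python
from array import array

# Stand-in for the contiguous buffer owned by the C++ side.
samples = array("f", [0.0, 0.5, 1.0, 1.5])

# A read-only view over the same memory; no samples are copied.
view = memoryview(samples).toreadonly()

try:
    view[0] = 9.0            # writing through the view is rejected
    write_blocked = False
except TypeError:
    write_blocked = True

samples[1] = 2.5             # the owner changes a value in place...
shares_memory = view[1] == 2.5   # ...and the view observes the change

# When a script needs to modify values, it makes an explicit copy.
editable = array("f", view.tolist())
editable[0] = 9.0            # the original buffer is left untouched
```

With a real NumPy array the equivalent checks would be arr.flags.writeable (False for these buffers) and arr.copy() to obtain a writable copy.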
Ragged Data (Points, Lines, Masks)
Data types that are “ragged” (variable number of elements per time point) are more difficult to map directly to NumPy.
- Currently, these are exposed via std::vector copies (e.g., getAtTime(t) returns a list of objects).
- Future optimization may involve creating custom iterators or structured NumPy arrays.
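One common way to make ragged data array-friendly is to store all elements in one flat sequence plus an offsets table. The sketch below shows that layout on the Python side; it is an illustration of the general technique, not the planned implementation, and points_by_time, pack_ragged, and points_at are hypothetical names.

```python
from itertools import accumulate

# Hypothetical per-time results, as if collected from getAtTime(t).
points_by_time = {
    0: [(1.0, 2.0), (3.0, 4.0)],
    1: [],
    2: [(5.0, 6.0)],
}


def pack_ragged(by_time):
    """Flatten per-time lists into (times, offsets, flat) sequences."""
    times = sorted(by_time)
    counts = [len(by_time[t]) for t in times]
    # offsets[i]:offsets[i + 1] is the slice of flat owned by times[i].
    offsets = [0, *accumulate(counts)]
    flat = [point for t in times for point in by_time[t]]
    return times, offsets, flat


def points_at(index, offsets, flat):
    """Return the points belonging to the index-th time entry."""
    return flat[offsets[index]:offsets[index + 1]]


times, offsets, flat = pack_ragged(points_by_time)
```

The flat and offsets sequences have fixed element types, so they are the parts that could later be backed by NumPy arrays.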
Extending the Bindings
To add a new data type to the Python environment:
1. Create a Binding Function: Create a new source file (e.g., src/python_bindings/bind_mytype.cpp). Define a function void init_mytype(py::module_ & m). Use py::class_ to bind your C++ class.

       #include <memory> // std::shared_ptr

       #include <pybind11/pybind11.h>

       #include "MyType.hpp"
       #include "bind_module.hpp" // for init_mytype declaration

       namespace py = pybind11;

       // Expose MyType to Python with a default constructor and one method.
       void init_mytype(py::module_ & m) {
           py::class_<MyType, std::shared_ptr<MyType>>(m, "MyType")
               .def(py::init<>())
               .def("doSomething", &MyType::doSomething);
       }

2. Register the Module: Add the init_mytype(m); call to src/python_bindings/bindings.cpp.

3. Update getDataVariant: Ensure DataManager::getDataVariant (in C++) and bind_datamanager.cpp (in bindings) can handle the new type.
   - In bind_datamanager.cpp, add a setData overload for your new type.
   - The getData method uses std::visit and should handle the new type automatically if it’s in the DataTypeVariant.

4. Rebuild: Recompile the application. The new type will be available in the whiskertoolbox_python module.
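After rebuilding, a quick smoke test from the console can confirm that the new class and its methods were registered. The helper below is a hypothetical sketch (missing_members is not part of the bindings), and a small fake module stands in for whiskertoolbox_python so the example runs outside the application.

```python
import types


def missing_members(module, class_name, method_names):
    """List the class or methods that the module does not expose."""
    cls = getattr(module, class_name, None)
    if cls is None:
        return [class_name]
    return [
        f"{class_name}.{name}"
        for name in method_names
        if not callable(getattr(cls, name, None))
    ]


# Stand-in for the real module; inside the application you would pass
# the imported whiskertoolbox_python module instead.
class _FakeMyType:
    def doSomething(self):
        return None


fake_module = types.SimpleNamespace(MyType=_FakeMyType)
problems = missing_members(fake_module, "MyType", ["doSomething"])
```

An empty list means every expected name was found; any entries point at a binding that was not registered or was registered under a different name.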