Creating a C++ Model Wrapper

Overview

This guide explains how to integrate a PyTorch model into Neuralyzer by creating a C++ wrapper. There are two approaches:

  1. Runtime JSON model — define the model’s I/O in a JSON file (no C++ needed)
  2. Compiled C++ model — create a ModelBase subclass with custom logic

Most models can use the JSON approach. The compiled approach is only needed when the model requires custom pre/post-processing logic in forward() that cannot be expressed through the standard encoder/decoder pipeline.

Prerequisites

  • A trained PyTorch model exported as either:
    • AOT Inductor (.pt2) — recommended, see AOT Inductor Tutorial
    • TorchScript (.pt) — torch.jit.trace() or torch.jit.script()
  • Knowledge of the model’s input and output tensor shapes

Approach 2: Compiled C++ Model

For models requiring custom logic (e.g., multi-step inference, output-to-input feedback loops, custom tensor manipulation), create a ModelBase subclass.

Step 1: Create the Header

Create a header in src/DeepLearning/models_v2/your_model/:

#ifndef WHISKERTOOLBOX_YOUR_MODEL_HPP
#define WHISKERTOOLBOX_YOUR_MODEL_HPP

#include "models_v2/ModelBase.hpp"

#include <filesystem>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

namespace dl {

// Forward declaration keeps the ModelExecution header out of this file
class ModelExecution;

class YourModel : public ModelBase {
public:
    YourModel();
    ~YourModel() override;

    // Non-copyable, movable
    YourModel(YourModel const &) = delete;
    YourModel & operator=(YourModel const &) = delete;
    YourModel(YourModel &&) noexcept;
    YourModel & operator=(YourModel &&) noexcept;

    [[nodiscard]] std::string modelId() const override;
    [[nodiscard]] std::string displayName() const override;
    [[nodiscard]] std::string description() const override;
    [[nodiscard]] std::vector<TensorSlotDescriptor> inputSlots() const override;
    [[nodiscard]] std::vector<TensorSlotDescriptor> outputSlots() const override;

    void loadWeights(std::filesystem::path const & path) override;
    [[nodiscard]] bool isReady() const override;

    [[nodiscard]] int preferredBatchSize() const override;
    [[nodiscard]] int maxBatchSize() const override;

    std::unordered_map<std::string, torch::Tensor>
    forward(std::unordered_map<std::string, torch::Tensor> const & inputs) override;

private:
    std::unique_ptr<ModelExecution> _execution;
};

} // namespace dl

#endif // WHISKERTOOLBOX_YOUR_MODEL_HPP

Step 2: Implement the Model

#include "YourModel.hpp"
#include "models_v2/ModelExecution.hpp"
#include "device/DeviceManager.hpp"
#include "registry/ModelRegistry.hpp"

#include <stdexcept>

namespace dl {

// ── Self-registration ──
DL_REGISTER_MODEL(YourModel);

// ── Metadata ──

std::string YourModel::modelId() const { return "your_model"; }
std::string YourModel::displayName() const { return "Your Model"; }
std::string YourModel::description() const {
    return "Description of what your model does.";
}

int YourModel::preferredBatchSize() const { return 0; }  // 0 = no preferred size
int YourModel::maxBatchSize() const { return 0; }        // 0 = arbitrary batch size

// ── Slot Descriptors ──

std::vector<TensorSlotDescriptor> YourModel::inputSlots() const {
    return {
        TensorSlotDescriptor{
            .name = "image",
            .shape = {3, 256, 256},
            .description = "Input video frame",
            .recommended_encoder = "ImageEncoder",
        },
    };
}

std::vector<TensorSlotDescriptor> YourModel::outputSlots() const {
    return {
        TensorSlotDescriptor{
            .name = "output_mask",
            .shape = {1, 256, 256},
            .description = "Predicted segmentation mask",
            .recommended_decoder = "TensorToMask2D",
        },
    };
}

// ── Weight Loading ──

void YourModel::loadWeights(std::filesystem::path const & path) {
    _execution = std::make_unique<ModelExecution>();
    _execution->load(path);  // auto-detects backend from extension
}

bool YourModel::isReady() const {
    return _execution && _execution->isLoaded();
}

// ── Inference ──

std::unordered_map<std::string, torch::Tensor>
YourModel::forward(
    std::unordered_map<std::string, torch::Tensor> const & inputs)
{
    // Validate required inputs
    if (inputs.find("image") == inputs.end()) {
        throw std::runtime_error("Missing required input: image");
    }

    // Move inputs to the correct device
    auto & dm = DeviceManager::instance();
    std::vector<torch::Tensor> ordered_inputs;
    ordered_inputs.push_back(dm.toDevice(inputs.at("image")));

    // Execute inference
    auto outputs = _execution->execute(ordered_inputs);

    // Map outputs to named slots
    std::unordered_map<std::string, torch::Tensor> result;
    if (!outputs.empty()) {
        result["output_mask"] = outputs[0];
    }
    return result;
}

} // namespace dl

Step 3: Key Design Decisions

Slot Descriptors

When defining inputSlots() and outputSlots(), consider:

  • recommended_encoder / recommended_decoder — the UI pre-selects these but the user can override them. Choose the most common use case.
  • is_static = true — use for memory inputs that the user sets once (e.g., reference frames). The UI renders these as a separate “Memory Inputs” section.
  • is_boolean_mask = true — use for 0/1 flag tensors that indicate which memory slots are active. The UI renders checkboxes instead of data source selectors.
  • sequence_dim — set to a non-negative axis index if the model expects multiple frames stacked along that dimension. The SlotAssembler will automatically arrange static input entries along this axis.
  • Batch size — have preferredBatchSize() return 1 if your model has an output→input feedback loop (like NeuroSAM); return 0 for models that can process arbitrary batch sizes.
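
The memory-related flags above compose as follows. This is a minimal sketch using a stand-in struct whose fields follow the names in this section; the real TensorSlotDescriptor is defined in models_v2/ModelBase.hpp and may differ:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Stand-in for the real TensorSlotDescriptor (see models_v2/ModelBase.hpp);
// field names follow this guide, but the actual definition may differ.
struct TensorSlotDescriptor {
    std::string name;
    std::vector<int64_t> shape;
    std::string description;
    std::string recommended_encoder;
    bool is_static = false;       // memory input the user sets once
    bool is_boolean_mask = false; // 0/1 flags marking active memory slots
    int sequence_dim = -1;        // axis for stacked frames; -1 = none
};

// A static memory input: up to 4 reference frames stacked along axis 0,
// which the SlotAssembler fills entry by entry along sequence_dim.
inline TensorSlotDescriptor makeMemoryFramesSlot() {
    return TensorSlotDescriptor{
        .name = "memory_frames",
        .shape = {4, 3, 256, 256},
        .description = "Reference frames the user sets once",
        .recommended_encoder = "ImageEncoder",
        .is_static = true,
        .sequence_dim = 0,
    };
}

// Companion boolean mask: 1 where the corresponding memory slot is active.
// The UI renders this as checkboxes rather than a data source selector.
inline TensorSlotDescriptor makeMemoryMaskSlot() {
    return TensorSlotDescriptor{
        .name = "memory_mask",
        .shape = {4},
        .description = "Which memory slots are filled",
        .is_static = true,
        .is_boolean_mask = true,
    };
}
```

Pairing a static sequence input with a boolean mask of the same slot count is the pattern the UI expects for memory-style models.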

Using ModelExecution

ModelExecution auto-detects the inference backend from the weight file extension:

Extension   Backend        API
.pt         TorchScript    torch::jit::load()
.pt2        AOT Inductor   AOTIModelPackageLoader

You can also force a specific backend:

_execution = std::make_unique<ModelExecution>(BackendType::AOTInductor);
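
The extension-based dispatch can be pictured like this. It is only a sketch of the idea: the enum values here are assumptions, and ModelExecution's actual detection logic may differ:

```cpp
#include <filesystem>
#include <stdexcept>
#include <string>

// Assumed enum; the real BackendType is defined by the DeepLearning
// library and is not reproduced here.
enum class BackendType { TorchScript, AOTInductor };

// Map a weight-file extension to the backend that loads it, mirroring
// the table above: .pt -> TorchScript, .pt2 -> AOT Inductor.
inline BackendType detectBackend(std::filesystem::path const & weights) {
    auto const ext = weights.extension().string();
    if (ext == ".pt2") return BackendType::AOTInductor;
    if (ext == ".pt") return BackendType::TorchScript;
    throw std::runtime_error("Unsupported weight file: " + weights.string());
}
```

Forcing a backend explicitly is only needed when the extension is ambiguous or nonstandard.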

Device Management

Always use DeviceManager for device placement:

auto & dm = DeviceManager::instance();
auto device_tensor = dm.toDevice(cpu_tensor);  // moves to CUDA if available

Never create your own torch::Device objects or hardcode torch::kCUDA.

Step 4: Register with CMake

Add your source files to src/DeepLearning/CMakeLists.txt:

set(DEEP_LEARNING_SOURCES
    # ... existing sources ...
    models_v2/your_model/YourModel.hpp
    models_v2/your_model/YourModel.cpp
)

The DL_REGISTER_MODEL macro handles self-registration at static initialization time. Your model will appear in the Deep Learning widget’s model selector after rebuilding.
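
The idea behind self-registration is the standard static-initialization trick: a namespace-scope object whose initializer runs before main() and deposits a factory in the registry. A self-contained sketch of that pattern follows; the real DL_REGISTER_MODEL in registry/ModelRegistry.hpp will differ in detail:

```cpp
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Minimal model base and registry, standing in for the real ones.
struct ModelBase {
    virtual ~ModelBase() = default;
    virtual std::string modelId() const = 0;
};

class Registry {
public:
    static Registry & instance() {
        static Registry r;  // constructed on first use
        return r;
    }
    void registerFactory(std::string id,
                         std::function<std::unique_ptr<ModelBase>()> make) {
        _factories.emplace(std::move(id), std::move(make));
    }
    std::vector<std::string> availableModels() const {
        std::vector<std::string> ids;
        for (auto const & [id, factory] : _factories) ids.push_back(id);
        return ids;
    }
private:
    std::unordered_map<std::string,
                       std::function<std::unique_ptr<ModelBase>()>> _factories;
};

// Hypothetical macro shape: a static bool whose initializer registers
// the model during static initialization.
#define SKETCH_REGISTER_MODEL(T)                                      \
    [[maybe_unused]] static bool const _registered_##T = [] {         \
        Registry::instance().registerFactory(                         \
            T{}.modelId(), [] { return std::make_unique<T>(); });     \
        return true;                                                  \
    }()

struct DemoModel : ModelBase {
    std::string modelId() const override { return "demo_model"; }
};

SKETCH_REGISTER_MODEL(DemoModel);
```

Because registration happens in a static initializer, the model's translation unit must actually be linked into the binary, which is why the .cpp file has to be listed in the CMake sources above.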

Step 5: Write Tests

Create tests/DeepLearning/models_v2/YourModel.test.cpp:

#include <catch2/catch_test_macros.hpp>

#include "models_v2/your_model/YourModel.hpp"
#include "registry/ModelRegistry.hpp"

#include <algorithm>

TEST_CASE("YourModel metadata", "[YourModel]") {
    dl::YourModel model;
    CHECK(model.modelId() == "your_model");
    CHECK(model.displayName() == "Your Model");
}

TEST_CASE("YourModel input slots", "[YourModel]") {
    dl::YourModel model;
    auto inputs = model.inputSlots();
    REQUIRE(inputs.size() == 1);
    CHECK(inputs[0].name == "image");
    CHECK(inputs[0].shape == std::vector<int64_t>{3, 256, 256});
}

TEST_CASE("YourModel is not ready without weights", "[YourModel]") {
    dl::YourModel model;
    CHECK_FALSE(model.isReady());
}

TEST_CASE("YourModel registered in ModelRegistry", "[YourModel]") {
    auto & registry = dl::ModelRegistry::instance();
    auto models = registry.availableModels();
    CHECK(std::find(models.begin(), models.end(), "your_model") != models.end());
}

Choosing Between Approaches

Criterion                     JSON Runtime Model                      Compiled C++ Model
Recompilation needed          No                                      Yes
Custom pre/post-processing    No                                      Yes
Output→input feedback loops   No                                      Yes
Multiple inference steps      No                                      Yes
Simplicity                    Simplest                                More work
Typical use case              Standard image→mask/point/line models   Models like NeuroSAM with memory feedback

Model Export

Both approaches require the model to be exported from Python. See: