General Encoder Model

Overview

The GeneralEncoderModel is a reusable ModelBase subclass that wraps any image encoder backbone (e.g. ConvNeXt, ViT, ResNet) as a single-input, single-output model for spatial feature extraction. It maps an input image to a feature tensor without requiring a task-specific C++ wrapper for each architecture.

Architecture

Input Image [B, C, H, W]  →  GeneralEncoderModel  →  Features [B, C_out, H_out, W_out]
         "image" slot              (any .pt/.pt2)          "features" slot

Design Decisions

The GeneralEncoderModel is intentionally minimal:

  • Single input slot ("image") — accepts any image tensor
  • Single output slot ("features") — produces the raw encoder output
  • Configurable shapes — input resolution and output feature shape are set at construction time, allowing the same C++ class to work with any backbone
  • DynamicBatch — supports arbitrary batch sizes, controlled by the caller

For most users, defining an encoder via a RuntimeModelSpec JSON file (which internally creates a RuntimeModel) is sufficient and requires no C++ changes. The GeneralEncoderModel provides a convenience wrapper with sensible defaults that auto-registers in the ModelRegistry.

Compile-Time vs. Runtime Shapes

Input and output resolution do not need to be known at compile time:

  • TorchScript (.pt) accepts arbitrary dynamic shapes natively
  • AOT Inductor (.pt2) supports dynamic shapes via torch.export.Dim() — dimensions marked as dynamic can vary at runtime within declared bounds
  • RuntimeModel reads shapes from a JSON spec at runtime

The shapes declared in GeneralEncoderModel (or RuntimeModelSpec) are metadata for the UI and pre-allocation, not hard constraints enforced at the C++ level.
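The declared input and output shapes are related by the backbone's overall stride. ConvNeXt (like ResNet) downsamples by a factor of 32 end to end, which is why a 224×224 input yields a 7×7 feature map. A minimal sketch of this arithmetic — the stride-32 assumption holds for the ConvNeXt family but not for every backbone, so treat the default as illustrative:

```python
def encoder_output_shape(c_out, h_in, w_in, stride=32):
    """Predict the feature shape [C_out, H_out, W_out] for a backbone
    with a given overall downsampling stride (32 for ConvNeXt/ResNet)."""
    return [c_out, h_in // stride, w_in // stride]

# ConvNeXt-Tiny at the default 224x224 input -> [768, 7, 7]
print(encoder_output_shape(768, 224, 224))
```

The same arithmetic gives the expected shape for other resolutions (e.g. a 320×320 input maps to a 10×10 grid), which is useful when filling in the output shape of a custom RuntimeModelSpec.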

Usage

Option 1: C++ GeneralEncoderModel (Compiled)

The default constructor creates a model expecting 3×224×224 RGB input and 768×7×7 output (matching ConvNeXt-Tiny):

auto model = std::make_unique<dl::GeneralEncoderModel>();
model->loadWeights("/path/to/convnext_tiny_encoder.pt");

// Prepare input
auto image = torch::randn({1, 3, 224, 224});
std::unordered_map<std::string, torch::Tensor> inputs;
inputs["image"] = image;

// Run inference
auto outputs = model->forward(inputs);
auto features = outputs["features"];  // [1, 768, 7, 7]

For a different backbone, pass custom dimensions:

// ConvNeXt-Base: 1024-dim features
auto model = std::make_unique<dl::GeneralEncoderModel>(
    3, 224, 224, std::vector<int64_t>{1024, 7, 7});

Option 2: RuntimeModelSpec JSON (No Recompilation)

Create a JSON spec file (see examples/convnext_encoder_spec.json):

{
  "model_id": "convnext_tiny_encoder",
  "display_name": "ConvNeXt-Tiny Encoder",
  "weights_path": "convnext_tiny_encoder.pt",
  "batch_mode": { "dynamic": { "min": 1, "max": 32 } },
  "inputs": [
    { "name": "image", "shape": [3, 224, 224], "recommended_encoder": "ImageEncoder" }
  ],
  "outputs": [
    { "name": "features", "shape": [768, 7, 7] }
  ]
}
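Because a malformed spec only surfaces as an error at load time, it can help to sanity-check the JSON before handing it to the registry. A minimal stdlib-only sketch — the required keys are inferred from the example above, and the real loader may enforce more than this:

```python
import json

REQUIRED_KEYS = {"model_id", "weights_path", "inputs", "outputs"}

def check_spec(text):
    """Lightweight sanity check for a RuntimeModelSpec JSON document.
    Returns the parsed spec, or raises ValueError with a reason."""
    spec = json.loads(text)
    missing = REQUIRED_KEYS - spec.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    for slot in spec["inputs"] + spec["outputs"]:
        if "name" not in slot or "shape" not in slot:
            raise ValueError(f"slot missing name/shape: {slot}")
    # Dynamic batch bounds, if present, must satisfy 1 <= min <= max.
    dyn = spec.get("batch_mode", {}).get("dynamic")
    if dyn and not (1 <= dyn["min"] <= dyn["max"]):
        raise ValueError("invalid dynamic batch bounds")
    return spec
```

Running this in the export pipeline catches typos (a missing "name", inverted batch bounds) before the spec ever reaches the C++ side.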

Load it via the ModelRegistry:

dl::ModelRegistry::instance().loadFromJson("/path/to/spec.json");
auto model = dl::ModelRegistry::instance().create("convnext_tiny_encoder");

Option 3: Widget UI

The GeneralEncoderModel appears in the Deep Learning widget’s model selector as “General Encoder”. Select it, configure the image binding, load weights, and run inference through the standard widget workflow.

Exporting Models

Use the provided Python export script to convert a torchvision ConvNeXt to TorchScript and/or AOT Inductor format:

python examples/export_convnext_encoder.py --model convnext_tiny --output-dir ./models

This produces:

  • convnext_tiny_encoder.pt — TorchScript format
  • convnext_tiny_encoder.pt2 — AOT Inductor format (requires PyTorch 2.1+)
  • convnext_tiny_encoder_spec.json — RuntimeModelSpec for the exported model
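When scripting the export step, a quick post-export check that all three artifacts landed in the output directory catches a failed .pt2 compile early. A sketch under the naming pattern above — the file names assume the default `--model convnext_tiny`, so adjust for other backbones:

```python
from pathlib import Path

def expected_artifacts(model_name, output_dir):
    """Paths the export script is expected to produce for a given backbone."""
    base = Path(output_dir)
    return [
        base / f"{model_name}_encoder.pt",         # TorchScript
        base / f"{model_name}_encoder.pt2",        # AOT Inductor
        base / f"{model_name}_encoder_spec.json",  # RuntimeModelSpec
    ]

def missing_artifacts(model_name, output_dir):
    """Return the subset of expected artifacts that do not exist on disk."""
    return [p for p in expected_artifacts(model_name, output_dir)
            if not p.exists()]
```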

See examples/export_convnext_encoder.py --help for all options.

Slot Descriptors

Input: "image"

Property              Value
Shape                 [C, H, W] (default: [3, 224, 224])
DType                 Float32
Recommended Encoder   ImageEncoder
Static                No

Output: "features"

Property              Value
Shape                 Configurable (default: [768, 7, 7])
DType                 Float32
Recommended Decoder   (none — raw features)
Static                No

Registration

GeneralEncoderModel self-registers via DL_REGISTER_MODEL(GeneralEncoderModel) at static initialization time. It is available in the ModelRegistry under the ID "general_encoder".

Files

File                                                                 Purpose
src/DeepLearning/models_v2/general_encoder/GeneralEncoderModel.hpp   Header
src/DeepLearning/models_v2/general_encoder/GeneralEncoderModel.cpp   Implementation
tests/DeepLearning/models_v2/GeneralEncoderModel.test.cpp            Unit tests
examples/convnext_encoder_spec.json                                  Example RuntimeModelSpec
examples/export_convnext_encoder.py                                  Python export script