General Encoder Model

Overview

The GeneralEncoderModel is a reusable ModelBase subclass that wraps any image encoder backbone (e.g. ConvNeXt, ViT, ResNet) as a single-input, single-output model for spatial feature extraction. It maps an input image to a feature tensor without requiring a task-specific C++ wrapper for each architecture.

Architecture

Input Image [B, C, H, W]  →  GeneralEncoderModel  →  Features [B, C_out, H_out, W_out]
         "image" slot              (any .pt/.pt2)          "features" slot

Design Decisions

The GeneralEncoderModel is intentionally minimal:

  • Single input slot ("image") — accepts any image tensor
  • Single output slot ("features") — produces the raw encoder output
  • Configurable shapes — input resolution and output feature shape are set at construction time, allowing the same C++ class to work with any backbone
  • DynamicBatch — supports arbitrary batch sizes, controlled by the caller

For most users, defining an encoder via a RuntimeModelSpec JSON file (which internally creates a RuntimeModel) is sufficient and requires no C++ changes. The GeneralEncoderModel provides a convenience wrapper with sensible defaults that auto-registers in the ModelRegistry.

Compile-Time vs. Runtime Shapes

Input and output resolution do not need to be known at compile time:

  • TorchScript (.pt) accepts arbitrary dynamic shapes natively
  • AOT Inductor (.pt2) supports dynamic shapes via torch.export.Dim() — dimensions marked as dynamic can vary at runtime within declared bounds
  • RuntimeModel reads shapes from a JSON spec at runtime

The shapes declared in GeneralEncoderModel (or RuntimeModelSpec) are metadata for the UI and pre-allocation, not hard constraints enforced at the C++ level.
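The declared input and output shapes are related by the backbone's overall stride. ConvNeXt (like ResNet) downsamples by a factor of 32 end to end, which is why a 224×224 input yields a 7×7 feature map. A minimal sketch of this arithmetic — the stride-32 assumption holds for the ConvNeXt family but not for every backbone, so treat the default as illustrative:

```python
def encoder_output_shape(c_out, h_in, w_in, stride=32):
    """Predict the feature shape [C_out, H_out, W_out] for a backbone
    with a given overall downsampling stride (32 for ConvNeXt/ResNet)."""
    return [c_out, h_in // stride, w_in // stride]

# ConvNeXt-Tiny at the default 224x224 input -> [768, 7, 7]
print(encoder_output_shape(768, 224, 224))
```

The same arithmetic gives the expected shape for other resolutions (e.g. a 320×320 input maps to a 10×10 grid), which is useful when filling in the output shape of a custom RuntimeModelSpec.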

Usage

Option 1: C++ GeneralEncoderModel (Compiled)

The default constructor creates a model expecting 3×224×224 RGB input and 768×7×7 output (matching ConvNeXt-Tiny):

auto model = std::make_unique<dl::GeneralEncoderModel>();
model->loadWeights("/path/to/convnext_tiny_encoder.pt");

// Prepare input
auto image = torch::randn({1, 3, 224, 224});
std::unordered_map<std::string, torch::Tensor> inputs;
inputs["image"] = image;

// Run inference
auto outputs = model->forward(inputs);
auto features = outputs["features"];  // [1, 768, 7, 7]

For a different backbone, pass custom dimensions:

// ConvNeXt-Base: 1024-dim features
auto model = std::make_unique<dl::GeneralEncoderModel>(
    3, 224, 224, std::vector<int64_t>{1024, 7, 7});

Option 2: RuntimeModelSpec JSON (No Recompilation)

Create a JSON spec file (see examples/convnext_encoder_spec.json):

{
  "model_id": "convnext_tiny_encoder",
  "display_name": "ConvNeXt-Tiny Encoder",
  "weights_path": "convnext_tiny_encoder.pt",
  "batch_mode": { "dynamic": { "min": 1, "max": 32 } },
  "inputs": [
    { "name": "image", "shape": [3, 224, 224], "recommended_encoder": "ImageEncoder" }
  ],
  "outputs": [
    { "name": "features", "shape": [768, 7, 7] }
  ]
}
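Because a malformed spec only surfaces as an error at load time, it can help to sanity-check the JSON before handing it to the registry. A minimal stdlib-only sketch — the required keys are inferred from the example above, and the real loader may enforce more than this:

```python
import json

REQUIRED_KEYS = {"model_id", "weights_path", "inputs", "outputs"}

def check_spec(text):
    """Lightweight sanity check for a RuntimeModelSpec JSON document.
    Returns the parsed spec, or raises ValueError with a reason."""
    spec = json.loads(text)
    missing = REQUIRED_KEYS - spec.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    for slot in spec["inputs"] + spec["outputs"]:
        if "name" not in slot or "shape" not in slot:
            raise ValueError(f"slot missing name/shape: {slot}")
    # Dynamic batch bounds, if present, must satisfy 1 <= min <= max.
    dyn = spec.get("batch_mode", {}).get("dynamic")
    if dyn and not (1 <= dyn["min"] <= dyn["max"]):
        raise ValueError("invalid dynamic batch bounds")
    return spec
```

Running this in the export pipeline catches typos (a missing "name", inverted batch bounds) before the spec ever reaches the C++ side.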

Load it via the ModelRegistry:

dl::ModelRegistry::instance().loadFromJson("/path/to/spec.json");
auto model = dl::ModelRegistry::instance().create("convnext_tiny_encoder");

Option 3: Widget UI

The GeneralEncoderModel appears in the Deep Learning widget’s model selector as “General Encoder”. Select it, configure the image binding, load weights, and run inference through the standard widget workflow.

Exporting Models

Use the provided Python export script to convert a torchvision ConvNeXt to TorchScript and/or AOT Inductor format:

python examples/export_convnext_encoder.py --model convnext_tiny --output-dir ./models

This produces:

  • convnext_tiny_encoder.pt — TorchScript format
  • convnext_tiny_encoder.pt2 — AOT Inductor format (requires PyTorch 2.1+)
  • convnext_tiny_encoder_spec.json — RuntimeModelSpec for the exported model
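When scripting the export step, a quick post-export check that all three artifacts landed in the output directory catches a failed .pt2 compile early. A sketch under the naming pattern above — the file names assume the default `--model convnext_tiny`, so adjust for other backbones:

```python
from pathlib import Path

def expected_artifacts(model_name, output_dir):
    """Paths the export script is expected to produce for a given backbone."""
    base = Path(output_dir)
    return [
        base / f"{model_name}_encoder.pt",         # TorchScript
        base / f"{model_name}_encoder.pt2",        # AOT Inductor
        base / f"{model_name}_encoder_spec.json",  # RuntimeModelSpec
    ]

def missing_artifacts(model_name, output_dir):
    """Return the subset of expected artifacts that do not exist on disk."""
    return [p for p in expected_artifacts(model_name, output_dir)
            if not p.exists()]
```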

See examples/export_convnext_encoder.py --help for all options.

Slot Descriptors

Input: "image"

Property              Value
Shape                 [C, H, W] (default: [3, 224, 224])
DType                 Float32
Recommended Encoder   ImageEncoder
Static                No

Output: "features"

Property              Value
Shape                 Configurable (default: [768, 7, 7])
DType                 Float32
Recommended Decoder   (none — raw features)
Static                No

Registration

GeneralEncoderModel self-registers via DL_REGISTER_MODEL(GeneralEncoderModel) at static initialization time. It is available in the ModelRegistry under the ID "general_encoder".

Files

File                                                                 Purpose
src/DeepLearning/models_v2/general_encoder/GeneralEncoderModel.hpp   Header
src/DeepLearning/models_v2/general_encoder/GeneralEncoderModel.cpp   Implementation
tests/DeepLearning/models_v2/GeneralEncoderModel.test.cpp            Unit tests
examples/convnext_encoder_spec.json                                  Example RuntimeModelSpec
examples/export_convnext_encoder.py                                  Python export script