# General Encoder Model

## Overview
The GeneralEncoderModel is a reusable ModelBase subclass that wraps any image encoder backbone (e.g. ConvNeXt, ViT, ResNet) as a single-input, single-output model for spatial feature extraction. It maps an input image to a feature tensor without requiring a task-specific C++ wrapper for each architecture.
## Architecture

```
Input Image [B, C, H, W] → GeneralEncoderModel → Features [B, C_out, H_out, W_out]
    "image" slot            (any .pt/.pt2)         "features" slot
```
## Design Decisions
The GeneralEncoderModel is intentionally minimal:
- Single input slot (`"image"`): accepts any image tensor
- Single output slot (`"features"`): produces the raw encoder output
- Configurable shapes: input resolution and output feature shape are set at construction time, allowing the same C++ class to work with any backbone
- `DynamicBatch`: supports arbitrary batch sizes, controlled by the caller
For most users, defining an encoder via a RuntimeModelSpec JSON file (which internally creates a RuntimeModel) is sufficient and requires no C++ changes. The GeneralEncoderModel provides a convenience wrapper with sensible defaults that auto-registers in the ModelRegistry.
## Compile-Time vs. Runtime Shapes
Input and output resolution do not need to be known at compile time:
- TorchScript (`.pt`) accepts arbitrary dynamic shapes natively
- AOT Inductor (`.pt2`) supports dynamic shapes via `torch.export.Dim()`; dimensions marked as dynamic can vary at runtime within declared bounds
- `RuntimeModel` reads shapes from a JSON spec at runtime
The shapes declared in GeneralEncoderModel (or RuntimeModelSpec) are metadata for the UI and pre-allocation, not hard constraints enforced at the C++ level.
## Usage

### Option 1: C++ GeneralEncoderModel (Compiled)
The default constructor creates a model expecting 3×224×224 RGB input and 384×7×7 output (matching ConvNeXt-Tiny):
```cpp
auto model = std::make_unique<dl::GeneralEncoderModel>();
model->loadWeights("/path/to/convnext_tiny_encoder.pt");

// Prepare input
auto image = torch::randn({1, 3, 224, 224});
std::unordered_map<std::string, torch::Tensor> inputs;
inputs["image"] = image;

// Run inference
auto outputs = model->forward(inputs);
auto features = outputs["features"];  // [1, 384, 7, 7]
```

For a different backbone, pass custom dimensions:
```cpp
// ConvNeXt-Base: 1024-dim features
auto model = std::make_unique<dl::GeneralEncoderModel>(
    3, 224, 224, std::vector<int64_t>{1024, 7, 7});
```

### Option 2: RuntimeModelSpec JSON (No Recompilation)
Create a JSON spec file (see examples/convnext_encoder_spec.json):
```json
{
  "model_id": "convnext_tiny_encoder",
  "display_name": "ConvNeXt-Tiny Encoder",
  "weights_path": "convnext_tiny_encoder.pt",
  "batch_mode": { "dynamic": { "min": 1, "max": 32 } },
  "inputs": [
    { "name": "image", "shape": [3, 224, 224], "recommended_encoder": "ImageEncoder" }
  ],
  "outputs": [
    { "name": "features", "shape": [768, 7, 7] }
  ]
}
```

Load it via the ModelRegistry:
```cpp
dl::ModelRegistry::instance().loadFromJson("/path/to/spec.json");
auto model = dl::ModelRegistry::instance().create("convnext_tiny_encoder");
```

### Option 3: Widget UI
The GeneralEncoderModel appears in the Deep Learning widget’s model selector as “General Encoder”. Select it, configure the image binding, load weights, and run inference through the standard widget workflow.
## Exporting Models
Use the provided Python export script to convert a torchvision ConvNeXt to TorchScript and/or AOT Inductor format:
```shell
python examples/export_convnext_encoder.py --model convnext_tiny --output-dir ./models
```

This produces:

- `convnext_tiny_encoder.pt`: TorchScript format
- `convnext_tiny_encoder.pt2`: AOT Inductor format (requires PyTorch 2.1+)
- `convnext_tiny_encoder_spec.json`: RuntimeModelSpec for the exported model

See `examples/export_convnext_encoder.py --help` for all options.
## Slot Descriptors

### Input: `"image"`
| Property | Value |
|---|---|
| Shape | [C, H, W] (default: [3, 224, 224]) |
| DType | Float32 |
| Recommended Encoder | ImageEncoder |
| Static | No |
### Output: `"features"`
| Property | Value |
|---|---|
| Shape | Configurable (default: [384, 7, 7]) |
| DType | Float32 |
| Recommended Decoder | (none — raw features) |
| Static | No |
## Registration

GeneralEncoderModel self-registers via `DL_REGISTER_MODEL(GeneralEncoderModel)` at static initialization time. It is available in the ModelRegistry under the ID `"general_encoder"`.
## Files

| File | Purpose |
|---|---|
| `src/DeepLearning/models_v2/general_encoder/GeneralEncoderModel.hpp` | Header |
| `src/DeepLearning/models_v2/general_encoder/GeneralEncoderModel.cpp` | Implementation |
| `tests/DeepLearning/models_v2/GeneralEncoderModel.test.cpp` | Unit tests |
| `examples/convnext_encoder_spec.json` | Example RuntimeModelSpec |
| `examples/export_convnext_encoder.py` | Python export script |