# Exporting Encoder Models

## Overview
WhiskerToolbox can use general-purpose image encoder models (such as ConvNeXt, ViT, or ResNet) to extract feature maps from video frames. This guide explains how to export an encoder model from Python and load it in WhiskerToolbox.
## Prerequisites

Install PyTorch and torchvision:

```bash
pip install torch torchvision
```

## Quick Start: Export ConvNeXt
WhiskerToolbox ships with a ready-to-use export script for ConvNeXt models:
```bash
python examples/export_convnext_encoder.py
```

This exports a ConvNeXt-Tiny encoder and produces three files:
| File | Description |
|---|---|
| `convnext_tiny_encoder.pt` | TorchScript model (works everywhere) |
| `convnext_tiny_encoder.pt2` | AOT Inductor model (faster, PyTorch 2.1+) |
| `convnext_tiny_encoder_spec.json` | Model specification for WhiskerToolbox |
### Using a Different ConvNeXt Variant
```bash
# ConvNeXt-Small (768-dim features)
python examples/export_convnext_encoder.py --model convnext_small

# ConvNeXt-Base (1024-dim features)
python examples/export_convnext_encoder.py --model convnext_base

# ConvNeXt-Large (1536-dim features)
python examples/export_convnext_encoder.py --model convnext_large
```

### Export Options
```bash
python examples/export_convnext_encoder.py --help
```

| Option | Description |
|---|---|
| `--model` | ConvNeXt variant (`convnext_tiny`, `convnext_small`, `convnext_base`, `convnext_large`) |
| `--output-dir` | Directory for exported files (default: current directory) |
| `--no-pt` | Skip TorchScript export |
| `--no-pt2` | Skip AOT Inductor export |
## Loading in WhiskerToolbox

### Method 1: Widget UI
- Open the Deep Learning widget (View → Deep Learning)
- Select “General Encoder” from the model dropdown
- Under the “image” slot, bind to your video data source
- Click Load Weights and select the exported `.pt` or `.pt2` file
- Click Run to extract features
### Method 2: JSON Specification
For more control, use the generated JSON specification file:
- Open the Deep Learning widget
- Click Load Model Spec and select the `_spec.json` file
- The model will appear with the correct input/output shapes pre-configured
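The spec file is plain JSON, so the pre-configured shapes can also be inspected outside the widget with the standard library (a sketch; the field values below are illustrative, inlined here instead of read from disk):

```python
import json

# A spec like the one produced by the export script (inlined for illustration)
spec_text = '''
{
  "model_id": "convnext_tiny_encoder",
  "inputs":  [{ "name": "image",    "shape": [3, 224, 224] }],
  "outputs": [{ "name": "features", "shape": [768, 7, 7] }]
}
'''
spec = json.loads(spec_text)

# These are the shapes the widget uses to pre-configure the model's slots
for tensor in spec["inputs"] + spec["outputs"]:
    print(tensor["name"], tensor["shape"])
```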
## Exporting Custom Encoders
To export your own encoder (not a ConvNeXt), create a Python wrapper that outputs the spatial feature map and follows this pattern:
```python
import torch

class MyEncoderWrapper(torch.nn.Module):
    def __init__(self, backbone):
        super().__init__()
        self.backbone = backbone

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [B, C, H, W] input image
        # return: [B, C_out, H_out, W_out] feature map
        return self.backbone.extract_features(x)

# Export as TorchScript
wrapper = MyEncoderWrapper(my_model).eval()
example = torch.randn(1, 3, 224, 224)
traced = torch.jit.trace(wrapper, example)
traced.save("my_encoder.pt")
```

Then create a JSON specification:
```json
{
  "model_id": "my_encoder",
  "display_name": "My Custom Encoder",
  "weights_path": "my_encoder.pt",
  "inputs": [
    { "name": "image", "shape": [3, 224, 224], "recommended_encoder": "ImageEncoder" }
  ],
  "outputs": [
    { "name": "features", "shape": [512, 7, 7] }
  ]
}
```

## Output Format
The encoder output is a spatial feature tensor of shape `[B, C, H, W]`:

- `B` — batch size (number of frames processed simultaneously)
- `C` — number of feature channels (depends on the encoder architecture)
- `H`, `W` — spatial dimensions of the feature map
For ConvNeXt with 224×224 input, the output spatial size is 7×7, since the backbone downsamples by a total stride of 32 (224 / 32 = 7).
This feature tensor can be used as input to downstream processing such as global average pooling, spatial point extraction, or classification heads (see the Deep Learning roadmap for planned post-encoder modules).
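For instance, global average pooling collapses the spatial grid into a single feature vector per frame (a minimal sketch of the operation itself; the 512-channel, 7×7 shape is illustrative):

```python
import torch

# A batch of 2 frames' feature maps in [B, C, H, W] layout
features = torch.randn(2, 512, 7, 7)

# Global average pooling: average over H and W, leaving one C-dim vector
# per frame, suitable as input to a classification head
pooled = features.mean(dim=(2, 3))

print(tuple(pooled.shape))  # (2, 512)
```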