Post-Encoder Feature Extraction Modules
Overview
Post-encoder modules are optional processing stages applied between the encoder backbone and the channel decoder. They transform the raw 4-D feature map [B, C, H, W] produced by the encoder into a form suitable for further decoding — most commonly collapsing the spatial dimensions.
All post-encoder modules implement the dl::PostEncoderModule abstract interface, live in src/DeepLearning/post_encoder/, and are instantiated via PostEncoderModuleFactory.
[B,C,H,W] encoder output
│
▼
PostEncoderModule::apply()
│
▼
[B,C] or [B,C,H',W'] reduced tensor
│
▼
Channel Decoder (e.g. TensorToFeatureVector)
Abstract Interface
// src/DeepLearning/post_encoder/PostEncoderModule.hpp
class PostEncoderModule {
public:
virtual ~PostEncoderModule() = default;
/// Apply the module to the encoder output.
virtual at::Tensor apply(at::Tensor const &) const = 0;
/// Given raw encoder output shape (C, H, W), return the output shape.
virtual std::vector<int64_t> outputShape(
std::vector<int64_t> const & encoder_shape) const = 0;
virtual std::string name() const = 0;
};The header forward-declares at::Tensor to remain torch-free at inclusion time. All concrete .cpp implementations include <torch/torch.h> directly.
Implementations
GlobalAvgPoolModule
[B,C,H,W] → [B,C] via torch::adaptive_avg_pool2d({1,1}).squeeze(-1).squeeze(-1).
Use when you want a single feature vector summarising the entire spatial map.
// Example
auto mod = std::make_unique<dl::GlobalAvgPoolModule>();
auto out = mod->apply(features); // [B,C]

SpatialPointExtractModule
Extracts the C-dimensional feature vector at a specific 2-D point in source-image coordinates. Supports two backends controlled by dl::InterpolationMode:
| Mode | Mechanics |
|---|---|
| Nearest | Scales the point to feature-map pixels, clamps to bounds, then slices `[B, C, iy, ix]` |
| Bilinear | Normalises the point to [-1, 1], creates a [B,1,1,2] grid, and calls torch::nn::functional::grid_sample with align_corners=true and border padding |
The module is stateful: call setPoint(Point2D<float>) before every forward pass. When used with SlotAssembler, the widget calls updateSpatialPoint() per frame, which reads the first point from the configured PointData key and forwards it to the module.
ImageSize const src_size{640, 480};
auto mod = std::make_unique<dl::SpatialPointExtractModule>(
src_size, dl::InterpolationMode::Bilinear);
mod->setPoint({320.0f, 240.0f}); // centre of frame
auto out = mod->apply(features); // [B,C]

PostEncoderPipeline
An ordered chain of PostEncoderModule instances that is itself a PostEncoderModule. apply() threads the tensor through each stage in sequence. outputShape() propagates the shape transformation through all stages.
An empty pipeline is a pass-through (no allocation, no copy).
dl::PostEncoderPipeline pipeline;
pipeline.add(std::make_unique<dl::GlobalAvgPoolModule>());
// more modules can be added...
auto out = pipeline.apply(features);

Factory
PostEncoderModuleFactory::create(name, params) maps string keys to concrete modules:
| Key | Result |
|---|---|
| "none" / "" | nullptr (no post-processing) |
| "global_avg_pool" | GlobalAvgPoolModule |
| "spatial_point" | SpatialPointExtractModule using params.source_image_size and params.interpolation |
| anything else | nullptr (unknown key) |
PostEncoderModuleParams carries:
struct PostEncoderModuleParams {
ImageSize source_image_size;
std::string interpolation; // "nearest" | "bilinear"
};

Integration with GeneralEncoderModel
GeneralEncoderModel holds an optional unique_ptr<PostEncoderModule> set via setPostEncoderModule(...).
- forward() applies the module (if set) to the raw encoder output before returning.
- effectiveOutputShape() returns the shape after the module's transformation; outputSlots() uses this to advertise the correct slot dimensions to the widget.
- outputShape() still returns the raw encoder shape.
Integration with SlotAssembler
SlotAssembler exposes two public methods:
void configurePostEncoderModule(
std::string const & module_type,
ImageSize source_image_size,
std::string const & interpolation);
void updateSpatialPoint(
DataManager & dm,
std::string const & point_key,
    int frame);

configurePostEncoderModule performs a dynamic_cast<GeneralEncoderModel*> and calls setPostEncoderModule on success. If the current model is not a GeneralEncoderModel, the call is silently ignored.
updateSpatialPoint reads PointData[point_key] at frame, takes the first point, and calls SpatialPointExtractModule::setPoint(). It is called automatically before every forward pass (both single-frame and batch) when a non-empty spatial_point_key is stored in the assembler’s Impl.
Widget UI
The Post-Encoder Module section appears at the bottom of the Dynamic Slot panels in DeepLearningPropertiesWidget. It contains:
- Module combo: None, Global Average Pooling, Spatial Point Extraction
- Interpolation combo (spatial only): Bilinear, Nearest
- Point Key combo (spatial only): populated from all PointData keys in DataManager
On any change the widget immediately calls _assembler->configurePostEncoderModule(...) and persists the selection to DeepLearningState (fields post_encoder_module_type and post_encoder_point_key).
Batch Inference and TensorData Output
For batch runs with TensorToFeatureVector as the decoder, _mergeResults() collects (frame_index, std::vector<float>) pairs in _pending_feature_rows (keyed by data_key). At _onBatchFinished(), the rows are sorted by frame, flattened, and written as a single TensorData::createOrdinal2D matrix to the DataManager. Each row corresponds to one frame and each column to one feature channel.
Testing
Unit tests live in tests/DeepLearning/post_encoder/PostEncoderModules.test.cpp:
- GlobalAvgPoolModule: output shape, value averaging, single-pixel passthrough
- SpatialPointExtractModule: name/shape, nearest/bilinear extraction, clamping
- PostEncoderPipeline: empty passthrough, single-module delegation, shape chaining
- PostEncoderModuleFactory: none, global_avg_pool, spatial_point, unknown key
Run with:
ctest --preset linux-clang-release -R "post_encoder"