Storage Backend Benchmarking Roadmap
Overview
This document defines the benchmarking plan for quantifying the performance of storage backends across all five core time series types. The goals are to:
- Validate that cache optimization effectively eliminates virtual dispatch overhead
- Measure the cost of view indirection and lazy computation
- Provide data-driven guidance for choosing the right storage backend
- Detect performance regressions as the codebase evolves
The project uses Google Benchmark (already integrated in the benchmark/ directory).
Benchmark Coverage Matrix
Iteration Performance
Measures time to iterate through all elements. This is the most critical benchmark — cache optimization directly impacts iteration throughput.
| Benchmark | AnalogTS | RaggedAnalogTS | RaggedTS<T> | DigitalEventS | DigitalIntervalS |
|---|---|---|---|---|---|
| Raw vector baseline | Create | Create | Create | Create | Create |
| Owning (cached) | Create | Create | Create | Create | Create |
| View contiguous (cached) | Create | Create | Create | Create | Create |
| View sparse (uncached) | Create | Create | Create | Create | Create |
| Lazy | Create | Create | Create | Create | Create |
Target: Owning iteration ≤5% slower than raw vector. Contiguous view ≤10% slower than owning.
Lookup Performance
Measures the cost of individual lookups by time or EntityId.
| Benchmark | AnalogTS | RaggedAnalogTS | RaggedTS<T> | DigitalEventS | DigitalIntervalS |
|---|---|---|---|---|---|
| Time range query | Create | Create | Create | Create | Create |
| EntityId lookup | N/A | N/A | Create | Create | Create |
| Interval overlap query | N/A | N/A | N/A | N/A | Create |
| Interval containment query | N/A | N/A | N/A | N/A | Create |
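To pin down what the interval overlap benchmark should exercise, here is a sketch of the reference semantics over a start-sorted interval list. `Interval` and `overlapping` are illustrative stand-ins; the real DigitalIntervalSeries query API may differ.

```cpp
#include <cstdint>
#include <vector>

struct Interval { std::int64_t start, end; };  // closed interval

// Returns all intervals overlapping [qs, qe], assuming the input is
// sorted by start time. Once start exceeds qe, nothing further can
// overlap, so the scan stops early.
std::vector<Interval> overlapping(std::vector<Interval> const& sorted,
                                  std::int64_t qs, std::int64_t qe) {
    std::vector<Interval> out;
    for (auto const& iv : sorted) {
        if (iv.start > qe) break;
        if (iv.end >= qs) out.push_back(iv);
    }
    return out;
}
```

The containment query is the stricter variant (`iv.start >= qs && iv.end <= qe`) and should be benchmarked separately, since it admits tighter pruning.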
Factory Method Performance
Measures the cost of creating views and materializing lazy storage.
| Benchmark | AnalogTS | RaggedAnalogTS | RaggedTS<T> | DigitalEventS | DigitalIntervalS |
|---|---|---|---|---|---|
| createView (time, 10% filter) | Create | Create | Create | Create | Create |
| createView (time, 50% filter) | Create | Create | Create | Create | Create |
| createView (EntityIds) | N/A | N/A | Create | Create | Create |
| materialize (owning→owning) | Create | Create | Create | Create | Create |
| materialize (lazy→owning) | Create | Create | Create | Create | Create |
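The dominant cost in sparse view creation is the linear scan that collects passing indices, which is what the `createView` benchmarks above isolate. A minimal sketch of that scan, assuming the backend keeps a plain time vector (the real factory also wires up the shared-pointer back-reference to the source):

```cpp
#include <cstdint>
#include <vector>

// Collects the indices of all samples whose time falls in [t0, t1).
// This single O(n) pass is the up-front price of a filtered view.
std::vector<std::size_t> buildViewIndex(std::vector<std::int64_t> const& times,
                                        std::int64_t t0, std::int64_t t1) {
    std::vector<std::size_t> idx;
    for (std::size_t i = 0; i < times.size(); ++i) {
        if (times[i] >= t0 && times[i] < t1) idx.push_back(i);
    }
    return idx;
}
```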
Mutation Performance
Measures insert and remove operations on owning storage.
| Benchmark | AnalogTS | RaggedAnalogTS | RaggedTS<T> | DigitalEventS | DigitalIntervalS |
|---|---|---|---|---|---|
| Sequential insert | Existing | Create | Create | Create | Create |
| Random-time insert | Create | Create | Create | Create | Create |
| Remove by EntityId | N/A | N/A | Create | Create | Create |
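Sequential and random-time inserts are benchmarked separately because they hit very different paths: sequential insert is an amortized O(1) append, while a random-time insert must keep the time axis sorted, paying a binary search plus an O(n) tail shift. A sketch over a bare time vector (the owning backend is assumed to keep times sorted; its actual insert path may differ):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Inserts t at its sorted position: O(log n) search + O(n) element shift.
void insertSorted(std::vector<std::int64_t>& times, std::int64_t t) {
    auto pos = std::lower_bound(times.begin(), times.end(), t);
    times.insert(pos, t);
}
```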
Memory Footprint
Measures memory usage per backend for a fixed number of elements (1M).
| Measurement | AnalogTS | RaggedAnalogTS | RaggedTS<T> | DigitalEventS | DigitalIntervalS |
|---|---|---|---|---|---|
| Owning storage | Measure | Measure | Measure | Measure | Measure |
| View storage overhead | Measure | Measure | Measure | Measure | Measure |
| Lazy storage overhead | Measure | Measure | Measure | Measure | Measure |
| Cache struct size | Measure | Measure | Measure | Measure | Measure |
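The cache struct size can be locked down at compile time rather than measured. Note that three 8-byte pointers plus a `std::size_t` and a `bool` pads to 40 bytes on common LP64 ABIs, so meeting the ≤32-byte budget in the success criteria likely requires a 32-bit count. A hypothetical layout (field names are illustrative, not the real struct):

```cpp
#include <cstdint>

// Hypothetical cache layout matching the "3 pointers + size + bool"
// budget. A 32-bit count keeps the struct at 32 bytes, where a
// std::size_t field would pad it to 40 on LP64.
struct ElementCache {
    void const* data = nullptr;   // contiguous element storage
    void const* times = nullptr;  // contiguous time storage
    void const* index = nullptr;  // sparse index vector (views only)
    std::uint32_t size = 0;       // element count covered by the cache
    bool valid = false;           // cleared on mutation to force a rebuild
};

static_assert(sizeof(ElementCache) <= 32, "cache struct over budget");
```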
Benchmark Parameters
Data Sizes
Each iteration/lookup benchmark should be run at multiple scales to detect algorithmic complexity issues:
| Parameter Name | Values | Purpose |
|---|---|---|
| `num_elements` | 1K, 10K, 100K, 1M | Scaling behavior |
| `filter_ratio` | 0.01, 0.10, 0.50, 0.90 | View creation selectivity |
| `num_queries` | 1K, 10K | Amortized lookup cost |
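Google Benchmark can register this whole grid with `ArgsProduct`, which emits one run per combination of one value from each axis. The expansion it performs is equivalent to this cartesian-product sketch:

```cpp
#include <cstdint>
#include <vector>

// Expands a parameter grid the way Google Benchmark's ArgsProduct does:
// every combination of one value per axis becomes one benchmark run.
std::vector<std::vector<std::int64_t>>
argsProduct(std::vector<std::vector<std::int64_t>> const& axes) {
    std::vector<std::vector<std::int64_t>> out{{}};
    for (auto const& axis : axes) {
        std::vector<std::vector<std::int64_t>> next;
        for (auto const& prefix : out) {
            for (auto v : axis) {
                auto row = prefix;
                row.push_back(v);
                next.push_back(std::move(row));
            }
        }
        out = std::move(next);
    }
    return out;
}
```

So a lookup benchmark taking `num_elements` and `num_queries` registered via `->ArgsProduct({{1000, 10000, 100000, 1000000}, {1000, 10000}})` produces eight runs.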
Data Types for RaggedTimeSeries
RaggedTimeSeries<T> benchmarks should use representative concrete types:
| Type | Size (bytes) | Represents |
|---|---|---|
| `Point2D<float>` | ~8 | Small, trivially copyable |
| `Line2D` | variable | Medium, heap-allocated vector |
| `Mask2D` | variable | Large, heap-allocated 2D array |
Proposed File Structure
```
benchmark/
├── storage_backend/
│   ├── CMakeLists.txt
│   ├── StorageBenchmarkFixtures.hpp
│   ├── AnalogTimeSeries.benchmark.cpp
│   ├── RaggedAnalogTimeSeries.benchmark.cpp
│   ├── RaggedTimeSeries.benchmark.cpp
│   ├── DigitalEventSeries.benchmark.cpp
│   └── DigitalIntervalSeries.benchmark.cpp
├── MaskArea.benchmark.cpp          # Existing
├── PolyLineUpload.benchmark.cpp    # Existing
└── ...
```
StorageBenchmarkFixtures.hpp provides shared fixture creation for each backend:
```cpp
// Creates an owning DigitalEventSeries with N sorted events
inline auto makeOwningDigitalEvents(size_t n)
    -> std::shared_ptr<DigitalEventSeries>;

// Creates a view over a subset of the source
inline auto makeViewDigitalEvents(
    std::shared_ptr<DigitalEventSeries> source,
    double filter_ratio) -> std::shared_ptr<DigitalEventSeries>;

// Creates a lazy series from a transform view
inline auto makeLazyDigitalEvents(
    std::shared_ptr<DigitalEventSeries> source)
    -> std::shared_ptr<DigitalEventSeries>;
```

Example Benchmark
```cpp
#include <benchmark/benchmark.h>

#include <cstdint>
#include <numeric>
#include <vector>

#include "DataManager/DigitalTimeSeries/DigitalEventSeries.hpp"
#include "StorageBenchmarkFixtures.hpp"

static void BM_DigitalEvent_Iteration_Owning(benchmark::State& state) {
    auto const n = static_cast<size_t>(state.range(0));
    auto series = makeOwningDigitalEvents(n);
    for (auto _ : state) {
        int64_t sum = 0;
        for (auto const& elem : series->elementsView()) {
            sum += elem.time().getValue();
        }
        benchmark::DoNotOptimize(sum);
    }
    state.SetItemsProcessed(
        state.iterations() * static_cast<int64_t>(n));
}
BENCHMARK(BM_DigitalEvent_Iteration_Owning)
    ->Arg(1000)->Arg(10000)->Arg(100000)->Arg(1000000);

static void BM_DigitalEvent_Iteration_ViewContiguous(benchmark::State& state) {
    auto const n = static_cast<size_t>(state.range(0));
    auto source = makeOwningDigitalEvents(n);
    auto view = DigitalEventSeries::createView(
        source, TimeFrameIndex{0}, TimeFrameIndex{static_cast<int64_t>(n)});
    for (auto _ : state) {
        int64_t sum = 0;
        for (auto const& elem : view->elementsView()) {
            sum += elem.time().getValue();
        }
        benchmark::DoNotOptimize(sum);
    }
    state.SetItemsProcessed(
        state.iterations() * static_cast<int64_t>(n));
}
BENCHMARK(BM_DigitalEvent_Iteration_ViewContiguous)
    ->Arg(1000)->Arg(10000)->Arg(100000)->Arg(1000000);

static void BM_DigitalEvent_Iteration_RawVector(benchmark::State& state) {
    auto const n = static_cast<size_t>(state.range(0));
    std::vector<int64_t> raw(n);
    std::iota(raw.begin(), raw.end(), 0);
    for (auto _ : state) {
        int64_t sum = 0;
        for (auto val : raw) { sum += val; }
        benchmark::DoNotOptimize(sum);
    }
    state.SetItemsProcessed(
        state.iterations() * static_cast<int64_t>(n));
}
BENCHMARK(BM_DigitalEvent_Iteration_RawVector)
    ->Arg(1000)->Arg(10000)->Arg(100000)->Arg(1000000);
```

Success Criteria
| Metric | Target | Rationale |
|---|---|---|
| Owning iteration vs raw vector | ≤5% overhead | Cache fast-path should be near-zero cost |
| View contiguous vs owning | ≤10% overhead | Pointer offset should be negligible |
| View sparse vs owning | 2–5x slower | Indirection through index vector expected |
| Lazy vs owning | 5–50x slower | Depends on transform complexity |
| Cache struct overhead | ≤32 bytes per container | 3 pointers + size + bool |
| View creation (10% filter) | O(n) | Linear scan to build index vector |
| Materialization | ≤2x iteration cost | Simple copy with allocation |
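Regression detection (the fourth goal above) falls out of Google Benchmark's JSON output plus the `compare.py` script it ships under `tools/`. A sketch, where `storage_benchmarks` is a placeholder binary name:

```shell
# Save a baseline run as JSON.
./storage_benchmarks --benchmark_out=baseline.json --benchmark_out_format=json
# ...make changes, rebuild...
./storage_benchmarks --benchmark_out=contender.json --benchmark_out_format=json
# Report per-benchmark deltas between the two runs.
python3 tools/compare.py benchmarks baseline.json contender.json
```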
Measurement Tools
| Tool | Use Case | Availability |
|---|---|---|
| Google Benchmark | Microbenchmarks (iteration, lookup) | Integrated in benchmark/ |
| heaptrack | Memory profiling | Available in environment |
| perf | CPU profiling, cache miss analysis | Available on Linux |
| `-ftime-trace` (Clang) | Compile-time analysis | Enabled via ENABLE_TIME_TRACE preset |
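Typical invocations for the profiling tools above, filtered to the iteration benchmarks (`storage_benchmarks` is a placeholder binary name):

```shell
# Cache-miss profile of the iteration benchmarks.
perf stat -e cache-references,cache-misses \
    ./storage_benchmarks --benchmark_filter=Iteration

# Heap allocation profile of the same run.
heaptrack ./storage_benchmarks --benchmark_filter=Iteration
```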
Priority Order
- High: Iteration benchmarks (owning vs view vs lazy vs raw) — validates core optimization
- Medium: Lookup benchmarks, factory method benchmarks — validates algorithmic choices
- Low: Mutation benchmarks, memory footprint — informational, less likely to regress