Concurrency Architecture
Overview
This document describes the concurrency architecture for WhiskerToolbox: what problems need concurrency, what design patterns fit the existing single-threaded DataManager-centric architecture, and a staged plan for introducing safe concurrency without rewriting the core.
Context: How the System Works Today
WhiskerToolbox runs entirely on the Qt main thread. A shared DataManager instance holds all data objects (MaskData, LineData, PointData, AnalogTimeSeries, VideoData, etc.) in a `std::map<std::string, DataTypeVariant>`. Consumers call `getData<T>(key)` to obtain a `shared_ptr<T>` — always mutable, always unsynchronized. The ObserverData callback system is synchronous and explicitly not thread-safe.
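A minimal sketch of this storage model, assuming hypothetical stand-in types (`MaskData` and `PointData` here are empty stubs, not the real classes):

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <variant>

// Hypothetical stand-ins for the real data types.
struct MaskData { int frames = 0; };
struct PointData { int frames = 0; };

// Simplified sketch of the variant-map storage described above.
using DataTypeVariant = std::variant<std::shared_ptr<MaskData>,
                                     std::shared_ptr<PointData>>;

class DataManagerSketch {
public:
    // Returns the stored pointer if the variant holds this type, else null.
    template <typename T>
    std::shared_ptr<T> getData(std::string const & key) {
        auto it = _data.find(key);
        if (it == _data.end()) return nullptr;
        if (auto * p = std::get_if<std::shared_ptr<T>>(&it->second)) return *p;
        return nullptr;
    }

    void setData(std::string const & key, DataTypeVariant value) {
        _data[key] = std::move(value);
    }

private:
    std::map<std::string, DataTypeVariant> _data;
};
```

Note that the returned `shared_ptr<T>` aliases the stored object: every caller gets the same mutable instance, which is exactly why the real system relies on single-threaded access.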
This is simple and correct for a single-threaded program. The concurrency challenge is to keep this core model intact while moving expensive work off the main thread.
Where Concurrency Is Needed
| Use Case | Nature | Duration | Current State |
|---|---|---|---|
| Data loading (CSV, binary, HDF5) | Pure producer — reads file, builds object | Seconds | Blocks main thread |
| Deep learning inference (batch) | Reads media frames, runs model, writes results | Minutes | Blocks main thread |
| Data transforms (TransformsV2) | Reads input data, computes, writes output | ms–seconds | Blocks main thread |
| OpenMP within transforms | Embarrassingly parallel inner loops | ms | Works but GUI still hangs |
| Interactive editing (masks, points) | User modifies data at one frame | μs | Works, main thread only |
| Display / rendering | Reads data for visualization | μs | Works, main thread only |
The pattern: 95% of operations are microsecond-scale and single-threaded. The remaining 5% are seconds-to-minutes and need to run in the background.
Why Not Locks on DataManager?
DataManager, ObserverData, and all data types (MaskData, PointData, LineData, RaggedTimeSeries, AnalogTimeSeries) are entirely unsynchronized. Adding mutexes would mean:
- Every `getData()` call acquires a lock — overhead on every widget interaction (~100 call sites across the codebase).
- Every `addAtTime()` call acquires a lock — overhead on every data mutation.
- `notifyObservers()` fires synchronous callbacks while holding a lock — deadlock risk if any callback accesses DataManager.
- Every consumer must reason about lock ordering to avoid deadlocks.
This would be a massive, error-prone refactor for marginal benefit. The system’s access pattern (grab data, compute, release) doesn’t match the mutex model (short critical sections with immediate release).
Why Not Per-Key Reader-Writer Locks?
A reader-writer lock per DataManager key ("masks", "points", etc.) seems attractive: multiple readers, exclusive writer, scoped to each data object.
The problem is that writes are not scoped to DataManager. Consumers call getData<MaskData>("masks") once and then call addAtTime() on the returned object many times across many frames. The “write” is a long-lived mutable reference, not a single atomic operation. A reader-writer lock would either:
- Lock for the entire duration the mutable reference is held — blocking all readers for seconds/minutes during a transform.
- Require lock-per-mutation — locking every `addAtTime()` call, even for microsecond operations. This adds overhead everywhere and changes every consumer.
Neither is practical.
Why Not Per-Frame Locks on Time Series?
Since nearly all data is time series, locking individual time elements seems natural: a DL transform writes frame 100 while the UI reads frame 42.
This fails because of the underlying storage layout. RaggedTimeSeries stores data in contiguous SoA arrays (std::vector<TimeFrameIndex>, std::vector<TData>, std::vector<EntityId>). addAtTime() can trigger vector reallocation, which invalidates all pointers and iterators — not just those for the written frame. A per-frame lock cannot protect against reallocation.
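The reallocation hazard can be demonstrated with a plain `std::vector` (a stand-in for the SoA arrays; `RaggedTimeSeries` itself is not reproduced here):

```cpp
#include <cassert>
#include <vector>

// Returns true if appending `count` elements reallocated the vector's
// storage — which invalidates every pointer and iterator into it, not just
// ones near the appended frame.
bool append_reallocates(std::vector<int> & v, int count) {
    int const * before = v.data();
    for (int i = 0; i < count; ++i) v.push_back(i);
    return v.data() != before;
}
```

A per-frame lock guards one element, but the reallocation moves all of them, so the lock's protection scope does not match the actual hazard.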
Additionally, for microsecond transforms, the cost of acquiring and releasing a lock per frame would dominate the actual computation time.
The Right Model: Thread Confinement + Merge
The design that fits this system is:
- DataManager stays single-threaded. All reads and writes happen on the main (Qt) thread. No synchronization added.
- Workers use private data. Background threads work with either independent data (file I/O) or private buffer objects (transform outputs).
- Results merge on the main thread. Workers produce results in private buffers. The main thread absorbs those results into DataManager, either all-at-once (simple) or periodically (for progressive visibility).
- Qt signals provide the communication channel. Cross-thread signals use `Qt::QueuedConnection` automatically — no manual synchronization needed.
This requires zero synchronization primitives on data objects. Safety comes from: (a) the type system (shared_ptr<T const> prevents mutation), (b) thread isolation (workers write to private buffers), and (c) temporal separation (merges happen on the main thread between event loop iterations).
Ownership, Const, and Sharing
Read-Only vs Mutable Access
Current state: getData<T>(key) returns std::shared_ptr<T> — always mutable. There is no getConstData() API. Consumers who only need to read (~90% of call sites) get full write access.
Why this matters for concurrency: If we know a consumer only reads, we can safely let multiple threads hold shared_ptr<T const> simultaneously without any synchronization. If any consumer might write, we need exclusive access guarantees.
Data Types Have Different Sharing Characteristics
VideoData wraps a stateful FFmpeg decoder. Two threads cannot read frames from the same VideoData — the decoder’s internal seek position, codec context, and frame buffer are mutable and unsynchronized. By contrast, LineData is a structure of arrays: if no one is writing, any number of threads can read it safely.
| Data Type | Storage | Concurrent Read | Shareable via const? |
|---|---|---|---|
| `LineData`, `PointData`, `MaskData` | SoA arrays (RaggedTimeSeries) | Safe if no writer | Yes |
| `AnalogTimeSeries` | Dense array | Safe if no writer | Yes |
| `VideoData` | Stateful FFmpeg decoder | Never safe | No — must clone |
| `TensorData` | Armadillo/LibTorch matrix | Safe if no writer | Yes |
ConcurrencyTraits: Encoding Sharing Properties at Compile Time
```cpp
template<typename T>
struct ConcurrencyTraits {
    /// Whether T supports concurrent reads when no writer is active.
    /// Types with mutable internal state (decoders, caches) should be false.
    static constexpr bool supports_concurrent_read = true;

    /// Whether T can be cheaply cloned for worker-thread use.
    /// False for large SoA containers where copying is expensive.
    static constexpr bool supports_cheap_clone = false;
};

// VideoData: no concurrent reads, but can be cloned (new decoder, same file)
template<>
struct ConcurrencyTraits<VideoData> {
    static constexpr bool supports_concurrent_read = false;
    static constexpr bool supports_cheap_clone = true;
};

// LineData: concurrent reads OK, but cloning is expensive (large SoA)
template<>
struct ConcurrencyTraits<LineData> {
    static constexpr bool supports_concurrent_read = true;
    static constexpr bool supports_cheap_clone = false;
};
```

How traits guide threading strategy:
```
                supports_concurrent_read?
                ┌──────────┬──────────┐
                │   YES    │    NO    │
┌───────────────┼──────────┼──────────┤
│ supports_     │ Share OR │  Clone   │
│ cheap_clone?  │  Clone   │          │
│     YES       │          │          │
├───────────────┼──────────┼──────────┤
│     NO        │  Share   │ Snapshot │
│               │ (const)  │  (copy)  │
└───────────────┴──────────┴──────────┘
```
- Share (const): Worker receives `shared_ptr<T const>`. Zero cost. Requires that no writer modifies the object while the worker runs.
- Clone: Worker creates a new instance from the same source (e.g., new `VideoData` with the same file path). Moderate cost.
- Snapshot (copy): Worker receives a deep copy. Expensive. Last resort — indicates a design problem if encountered.
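The decision matrix can be selected at compile time with `if constexpr`. A sketch, using stub traits and a hypothetical `VideoDataStub` rather than the real types:

```cpp
#include <cassert>
#include <string>

// Hypothetical traits, mirroring the ConcurrencyTraits sketch above.
template <typename T>
struct ConcurrencyTraits {
    static constexpr bool supports_concurrent_read = true;
    static constexpr bool supports_cheap_clone = false;
};

struct VideoDataStub { std::string path; };
template <>
struct ConcurrencyTraits<VideoDataStub> {
    static constexpr bool supports_concurrent_read = false;
    static constexpr bool supports_cheap_clone = true;
};

enum class Strategy { ShareConst, Clone, Snapshot };

// Compile-time selection following the decision matrix above.
template <typename T>
constexpr Strategy pick_strategy() {
    if constexpr (ConcurrencyTraits<T>::supports_concurrent_read) {
        return Strategy::ShareConst;   // zero-cost const sharing
    } else if constexpr (ConcurrencyTraits<T>::supports_cheap_clone) {
        return Strategy::Clone;        // e.g., re-open the same file
    } else {
        return Strategy::Snapshot;     // deep copy: last resort
    }
}
```

Because `pick_strategy` is `constexpr`, a wrong strategy choice can be rejected with `static_assert` at an API boundary rather than discovered at runtime.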
Staged Design
Stage 1: getConstData<T>() — Read-Only Access
Add a DataManager API that returns shared_ptr<T const>:
```cpp
template<typename T>
std::shared_ptr<T const> getConstData(std::string const & key) const;
```

What it provides:
- Documents intent. Call sites that only read use `getConstData()`. Call sites that mutate use `getData()`. The distinction is visible in the code.
- Compile-time enforcement. You cannot call `addAtTime()` through a `shared_ptr<MaskData const>`.
- Foundation for safe sharing. Multiple threads can hold `shared_ptr<T const>` to the same object without synchronization, as long as no thread holds a mutable reference simultaneously.
- Gradual adoption. Existing `getData()` calls don't need to change. New code can opt in. Over time, `getData()` calls can be audited and converted.
Implementation: Trivially wraps getData<T>() — shared_ptr<T> implicitly converts to shared_ptr<T const>. No data objects change.
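The wrapper and the const enforcement can be sketched with a minimal stand-in (hypothetical `MaskDataStub` and `DataManagerSketch`, not the real classes):

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

struct MaskDataStub {
    int numTimes() const { return _n; }   // read: callable through const
    void addAtTime(int) { ++_n; }         // write: blocked through const
    int _n = 0;
};

class DataManagerSketch {
public:
    std::shared_ptr<MaskDataStub> getData(std::string const & key) {
        return _store[key];
    }
    // Stage 1: same lookup, but shared_ptr<T> converts implicitly to
    // shared_ptr<T const> — no data object has to change.
    std::shared_ptr<MaskDataStub const> getConstData(std::string const & key) {
        return getData(key);
    }
    std::map<std::string, std::shared_ptr<MaskDataStub>> _store;
};
```

Both handles alias the same object; the const handle simply cannot be used to mutate it (`ro->addAtTime(0)` would fail to compile).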
Caveat — mutable cache in RaggedTimeSeries: RaggedTimeSeries has a mutable _cached_storage field populated lazily on read. Two threads calling read-path methods that populate the cache create a data race on the cache itself. Before shared_ptr<T const> is truly safe for concurrent reads, the cache must be made thread-safe (e.g., populate eagerly on construction, or use std::call_once).
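One way to make such a lazy cache safe for concurrent const readers is `std::call_once`. A sketch under illustrative names (`CachedSeries` is not the real RaggedTimeSeries):

```cpp
#include <cassert>
#include <mutex>
#include <numeric>
#include <vector>

class CachedSeries {
public:
    explicit CachedSeries(std::vector<int> values)
        : _values(std::move(values)) {}

    // const read path: many threads may call this concurrently; the cache is
    // populated exactly once, by whichever thread gets there first, and all
    // other threads block until it is ready.
    long long sum() const {
        std::call_once(_cache_once, [this] {
            _cached_sum = std::accumulate(_values.begin(), _values.end(), 0LL);
        });
        return _cached_sum;
    }

private:
    std::vector<int> _values;
    mutable std::once_flag _cache_once;   // thread-safe "populated?" flag
    mutable long long _cached_sum = 0;
};
```

Eager population on construction is the simpler alternative when the cache is cheap to build; `std::call_once` preserves laziness at the cost of one synchronized check per read.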
Stage 2: Write Reservations — Background Worker Protocol
A write reservation is a lightweight object that mediates background writes to a DataManager key. It replaces direct getData() + addAtTime() for background workers.
```cpp
auto reservation = dm->createWriteReservation<MaskData>("output_masks");
```

A reservation holds:
- A private buffer — a new, empty data object (e.g., `MaskData`) owned by the reservation. The worker writes here.
- The target key in DataManager where results will be merged.
- A `merge()` method that transfers accumulated data from the buffer to the real data object on the main thread.
- A state flag indicating a background operation is in progress for this key.
Worker writes to the buffer, not to DataManager:
```cpp
// Worker thread — never touches DataManager
void WorkerThread::run() {
    for (int frame = start; frame <= end; ++frame) {
        auto mask = model.infer(getFrame(frame));
        _buffer->addAtTime(frame, std::move(mask));
        emit progressChanged(frame, total);
    }
}
```

Main thread merges periodically:
```cpp
// Main thread — QTimer fires every ~200ms
void onMergeTimer() {
    reservation.mergeInto(*dm);
    // Internally:
    //   1. Moves accumulated frames from buffer into real MaskData
    //   2. Uses addAtTime(..., NotifyObservers::No) for each
    //   3. Calls notifyObservers() once
    //   4. Clears the buffer
}
```

What this provides:
- Progressive visibility. Every 200ms, newly computed frames appear in the real data object. Observer callbacks fire, the UI redraws, and the user sees results appearing as the worker runs.
- Interactive editing during background work. Between merges, the main thread has full exclusive access to the real data object. The user can edit frame N in Media_Widget normally. The worker writes to a separate buffer. No conflict.
- Zero overhead for fast operations. The reservation system is only used when explicitly created. The common case (direct `getData()` on the main thread) is untouched.
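The buffer-then-merge flow can be sketched in plain, Qt-free C++ with hypothetical names ("frames" are just ints here; the real merge would go through `addAtTime(..., NotifyObservers::No)` plus one `notifyObservers()` as described above):

```cpp
#include <cassert>
#include <map>
#include <utility>

struct WriteReservationSketch {
    std::map<int, int> buffer;   // private: only the worker thread touches this

    // Worker-side write: accumulates into the private buffer.
    void addAtTime(int frame, int value) { buffer[frame] = value; }

    // Main-thread side: move accumulated frames into the real object, then
    // clear. Returns how many frames were merged (one notification follows).
    int mergeInto(std::map<int, int> & real) {
        int merged = static_cast<int>(buffer.size());
        for (auto & [frame, value] : buffer) real[frame] = value;  // last writer wins
        buffer.clear();
        return merged;
    }
};
```

Because the worker only ever writes `buffer` and the main thread only reads it inside `mergeInto()` (invoked between event-loop iterations, after the worker batch is handed off), no lock is needed in the sketch's intended usage.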
Merge frequency tradeoffs:
| Frequency | Behavior | Use Case |
|---|---|---|
| 200ms timer | 5 merges/second, smooth visual updates | DL inference, long transforms |
| On completion only | All-at-once delivery | Data loading (entire object arrives at once) |
| Worker-driven signal | Worker emits `batchReady(N)` every N frames | Flexible, data-rate adaptive |
Conflict resolution: If the user edits frame 150 while the worker also computes frame 150, the merge must decide who wins. Simplest policy: “last writer wins” — the merge overwrites. If user edits should take priority, the merge can skip frames modified since the reservation was created (tracked via a “dirty frame set” in the reservation).
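The "user edits win" policy can be sketched as a dirty-frame set consulted at merge time (hypothetical names; frames and values are ints for brevity):

```cpp
#include <cassert>
#include <map>
#include <set>

struct DirtyAwareMerge {
    std::set<int> dirty_frames;  // recorded on each main-thread user edit

    // Merge worker output into the real object, skipping frames the user has
    // modified since the reservation was created.
    void mergeInto(std::map<int, int> & real,
                   std::map<int, int> const & buffer) {
        for (auto const & [frame, value] : buffer) {
            if (dirty_frames.count(frame)) continue;  // user edit wins
            real[frame] = value;                      // otherwise worker wins
        }
    }
};
```

"Last writer wins" is this same loop without the `dirty_frames` check; the set only needs to live as long as the reservation.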
Stage 3: Borrow Tracking — Protocol Enforcement
An optional diagnostic layer that enforces the protocol: “don’t mutate data that has an active write reservation.”
DataManager maintains a per-key state:
```cpp
enum class BorrowState { Free, ReadBorrowed, WriteReserved };
```

Rules:
- `getConstData()` succeeds if: `Free` or `ReadBorrowed`. Transitions to `ReadBorrowed`.
- `getData()` succeeds if: `Free` only.
- `createWriteReservation()` succeeds if: `Free` or `ReadBorrowed` (the reservation writes to a private buffer, not the real object).
- `getData()` asserts (debug) if: `WriteReserved` — catches protocol violations where someone tries to mutate the real object while a background worker is producing data for it.
In debug builds: Violations trigger an assertion with a clear message (“Attempted getData() on key ‘output_masks’ which has an active write reservation”).
In release builds: The tracking can be compiled out entirely (zero cost) or kept as a cheap enum check.
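The rules above amount to a small per-key state machine. A sketch, assuming hypothetical method names (the real checks would live inside `getData()`/`getConstData()`/`createWriteReservation()` and assert only in debug builds):

```cpp
#include <cassert>
#include <map>
#include <string>

enum class BorrowState { Free, ReadBorrowed, WriteReserved };

struct BorrowTracker {
    std::map<std::string, BorrowState> state;  // absent keys default to Free

    bool tryConstRead(std::string const & key) {
        auto & s = state[key];
        if (s == BorrowState::Free || s == BorrowState::ReadBorrowed) {
            s = BorrowState::ReadBorrowed;
            return true;
        }
        return false;
    }

    // Mutable access is allowed only when no reservation is active;
    // in debug builds a failure here would be an assertion instead.
    bool tryMutableGet(std::string const & key) {
        return state[key] != BorrowState::WriteReserved;
    }

    bool tryReserveWrite(std::string const & key) {
        auto & s = state[key];
        if (s == BorrowState::WriteReserved) return false;  // one reservation per key
        s = BorrowState::WriteReserved;
        return true;
    }

    void release(std::string const & key) { state[key] = BorrowState::Free; }
};
```

Note the sketch relaxes one rule for illustration: it permits `tryMutableGet` in the `ReadBorrowed` state, since an enum check cannot know when a prior const borrow was dropped; real borrow tracking would also need release hooks on the read side.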
How Threaded Scenarios Map to This Design
Data Loading (Simple: Fork-Join)
Worker reads a file, builds a complete data object, returns it. No DataManager contact during work. Main thread calls setData() once after worker finishes. Uses no reservations or borrows — the data object doesn’t exist in DataManager until the worker is done.
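A sketch of this fork-join flow, with `std::thread` standing in for QThread and a stub object standing in for the loaded data (in the real system the join happens via the `finished` signal rather than a blocking `join()`):

```cpp
#include <cassert>
#include <memory>
#include <thread>
#include <utility>
#include <vector>

struct LoadedSeriesStub { std::vector<int> samples; };

// Worker builds a complete object it owns exclusively; the caller only
// touches it after join(), so the two threads never share live data.
std::unique_ptr<LoadedSeriesStub> loadInBackground(int n) {
    std::unique_ptr<LoadedSeriesStub> result;
    std::thread worker([&result, n] {
        auto obj = std::make_unique<LoadedSeriesStub>();        // worker-private
        for (int i = 0; i < n; ++i) obj->samples.push_back(i);  // "parse file"
        result = std::move(obj);                                // hand off ownership
    });
    worker.join();   // after this point only the caller can see the object
    return result;   // caller would now setData() it into DataManager
}
```

The handoff is safe without locks because ownership transfers exactly once, across a happens-before edge established by `join()`.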
Deep Learning Batch Inference (Merge Pattern)
Worker runs model inference over a frame range, producing mask/point/line data. Worker clones VideoData for independent frame access. Writes to a private buffer (write reservation). Main thread merges periodically for progressive visibility. User can simultaneously interact with the media widget and existing data.
Data Transforms (Fork-Join or Merge)
Short transforms (< 100ms): run synchronously on the main thread, no threading needed.
Long transforms (seconds+): use the merge pattern. Worker computes into a private buffer, main thread merges after completion (or periodically for very long transforms).
Interactive Editing During Background Work
User edits mask at frame 42 while a DL transform populates frames 100-500:
- Worker writes to a private buffer, not the real `MaskData`.
- User's edit goes through `getData<MaskData>("masks")` → `addAtTime(42, ...)` on the real object, on the main thread.
- Next merge absorbs the worker's new frames into the real object, alongside the user's edit.
- No conflict — worker and user operate on different objects.
The Full Progression
```
Today:    getData<T>() → shared_ptr<T>                 (mutable, untracked)

Stage 1:  + getConstData<T>() → shared_ptr<T const>    (immutable, untracked)
          Most consumers migrate to this.
          No threading yet — just better API hygiene.

Stage 2:  + createWriteReservation<T>(key) → WriteReservation<T>
          Returns a private buffer + merge handle.
          Used when spawning background workers.
          getConstData() still works during reservation.
          getData() flagged in debug builds during reservation.

Stage 3:  + Borrow tracking (debug-only)
          DataManager tracks per-key BorrowState.
          Asserts on protocol violations.
```
Each stage is independently useful and doesn’t break previous consumers.
Concepts for the Reader
Thread Confinement
Designate that certain objects are only ever accessed from one specific thread. DataManager is main-thread-confined. Workers operate on private data only. Communication happens through Qt signals (automatically queued across thread boundaries) and result harvesting after QThread::finished.
The Fork-Join Pattern
Main thread forks a worker, worker runs independently, main thread joins (via finished signal) to collect results. During the fork phase, main thread and worker share nothing. Used by data loading.
The Merge Pattern
Extension of fork-join with incremental result delivery. Worker produces data continuously into a private buffer. Main thread periodically merges buffer contents into the real data object. Provides progressive visibility and allows interactive editing during background work. Used by DL inference and long transforms.
Value Types vs Reference Types in Concurrency
- Value types (returned by value): inherently safe — each thread has its own copy.
- Reference types (`shared_ptr`, raw pointers): create aliasing — two threads with `shared_ptr<MaskData>` point to the same object.
- `shared_ptr<T const>`: shared immutable reference — safe for concurrent reads when no mutable reference exists simultaneously.
const as a Concurrency Contract
When data types are correctly designed — const methods truly only read, mutable is only used for thread-safe caches — then shared_ptr<T const> acts as a contract: “I will not modify this object, and I expect no one else to modify it while I hold this reference.”
The existing RaggedTimeSeries API already respects this: addAtTime() and clearAtTime() are non-const, so they cannot be called through a const reference. See the caveat about mutable caches in Stage 1.
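The compiler can verify this contract mechanically. A sketch using a hypothetical `SeriesStub` (not the real RaggedTimeSeries) and `std::is_invocable_v`:

```cpp
#include <cassert>
#include <type_traits>

struct SeriesStub {
    void addAtTime(int) { ++_n; }       // non-const: a write
    int numTimes() const { return _n; } // const: a read
    int _n = 0;
};

// Writes are rejected on const objects at compile time: a non-const member
// function is not invocable through a const reference.
static_assert(!std::is_invocable_v<decltype(&SeriesStub::addAtTime),
                                   SeriesStub const &, int>);

// Reads remain available through const.
static_assert(std::is_invocable_v<decltype(&SeriesStub::numTimes),
                                  SeriesStub const &>);
```

This is the same enforcement a `shared_ptr<T const>` holder gets: the read path compiles, the write path does not.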
Type Traits as Compile-Time Documentation
ConcurrencyTraits<T>::supports_concurrent_read is a constexpr bool that enables static_assert at API boundaries, if constexpr for compile-time code path selection, and self-documenting code. No runtime overhead.
Design Decisions
- No synchronization primitives on data objects. Safety comes from thread isolation and temporal separation, not locks.
- `QThread` (not `std::jthread`) for background work. Consistent with the existing `MLCoreWidget` pattern; integrates with Qt signals/slots.
- Merge, not lock, for progressive results. Avoids mutex contention, allows interactive editing, has zero overhead for fast operations.
- Staged rollout. Each stage is useful alone. No big-bang migration.
- DataManager stays single-threaded indefinitely. Workers return or buffer results; main thread writes.