Threaded Loading Roadmap
Overview
This document plans the introduction of background-threaded data loading so the UI stays responsive during large file imports.
For the broader concurrency architecture — ownership model, const access, write reservations, and data-sharing semantics — see Concurrency Architecture.
Problem Statement
All data loading is currently synchronous on the main (Qt) thread. When a user clicks “Load” in an import widget, the loader parses the entire file, builds the data object, and writes it to DataManager — all while the UI is frozen.
| Loader | Bottleneck | Typical Freeze Duration |
|---|---|---|
| CSV (large PointData/MaskData) | Line-by-line string parsing | Seconds (millions of rows) |
| Multi-channel binary (64ch analog) | Raw I/O of 500MB+ files | Seconds |
| HDF5 (ragged arrays) | Deserialization overhead | Seconds |
| Video/Image media | Metadata only (lazy decode) | Negligible |
Media loading is already effectively async because VideoData::doLoadMedia() only reads metadata — frames are decoded on demand via doLoadFrame(). The pain is in structured data formats.
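A minimal sketch of why metadata-only loading stays cheap. LazyMedia, loadMedia(), loadFrame(), and the placeholder decoder are hypothetical stand-ins for the roles that VideoData::doLoadMedia() and doLoadFrame() play. The real classes are not shown in this roadmap and may differ, for example by caching decoded frames.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for VideoData: "loading" records metadata only,
// and each frame is decoded only when it is requested.
class LazyMedia {
public:
    // Analogue of doLoadMedia(): cheap, no frame is decoded here.
    void loadMedia(std::size_t frame_count) { _frame_count = frame_count; }

    std::size_t frameCount() const { return _frame_count; }

    // Analogue of doLoadFrame(): the decoding cost is paid per frame, on demand.
    std::vector<unsigned char> loadFrame(std::size_t index) {
        assert(index < _frame_count);
        ++_decoded;
        return decode(index);
    }

    std::size_t framesDecoded() const { return _decoded; }

private:
    // Placeholder "decoder": a 4-byte frame filled with the low byte of the index.
    static std::vector<unsigned char> decode(std::size_t index) {
        return std::vector<unsigned char>(4, static_cast<unsigned char>(index & 0xFF));
    }

    std::size_t _frame_count = 0;
    std::size_t _decoded = 0;
};
```

Opening a long video therefore costs about the same as opening a short one; the expensive work is spread across later frame requests.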
Current Loading Flow
```text
Main Thread (blocked)
─────────────────────────────────────────────────────
User clicks "Load"
  → Import widget validates inputs
  → Calls loader directly: IFormatLoader::load(filepath, type, config)
      → Parser reads entire file
      → Builds shared_ptr<DataType> (e.g., PointData, AnalogTimeSeries)
      ← Returns LoadResult with LoadedDataVariant
  → _data_manager->setData(key, data, TimeKey)
      → Stores in _data map
      → Calls _notifyObservers() (synchronous callbacks)
  → Shows success message box
─────────────────────────────────────────────────────
UI unfreezes
```
Why This Is a Simple Concurrency Case
Data loading is a pure producer: the worker reads from a file (no DataManager dependency), builds a complete data object, and hands it off. There is no shared mutable state during the work.
```text
Deep Learning: Worker ↔ DataManager (reads frames, writes results)
Data Loading:  Worker ← File system only → returns finished data object
```
The existing IFormatLoader::load() is already a pure function: file path in, LoadResult out. Moving it to a worker thread is mechanical. This uses the fork-join pattern described in the Concurrency Architecture — no write reservations needed because the data object doesn’t exist in DataManager until the worker is done.
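The fork-join shape can be sketched in standard C++ under some assumptions: std::async stands in for the QThread the plan actually targets, and PointData, LoadResult, loadPoints(), FakeDataManager, and forkJoinLoad() are simplified, hypothetical types that parse from an in-memory string instead of a file. The point is that the worker owns everything it builds, and the data manager sees the object only through one setData() call after the join.

```cpp
#include <cassert>
#include <future>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the real data and result types.
struct PointData { std::vector<float> xs; };
struct LoadResult {
    bool success = false;
    std::shared_ptr<PointData> data;
};

// A pure loader: text in, LoadResult out. It touches no shared state,
// so it is safe to run on any thread. (Reads from a string so the sketch
// is self-contained; the real loader takes a file path.)
LoadResult loadPoints(const std::string& contents) {
    LoadResult result;
    auto data = std::make_shared<PointData>();
    std::istringstream in(contents);
    float x = 0.0f;
    while (in >> x) data->xs.push_back(x);
    result.data = std::move(data);
    result.success = true;
    return result;
}

// Hypothetical stand-in for DataManager: only ever called on the main thread.
struct FakeDataManager {
    std::map<std::string, std::shared_ptr<PointData>> store;
    void setData(const std::string& key, std::shared_ptr<PointData> d) {
        store[key] = std::move(d);
    }
};

// Fork: run the pure loader on a worker thread.
// Join: take the finished object and publish it with a single setData() call.
void forkJoinLoad(FakeDataManager& dm, const std::string& key, std::string contents) {
    std::future<LoadResult> pending =
        std::async(std::launch::async, loadPoints, std::move(contents));
    // In the real UI the main thread would return to the event loop and be
    // notified on completion; this sketch simply blocks on get() for brevity.
    LoadResult result = pending.get();
    if (result.success) dm.setData(key, result.data);
}
```

Because the loader never reads or writes the data manager, no locking is needed; the only synchronization is the future handing the finished LoadResult back.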
Implementation Plan
Phase 1: Loader Progress Infrastructure
Goal: Enable loaders to report progress so users see feedback during loading, even before threading is added.
At this point loading is still synchronous, but users see progress.
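One possible shape for Phase 1, assuming LoadProgressCallback receives bytes consumed and total bytes (matching the "bytes read / file size" progress source listed in the comparison table). parseLines() is a hypothetical one-integer-per-line parser that reads from a string. The report_every parameter throttles callbacks so a file with millions of rows does not trigger millions of UI updates; the default of 1000 is an arbitrary illustrative value, not a measured one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical callback shape: bytes consumed so far, total bytes.
// An empty std::function means the caller does not want progress.
using LoadProgressCallback = std::function<void(std::size_t done, std::size_t total)>;

// Parses one integer per line. Reports progress every `report_every` lines
// and once more at the end so the UI can reach 100%.
std::vector<long> parseLines(const std::string& contents,
                             const LoadProgressCallback& progress,
                             std::size_t report_every = 1000) {
    assert(report_every > 0);  // avoids modulo by zero below
    std::vector<long> values;
    std::istringstream in(contents);
    std::string line;
    std::size_t consumed = 0;
    std::size_t lines = 0;
    while (std::getline(in, line)) {
        consumed += line.size() + 1;  // +1 for the '\n' that getline strips
        if (!line.empty()) values.push_back(std::stol(line));
        ++lines;
        if (progress && lines % report_every == 0) {
            // Clamp in case the last line had no trailing newline.
            progress(std::min(consumed, contents.size()), contents.size());
        }
    }
    if (progress) progress(contents.size(), contents.size());
    return values;
}
```

Since the callback is just a function, the same loader works unchanged once Phase 2 moves it onto a worker thread; only the callback's target changes.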
Phase 2: Worker Thread for Single-Object Loading
Goal: Move file parsing off the main thread for single-object imports.
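A sketch of the Phase 2 handoff, assuming the worker delivers its result the way a Qt queued connection would: the completion callback runs on the main thread, not on the worker. To stay self-contained it uses std::thread instead of QThread, and MainThreadQueue is a hypothetical stand-in for the Qt event loop. LoadWorker, its doubling "parse" step, and the batch size are likewise illustrative. Cancellation is checked between row batches, as the comparison table in this document describes.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the Qt event loop: any thread may post(), but
// only the thread calling runOne() executes the posted callbacks.
class MainThreadQueue {
public:
    void post(std::function<void()> fn) {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _tasks.push_back(std::move(fn));
        }
        _cv.notify_one();
    }

    // Blocks until a callback is available, then runs it on the calling thread.
    void runOne() {
        std::function<void()> fn;
        {
            std::unique_lock<std::mutex> lock(_mutex);
            _cv.wait(lock, [this] { return !_tasks.empty(); });
            fn = std::move(_tasks.front());
            _tasks.pop_front();
        }
        fn();
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    std::deque<std::function<void()>> _tasks;
};

// Hypothetical DataLoadWorker-like helper: work runs on a background thread,
// and exactly one completion callback is posted back to the main thread.
class LoadWorker {
public:
    using Done = std::function<void(bool ok, std::vector<int> rows)>;

    ~LoadWorker() {
        if (_thread.joinable()) _thread.join();
    }

    void start(MainThreadQueue& ui, std::vector<int> input, Done done, std::size_t batch) {
        assert(batch > 0 && !_thread.joinable());
        _thread = std::thread([this, &ui, input = std::move(input), done = std::move(done), batch] {
            std::vector<int> out;
            out.reserve(input.size());
            for (std::size_t i = 0; i < input.size(); i += batch) {
                if (_cancel.load()) {  // cancellation point between row batches
                    ui.post([done] { done(false, {}); });
                    return;
                }
                const std::size_t end = std::min(i + batch, input.size());
                for (std::size_t j = i; j < end; ++j) out.push_back(input[j] * 2);  // placeholder "parse"
            }
            ui.post([done, out] { done(true, out); });
        });
    }

    // Safe to call from the main thread at any time.
    void cancel() { _cancel.store(true); }

private:
    std::atomic<bool> _cancel{false};  // declared before _thread so it is initialized first
    std::thread _thread;
};
```

Keeping setData() inside the posted callback means the data manager and its synchronous observer callbacks still run on the main thread, exactly as in the current flow.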
Phase 3: Batch Loading and Polish
Goal: Support multi-channel binary imports and general robustness.
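For multi-channel binary imports, a sketch of per-channel progress with cancellation checked before each channel. It assumes samples are stored interleaved by channel as int16 values, which is an illustrative layout rather than the documented BinaryFormatLoader format; splitChannels() is hypothetical. A cancelled or malformed load returns std::nullopt, so the caller never receives a partially filled object.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

// Splits an interleaved buffer (s0c0, s0c1, ..., s1c0, s1c1, ...) into one
// vector per channel. Progress is reported after each channel; cancellation
// is checked before each channel.
std::optional<std::vector<std::vector<std::int16_t>>>
splitChannels(const std::vector<std::int16_t>& interleaved,
              std::size_t num_channels,
              const std::atomic<bool>& cancel,
              const std::function<void(std::size_t done, std::size_t total)>& progress) {
    // Reject a buffer that cannot hold a whole number of samples per channel.
    if (num_channels == 0 || interleaved.size() % num_channels != 0) return std::nullopt;

    const std::size_t samples = interleaved.size() / num_channels;
    std::vector<std::vector<std::int16_t>> channels(num_channels);
    for (std::size_t ch = 0; ch < num_channels; ++ch) {
        if (cancel.load()) return std::nullopt;  // cancellation point between channels
        channels[ch].reserve(samples);
        for (std::size_t s = 0; s < samples; ++s)
            channels[ch].push_back(interleaved[s * num_channels + ch]);
        if (progress) progress(ch + 1, num_channels);
    }
    return channels;
}
```

With 64 channels this gives 64 progress steps, which is coarse but cheap; finer-grained progress would need checks inside the per-sample loop.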
Phase 4: Testing and Documentation
Files to Modify
| File | Changes |
|---|---|
| IFormatLoader (IO/core/) | Add LoadProgressCallback parameter to load() |
| CSVLoader.cpp | Implement progress reporting in parse loop |
| BinaryFormatLoader.cpp | Implement progress reporting per channel |
| Import widget .cpp files | DataLoadWorker creation, signal wiring, UI disable/enable |
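One way to add the progress parameter without touching every existing call site, sketched with hypothetical simplified types (DataType, LoaderConfig, LoadResult, doLoad(), FakeCsvLoader). It uses the non-virtual interface idiom: the public load() supplies an empty default callback and forwards to a protected virtual. This is a suggestion, not the current IFormatLoader layout.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>

// Hypothetical simplified types; the real IFormatLoader signatures differ.
enum class DataType { Points, Analog };
using LoaderConfig = std::map<std::string, std::string>;
using LoadProgressCallback = std::function<void(std::size_t done, std::size_t total)>;
struct LoadResult { bool success = false; };

class IFormatLoader {
public:
    virtual ~IFormatLoader() = default;

    // Public, non-virtual entry point. The defaulted callback keeps existing
    // three-argument call sites compiling unchanged.
    LoadResult load(const std::string& filepath, DataType type,
                    const LoaderConfig& config,
                    const LoadProgressCallback& progress = {}) {
        return doLoad(filepath, type, config, progress);
    }

protected:
    // The single customization point concrete loaders override.
    virtual LoadResult doLoad(const std::string& filepath, DataType type,
                              const LoaderConfig& config,
                              const LoadProgressCallback& progress) = 0;
};

// Example loader that reports two progress steps when a callback is given.
class FakeCsvLoader : public IFormatLoader {
protected:
    LoadResult doLoad(const std::string&, DataType, const LoaderConfig&,
                      const LoadProgressCallback& progress) override {
        if (progress) {
            progress(50, 100);
            progress(100, 100);
        }
        return LoadResult{true};
    }
};
```

Putting the default on a non-virtual function sidesteps a C++ pitfall: default arguments on virtual functions are taken from the static type at the call site, so overrides declaring different defaults behave surprisingly. Using a separate name for the virtual also means derived classes never hide the public load() overload.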
Files to Create
| File | Purpose |
|---|---|
| DataLoadWorker.hpp/.cpp (in import widget area or shared utils) | QThread subclass for background loading |
Verification
- Build — `cmake --build --preset linux-clang-release` must succeed
- Existing tests — All IO and DataManager tests must pass
- Manual — UI responsiveness — Import a large CSV; verify widgets remain interactive during load
- Manual — cancellation — Start a large import, cancel mid-load; verify UI recovers cleanly
- Manual — correctness — Compare loaded data against synchronous load for identical files
Relationship to Other Roadmaps
Threaded loading uses the fork-join pattern — the simplest concurrency case. The worker reads a file, builds a data object, and returns it. No DataManager contact during work. No write reservations needed.
For the broader concurrency design (write reservations, borrow tracking, progressive merge), see Concurrency Architecture.
For applying the merge pattern to deep learning inference, see Background Inference Roadmap.
| Aspect | Data Loading | Deep Learning Inference |
|---|---|---|
| Worker input | File path + options | Model + cloned MediaData + frame range |
| Worker computation | Parse file → build data object | Encode → forward → decode per frame |
| Worker output | LoadResult (single object) | BatchInferenceResult (multi-frame) |
| Main-thread write | One setData() call | Multiple addAtTime() calls |
| Progress source | Bytes read / file size | Frames processed / total frames |
| Cancellation check | Between row batches | Between frames |
| DataManager contact during work | None | None (uses cloned MediaData) |
Threaded loading is a good first implementation to build confidence with the QThread + signal/slot pattern before tackling the more complex inference case.