Threaded Loading Roadmap

Overview

This document plans the introduction of background-threaded data loading so the UI stays responsive during large file imports.

For the broader concurrency architecture — ownership model, const access, write reservations, and data-sharing semantics — see Concurrency Architecture.

Problem Statement

All data loading is currently synchronous on the main (Qt) thread. When a user clicks “Load” in an import widget, the loader parses the entire file, builds the data object, and writes it to DataManager — all while the UI is frozen.

Loader	Bottleneck	Typical Freeze Duration
CSV (large PointData/MaskData)	Line-by-line string parsing	Seconds (millions of rows)
Multi-channel binary (64ch analog)	Raw I/O of 500MB+ files	Seconds
HDF5 (ragged arrays)	Deserialization overhead	Seconds
Video/Image media	Metadata only (lazy decode)	Negligible

Media loading already avoids this freeze because VideoData::doLoadMedia() only reads metadata; frames are decoded on demand via doLoadFrame(). The pain is in structured data formats.

Current Loading Flow

Main Thread (blocked)
─────────────────────────────────────────────────────
User clicks "Load"
  → Import widget validates inputs
  → Calls loader directly: IFormatLoader::load(filepath, type, config)
    → Parser reads entire file
    → Builds shared_ptr<DataType> (e.g., PointData, AnalogTimeSeries)
    ← Returns LoadResult with LoadedDataVariant
  → _data_manager->setData(key, data, TimeKey)
    → Stores in _data map
    → Calls _notifyObservers() (synchronous callbacks)
  → Shows success message box
─────────────────────────────────────────────────────
UI unfreezes

Why This Is a Simple Concurrency Case

Data loading is a pure producer: the worker reads from a file (no DataManager dependency), builds a complete data object, and hands it off. There is no shared mutable state during the work.

Deep Learning:  Worker ↔ DataManager (reads frames, writes results)
Data Loading:   Worker ← File system only → returns finished data object

The existing IFormatLoader::load() already behaves like a pure function with respect to application state: file path in, LoadResult out. Moving it to a worker thread is therefore largely mechanical. This is the fork-join pattern described in the Concurrency Architecture; no write reservations are needed because the data object does not exist in DataManager until the worker is done.
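The fork-join hand-off described above can be sketched with the standard library alone. This is a minimal illustration, not the project's code: PointData, LoadResult, ToyDataManager, loadPoints, and loadInBackground are hypothetical stand-ins, the parsing is faked, and the polling loop only stands in for the Qt event loop.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the real LoadResult and DataManager are richer.
struct PointData { std::vector<float> values; };

struct LoadResult {
    bool success = false;
    std::shared_ptr<PointData> data;
};

// Pure with respect to application state: path in, LoadResult out.
// Parsing is faked here; a real loader would read the file.
LoadResult loadPoints(std::string const & filepath) {
    assert(!filepath.empty());
    auto data = std::make_shared<PointData>();
    data->values = {1.0f, 2.0f, 3.0f};
    return LoadResult{true, data};
}

// Toy manager that is only ever touched from the calling ("main") thread.
class ToyDataManager {
public:
    void setData(std::string const & key, std::shared_ptr<PointData> data) {
        _data[key] = std::move(data);
    }
    bool has(std::string const & key) const { return _data.count(key) > 0; }

private:
    std::map<std::string, std::shared_ptr<PointData>> _data;
};

// Fork: run the loader on another thread.
// Join: publish the finished object on the caller's thread.
bool loadInBackground(ToyDataManager & dm,
                      std::string const & key,
                      std::string const & filepath) {
    std::future<LoadResult> pending =
        std::async(std::launch::async, loadPoints, filepath);

    // Stand-in for the event loop: the caller stays free to do other work.
    while (pending.wait_for(std::chrono::milliseconds(1)) !=
           std::future_status::ready) {
        // UI events would be processed here.
    }

    LoadResult result = pending.get();
    if (!result.success) {
        return false;
    }
    dm.setData(key, std::move(result.data)); // the only DataManager contact
    return true;
}
```

Because the worker never sees the manager, no locking is needed in this sketch; the single setData() call happens after the join, on the thread that owns the manager.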

Files to Modify

File	Changes
`IFormatLoader` (IO/core/)	Add `LoadProgressCallback` parameter to `load()`
`CSVLoader.cpp`	Implement progress reporting in parse loop
`BinaryFormatLoader.cpp`	Implement progress reporting per channel
Import widget `.cpp` files	`DataLoadWorker` creation, signal wiring, UI disable/enable
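One possible shape for the progress hook is sketched below. The callback signature is an assumption (the real LoadProgressCallback may differ): it receives the fraction complete and returns false to request cancellation. parsePointsCsv and Point are hypothetical names. Progress is computed as bytes read over total size, and the callback is consulted once per row batch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Assumed signature: fraction complete in [0, 1]; return false to cancel.
using LoadProgressCallback = std::function<bool(double)>;

struct Point { float x; float y; };

// Parses "x,y" lines from an in-memory buffer. Malformed lines are skipped.
// Returns std::nullopt if the callback requests cancellation.
std::optional<std::vector<Point>> parsePointsCsv(
        std::string const & text,
        LoadProgressCallback const & onProgress,
        std::size_t batchSize = 1000) {
    assert(batchSize > 0);
    std::vector<Point> points;
    std::istringstream in(text);
    std::string line;
    std::size_t bytesRead = 0;
    std::size_t rowsInBatch = 0;

    while (std::getline(in, line)) {
        // +1 for the newline; clamp in case the last line has none.
        bytesRead = std::min(bytesRead + line.size() + 1, text.size());

        std::istringstream row(line);
        Point p{};
        char comma = 0;
        if (row >> p.x >> comma >> p.y && comma == ',') {
            points.push_back(p);
        }

        // Report and check for cancellation once per batch, not per row.
        if (++rowsInBatch == batchSize) {
            rowsInBatch = 0;
            double const fraction =
                static_cast<double>(bytesRead) / static_cast<double>(text.size());
            if (onProgress && !onProgress(fraction)) {
                return std::nullopt;
            }
        }
    }

    if (onProgress) {
        onProgress(1.0); // final report; a cancel request here is ignored
    }
    return points;
}
```

Checking per batch rather than per row keeps callback overhead small on files with millions of rows; the batch size here is an arbitrary default.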

Files to Create

File	Purpose
`DataLoadWorker.hpp/.cpp` (in import widget area or shared utils)	QThread subclass for background loading
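The worker's lifecycle (start, optional cancel, finish) might look roughly like the following. To keep the sketch compilable without Qt, std::thread stands in for QThread and a polled outcome stands in for signals; in the planned QThread subclass the thread body would live in run() and the outcome would be delivered by a signal. LoadOutcome, CancelCheck, and LoadFn are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <utility>

enum class LoadOutcome { Pending, Finished, Failed, Cancelled };

// Std-only analogue of the planned worker; not the project's class.
class DataLoadWorker {
public:
    using CancelCheck = std::function<bool()>;
    // The load function polls the cancel check and returns true on success.
    using LoadFn = std::function<bool(CancelCheck const &)>;

    explicit DataLoadWorker(LoadFn load) : _load(std::move(load)) {
        assert(_load && "a load function is required");
    }
    DataLoadWorker(DataLoadWorker const &) = delete;
    DataLoadWorker & operator=(DataLoadWorker const &) = delete;

    // Never leave a running thread behind: ask it to stop, then join.
    ~DataLoadWorker() {
        requestCancel();
        wait();
    }

    void start() {
        assert(!_thread.joinable() && "start() may only be called once");
        _thread = std::thread([this] {
            bool const ok = _load([this] { return _cancelRequested.load(); });
            if (ok) {
                _outcome.store(LoadOutcome::Finished);
            } else if (_cancelRequested.load()) {
                _outcome.store(LoadOutcome::Cancelled);
            } else {
                _outcome.store(LoadOutcome::Failed);
            }
        });
    }

    // Cooperative: the loader notices at its next cancellation check.
    void requestCancel() { _cancelRequested.store(true); }

    void wait() {
        if (_thread.joinable()) {
            _thread.join();
        }
    }

    LoadOutcome outcome() const { return _outcome.load(); }

private:
    LoadFn _load;
    std::atomic<bool> _cancelRequested{false};
    std::atomic<LoadOutcome> _outcome{LoadOutcome::Pending};
    std::thread _thread;
};
```

In the import widget, the controls would be disabled before start() and re-enabled once the outcome is known, whichever outcome it is.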

Verification

Build: `cmake --build --preset linux-clang-release` must succeed
Existing tests: all IO and DataManager tests must pass
Manual (UI responsiveness): import a large CSV; verify widgets remain interactive during load
Manual (cancellation): start a large import, cancel mid-load; verify UI recovers cleanly
Manual (correctness): compare loaded data against synchronous load for identical files

Relationship to Other Roadmaps

Threaded loading uses the fork-join pattern, the simplest concurrency case: the worker reads a file, builds a data object, and returns it. It never touches DataManager while working, so no write reservations are needed.

For the broader concurrency design (write reservations, borrow tracking, progressive merge), see Concurrency Architecture.

For applying the merge pattern to deep learning inference, see Background Inference Roadmap.

Aspect	Data Loading	Deep Learning Inference
Worker input	File path + options	Model + cloned MediaData + frame range
Worker computation	Parse file → build data object	Encode → forward → decode per frame
Worker output	`LoadResult` (single object)	`BatchInferenceResult` (multi-frame)
Main-thread write	One `setData()` call	Multiple `addAtTime()` calls
Progress source	Bytes read / file size	Frames processed / total frames
Cancellation check	Between row batches	Between frames
DataManager contact during work	None	None (uses cloned MediaData)

Threaded loading is a good first implementation to build confidence with the QThread + signal/slot pattern before tackling the more complex inference case.
