Threaded Loading Roadmap

Overview

This document plans the introduction of background-threaded data loading so the UI stays responsive during large file imports.

For the broader concurrency architecture — ownership model, const access, write reservations, and data-sharing semantics — see Concurrency Architecture.

Problem Statement

All data loading is currently synchronous on the main (Qt) thread. When a user clicks “Load” in an import widget, the loader parses the entire file, builds the data object, and writes it to DataManager — all while the UI is frozen.

| Loader | Bottleneck | Typical Freeze Duration |
|---|---|---|
| CSV (large PointData/MaskData) | Line-by-line string parsing | Seconds (millions of rows) |
| Multi-channel binary (64ch analog) | Raw I/O of 500MB+ files | Seconds |
| HDF5 (ragged arrays) | Deserialization overhead | Seconds |
| Video/Image media | Metadata only (lazy decode) | Negligible |

Media loading is already effectively async because VideoData::doLoadMedia() only reads metadata — frames are decoded on demand via doLoadFrame(). The pain is in structured data formats.

Current Loading Flow

Main Thread (blocked)
─────────────────────────────────────────────────────
User clicks "Load"
  → Import widget validates inputs
  → Calls loader directly: IFormatLoader::load(filepath, type, config)
    → Parser reads entire file
    → Builds shared_ptr<DataType> (e.g., PointData, AnalogTimeSeries)
    ← Returns LoadResult with LoadedDataVariant
  → _data_manager->setData(key, data, TimeKey)
    → Stores in _data map
    → Calls _notifyObservers() (synchronous callbacks)
  → Shows success message box
─────────────────────────────────────────────────────
UI unfreezes

Why This Is a Simple Concurrency Case

Data loading is a pure producer: the worker reads from a file (no DataManager dependency), builds a complete data object, and hands it off. There is no shared mutable state during the work.

Deep Learning:  Worker ↔ DataManager (reads frames, writes results)
Data Loading:   Worker ← File system only → returns finished data object

The existing IFormatLoader::load() is already self-contained: file path in, LoadResult out, with no reads or writes of shared application state. Moving it to a worker thread is therefore largely mechanical. This is the fork-join pattern described in the Concurrency Architecture; no write reservations are needed because the data object does not exist in DataManager until the worker is done.
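The fork-join handoff can be sketched in standard C++. This is a minimal illustration, not the project's implementation: SketchLoadResult and loadSketch are hypothetical stand-ins for LoadResult and IFormatLoader::load(), the input is an in-memory string rather than a file path, and std::async stands in for the QThread-based worker this roadmap plans.

```cpp
#include <cstddef>
#include <future>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for LoadResult; the real type carries a LoadedDataVariant.
struct SketchLoadResult {
    bool success = false;
    std::shared_ptr<std::vector<double>> data;  // finished, fully built object
};

// Hypothetical stand-in for IFormatLoader::load(): it depends only on its
// argument and touches no shared state. Parses comma-separated numbers.
SketchLoadResult loadSketch(std::string const & text) {
    auto values = std::make_shared<std::vector<double>>();
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t const end = text.find(',', pos);
        std::size_t const len = (end == std::string::npos) ? std::string::npos : end - pos;
        values->push_back(std::stod(text.substr(pos, len)));
        if (end == std::string::npos) break;
        pos = end + 1;
    }
    return {true, values};
}

// Fork: start the load on another thread and return immediately.
// Join: the caller later takes the finished result from the future and only
// then publishes it (the single setData() step in this roadmap).
std::future<SketchLoadResult> forkLoad(std::string text) {
    return std::async(std::launch::async, loadSketch, std::move(text));
}
```

A caller would keep the returned future, continue servicing the UI, and call get() once the result is ready. Because the worker owns everything it builds until that point, no locking is involved.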

Implementation Plan

Phase 1: Loader Progress Infrastructure

Goal: Enable loaders to report progress so users see feedback during loading, even before threading is added.

At this point loading is still synchronous, but users see progress.
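One possible shape for the progress hook is sketched below. The signature of LoadProgressCallback is an assumption (this roadmap names the type but does not define it), and parseRows is a hypothetical parse loop rather than the actual CSVLoader code. It reports bytes read over file size, once per batch of rows.

```cpp
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Assumed shape for LoadProgressCallback: receives a fraction in [0, 1].
using LoadProgressCallback = std::function<void(double)>;

// Hypothetical parse loop: reads lines, tracks bytes consumed, and reports
// progress every `batch` rows so the callback cost stays negligible.
std::vector<std::string> parseRows(std::istream & in,
                                   std::size_t total_bytes,
                                   LoadProgressCallback const & on_progress,
                                   std::size_t batch = 1000) {
    std::vector<std::string> rows;
    std::size_t bytes_read = 0;
    std::string line;
    while (std::getline(in, line)) {
        bytes_read += line.size() + 1;  // +1 approximates the line terminator
        rows.push_back(line);
        if (on_progress && batch > 0 && total_bytes > 0 && rows.size() % batch == 0) {
            double const fraction =
                static_cast<double>(bytes_read) / static_cast<double>(total_bytes);
            on_progress(fraction < 1.0 ? fraction : 1.0);  // clamp to 1.0
        }
    }
    if (on_progress) on_progress(1.0);  // always finish at 100%
    return rows;
}
```

Passing an empty callback disables reporting, which would let existing call sites keep working unchanged if the new parameter is given a default value.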

Phase 2: Worker Thread for Single-Object Loading

Goal: Move file parsing off the main thread for single-object imports.
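The worker's lifecycle (start, optional cancel, finish) can be illustrated without Qt. The sketch below is an assumption-laden stand-in for the planned DataLoadWorker: SketchLoadWorker is a hypothetical name, std::thread replaces QThread, an atomic flag plays the role that QThread::requestInterruption() and isInterruptionRequested() could play, and wait() stands in for a finished signal delivered to the main thread.

```cpp
#include <atomic>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical worker; the roadmap's DataLoadWorker would be a QThread subclass.
class SketchLoadWorker {
public:
    // `row_count` simulates the size of the input; real code would parse a file.
    explicit SketchLoadWorker(int row_count) : _row_count(row_count) {}
    ~SketchLoadWorker() { wait(); }

    void start() { _thread = std::thread([this] { run(); }); }
    void requestCancel() { _cancel.store(true); }
    void wait() {
        if (_thread.joinable()) _thread.join();
    }

    // Read only after wait(); empty if the load was cancelled.
    std::optional<std::vector<int>> const & result() const { return _result; }

private:
    void run() {
        std::vector<int> rows;
        for (int i = 0; i < _row_count; ++i) {
            // Cancellation is checked once per batch of 100 rows.
            if (i % 100 == 0 && _cancel.load()) return;  // partial data is discarded
            rows.push_back(i);
        }
        _result = std::move(rows);  // published only when complete
    }

    int _row_count;
    std::atomic<bool> _cancel{false};
    std::optional<std::vector<int>> _result;
    std::thread _thread;
};
```

The point of the design is that a cancelled load leaves nothing behind: the result is either complete or absent, so the main thread never has to reason about half-built data.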

Phase 3: Batch Loading and Polish

Goal: Support multi-channel binary imports and general robustness.
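For multi-channel imports, per-channel progress has to be folded into one overall value for the UI. A minimal sketch, assuming channels are loaded one after another and are of roughly equal size (batchProgress is a hypothetical helper, not an existing function):

```cpp
#include <cstddef>

// Hypothetical helper: overall progress when channel `channel_index` (0-based)
// out of `channel_count` is `channel_fraction` (0..1) complete.
double batchProgress(std::size_t channel_index,
                     std::size_t channel_count,
                     double channel_fraction) {
    if (channel_count == 0) return 1.0;  // nothing to load
    return (static_cast<double>(channel_index) + channel_fraction) /
           static_cast<double>(channel_count);
}
```

If channel sizes differ substantially, weighting by bytes per channel would give a smoother progress bar than this equal-weight version.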

Phase 4: Testing and Documentation

Files to Modify

| File | Changes |
|---|---|
| IFormatLoader (IO/core/) | Add LoadProgressCallback parameter to load() |
| CSVLoader.cpp | Implement progress reporting in parse loop |
| BinaryFormatLoader.cpp | Implement progress reporting per channel |
| Import widget .cpp files | DataLoadWorker creation, signal wiring, UI disable/enable |

Files to Create

| File | Purpose |
|---|---|
| DataLoadWorker.hpp/.cpp (in import widget area or shared utils) | QThread subclass for background loading |

Verification

  1. Build: cmake --build --preset linux-clang-release must succeed
  2. Existing tests: all IO and DataManager tests must pass
  3. Manual (UI responsiveness): import a large CSV; verify widgets remain interactive during load
  4. Manual (cancellation): start a large import, cancel mid-load; verify the UI recovers cleanly
  5. Manual (correctness): compare loaded data against a synchronous load of the same files

Relationship to Other Roadmaps

Threaded loading uses the fork-join pattern — the simplest concurrency case. The worker reads a file, builds a data object, and returns it. No DataManager contact during work. No write reservations needed.

For the broader concurrency design (write reservations, borrow tracking, progressive merge), see Concurrency Architecture.

For applying the merge pattern to deep learning inference, see Background Inference Roadmap.

| Aspect | Data Loading | Deep Learning Inference |
|---|---|---|
| Worker input | File path + options | Model + cloned MediaData + frame range |
| Worker computation | Parse file → build data object | Encode → forward → decode per frame |
| Worker output | LoadResult (single object) | BatchInferenceResult (multi-frame) |
| Main-thread write | One setData() call | Multiple addAtTime() calls |
| Progress source | Bytes read / file size | Frames processed / total frames |
| Cancellation check | Between row batches | Between frames |
| DataManager contact during work | None | None (uses cloned MediaData) |

Threaded loading is a good first implementation to build confidence with the QThread + signal/slot pattern before tackling the more complex inference case.