Entity Library

The Entity library provides core functionality for entity identification, registration, grouping, and lineage tracking in WhiskerToolbox. It is designed as a DataManager-independent library, enabling reuse across different data management systems.

Architecture Overview

graph TB
    subgraph Entity["Entity Library"]
        ET["EntityTypes.hpp<br/>(EntityId, EntityKind, etc.)"]
        ER["EntityRegistry<br/>(ID allocation)"]
        EGM["EntityGroupManager<br/>(grouping)"]
        
        subgraph Lineage["Lineage Subsystem"]
            LT["LineageTypes.hpp<br/>(variant types)"]
            LReg["LineageRegistry<br/>(metadata storage)"]
            LRes["LineageResolver<br/>(resolution logic)"]
            IDS["IEntityDataSource<br/>(abstract interface)"]
        end
    end
    
    ET --> ER
    ET --> EGM
    LT --> LReg
    IDS --> LRes
    LReg --> LRes

Core Components

EntityTypes

Located in Entity/EntityTypes.hpp, defines the fundamental types:

  • EntityId: Strongly-typed identifier for entities (wraps uint64_t)
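  • EntityKind: Enumeration of entity categories (Point, Line, Event, Interval, Mask); the examples on this page spell the enumerators with an Entity suffix, e.g. EntityKind::MaskEntity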
  • EntityDescriptor: Combines data_key, kind, time_value, and local_index
  • EntityTupleKey: Composite key for entity registry lookups
  • DataEntry<T>: Template wrapper pairing data with EntityId

#include <unordered_set>

#include "Entity/EntityTypes.hpp"

// EntityId requires explicit construction
EntityId id(42);

// EntityIds are comparable and hashable
if (id == EntityId(42)) { /* ... */ }

// Use in unordered containers (std::hash specialization provided)
std::unordered_set<EntityId> entity_set;
entity_set.insert(id);

// EntityKind enumeration
EntityKind kind = EntityKind::MaskEntity;
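
To make the "strongly-typed, hashable" claim concrete, the following is a minimal sketch of how such an identifier could be built. DemoEntityId and countDistinct are hypothetical names invented for this illustration; the real EntityId in Entity/EntityTypes.hpp may be defined differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <initializer_list>
#include <type_traits>
#include <unordered_set>

// Hypothetical stand-in for illustration only; not the library's EntityId.
struct DemoEntityId {
    std::uint64_t id;

    // explicit blocks silent conversion from a raw integer
    explicit constexpr DemoEntityId(std::uint64_t value) : id(value) {}

    friend constexpr bool operator==(DemoEntityId a, DemoEntityId b) { return a.id == b.id; }
    friend constexpr bool operator!=(DemoEntityId a, DemoEntityId b) { return a.id != b.id; }
};

// A raw integer cannot be passed where a DemoEntityId is expected.
static_assert(!std::is_convertible_v<std::uint64_t, DemoEntityId>);

// Hash specialization so the type can key unordered containers.
namespace std {
template <>
struct hash<DemoEntityId> {
    size_t operator()(DemoEntityId e) const noexcept {
        return hash<uint64_t>{}(e.id);
    }
};
}  // namespace std

// Counts distinct ids by inserting them into an unordered_set.
inline std::size_t countDistinct(std::initializer_list<DemoEntityId> ids) {
    std::unordered_set<DemoEntityId> seen(ids);
    return seen.size();
}
```

The explicit constructor is what forces call sites to write EntityId(42) rather than a bare 42, which keeps raw indices and entity identifiers from being mixed up by accident.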

EntityRegistry

Thread-safe registry that maps (data_key, kind, time, local_index) tuples to unique EntityIds:

#include <iostream>

#include "Entity/EntityRegistry.hpp"

EntityRegistry registry;

// Get or create an EntityId for the tuple (idempotent)
EntityId id = registry.ensureId("masks", EntityKind::MaskEntity, 
                                 TimeFrameIndex(10), 0);

// Lookup descriptor for an EntityId
auto desc = registry.get(id);
if (desc) {
    std::cout << "Data key: " << desc->data_key << "\n";
    std::cout << "Time: " << desc->time_value << "\n";
}

// Clear all entities (session reset)
registry.clear();

Thread Safety: EntityRegistry uses std::mutex internally and is safe to use from multiple threads.
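
The idempotent get-or-create behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: DemoRegistry and DemoKey are hypothetical, the kind is reduced to an int, and std::map is used for the forward index to avoid writing a tuple hash (so lookups here are O(log n) rather than hash-based).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <optional>
#include <string>
#include <tuple>
#include <unordered_map>
#include <utility>

// Hypothetical key layout: (data_key, kind, time, local_index).
using DemoKey = std::tuple<std::string, int, std::int64_t, int>;

// Hypothetical stand-in for illustration only; not the library's EntityRegistry.
class DemoRegistry {
public:
    // Returns the existing id for the tuple, or allocates the next one.
    std::uint64_t ensureId(std::string const & data_key, int kind,
                           std::int64_t time, int local_index) {
        std::lock_guard<std::mutex> lock(_mutex);
        DemoKey key{data_key, kind, time, local_index};
        auto it = _by_key.find(key);
        if (it != _by_key.end()) {
            return it->second;  // same tuple always yields the same id
        }
        std::uint64_t const id = _next_id++;
        _by_key.emplace(key, id);
        _by_id.emplace(id, std::move(key));
        return id;
    }

    // Reverse lookup; empty when the id was never allocated.
    std::optional<DemoKey> get(std::uint64_t id) const {
        std::lock_guard<std::mutex> lock(_mutex);
        auto it = _by_id.find(id);
        if (it == _by_id.end()) {
            return std::nullopt;
        }
        return it->second;
    }

private:
    mutable std::mutex _mutex;  // guards every member below
    std::map<DemoKey, std::uint64_t> _by_key;
    std::unordered_map<std::uint64_t, DemoKey> _by_id;
    std::uint64_t _next_id{1};
};
```

Taking the lock for the whole find-then-insert sequence matters: if two threads could both miss in the lookup before either inserted, the same tuple would receive two different ids.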

EntityGroupManager

Manages user-defined groups of entities with observer notification support:

#include <cstddef>
#include <iostream>
#include <vector>

#include "Entity/EntityGroupManager.hpp"

EntityGroupManager manager;

// Create a group (returns unique GroupId)
GroupId group = manager.createGroup("whisker_1", "First whisker entities");

// Add single entity
manager.addEntityToGroup(group, EntityId(100));

// Add multiple entities efficiently
std::vector<EntityId> ids = {EntityId(101), EntityId(102), EntityId(103)};
std::size_t added = manager.addEntitiesToGroup(group, ids);

// Query group contents
auto members = manager.getEntitiesInGroup(group);
bool in_group = manager.isEntityInGroup(group, EntityId(100));

// Reverse lookup: which groups contain an entity?
auto groups = manager.getGroupsContainingEntity(EntityId(100));

// Get group metadata
auto descriptor = manager.getGroupDescriptor(group);
if (descriptor) {
    std::cout << "Name: " << descriptor->name << "\n";
    std::cout << "Size: " << descriptor->entity_count << "\n";
}

// Observe changes (for UI updates)
manager.getGroupObservers().addObserver([]() {
    // Refresh UI when groups change
});
manager.notifyGroupsChanged();  // Call after bulk updates

Thread Safety: EntityGroupManager is not thread-safe. Callers must synchronize access when using from multiple threads.
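
One way to support both the forward query (entities in a group) and the reverse lookup (groups containing an entity) is to keep two indices in step. The sketch below assumes that design and that the batch call reports only newly added members; DemoGroupIndex is a hypothetical name, and the actual EntityGroupManager internals may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for illustration only; not the library's EntityGroupManager.
class DemoGroupIndex {
public:
    // Returns true if the entity was newly added to the group.
    bool add(std::uint64_t group, std::uint64_t entity) {
        bool const inserted = _members[group].insert(entity).second;
        if (inserted) {
            _groups_of[entity].insert(group);  // keep the reverse index in step
        }
        return inserted;
    }

    // Returns how many entities were newly added (duplicates are skipped).
    std::size_t addMany(std::uint64_t group, std::vector<std::uint64_t> const & entities) {
        std::size_t added = 0;
        for (std::uint64_t const entity : entities) {
            if (add(group, entity)) {
                ++added;
            }
        }
        return added;
    }

    // Membership test via the forward index.
    bool contains(std::uint64_t group, std::uint64_t entity) const {
        auto it = _members.find(group);
        return it != _members.end() && it->second.count(entity) > 0;
    }

    // Reverse lookup; the order of the returned groups is unspecified.
    std::vector<std::uint64_t> groupsContaining(std::uint64_t entity) const {
        auto it = _groups_of.find(entity);
        if (it == _groups_of.end()) {
            return {};
        }
        return std::vector<std::uint64_t>(it->second.begin(), it->second.end());
    }

private:
    std::unordered_map<std::uint64_t, std::unordered_set<std::uint64_t>> _members;
    std::unordered_map<std::uint64_t, std::unordered_set<std::uint64_t>> _groups_of;
};
```

Because neither map is guarded, two threads calling add at once could corrupt the indices, which is the kind of hazard the thread-safety note above warns about.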

Lineage Subsystem

The Lineage subsystem tracks data provenance, that is, how derived data relates to source data. This enables features such as:

  • Tracing computed values back to original entities
  • Invalidating derived data when sources change
  • Visualizing data relationships in the UI

Lineage Types

Located in Entity/Lineage/LineageTypes.hpp, defines 8 lineage strategies as a std::variant:

Type                  | Description                    | Use Case
Source                | Root data with no parent       | Raw loaded data
OneToOneByTime        | 1:1 mapping at each time point | MaskData → Area values
AllToOneByTime        | N:1 aggregation at each time   | Sum of all areas at time T
SubsetLineage         | Filtered subset of source      | Excluded/selected entities
MultiSourceLineage    | Multiple sources combined      | Merged datasets
ExplicitLineage       | Manual per-element mapping     | Custom transformations
EntityMappedLineage   | EntityId → EntityId mapping    | Persistent entity relationships
ImplicitEntityMapping | Cardinality-based inference    | Broadcast/reduce operations

#include "Entity/Lineage/LineageTypes.hpp"
using namespace WhiskerToolbox::Entity::Lineage;

// One-to-one relationship: derived[T, i] ← source[T, i]
OneToOneByTime lineage{"source_masks"};

// Aggregation: derived[T] ← ALL source[T, *]
AllToOneByTime agg_lineage{"mask_areas"};

// Subset with specific entities
SubsetLineage subset;
subset.source_key = "masks";
subset.included_entities = {EntityId(100), EntityId(102)};

// Entity-to-entity mapping for entity-bearing derived containers
EntityMappedLineage mapped;
mapped.source_key = "masks";
mapped.entity_mapping = {
    {EntityId(200), {EntityId(100)}},  // Line 200 from Mask 100
    {EntityId(201), {EntityId(100), EntityId(101)}}  // Line 201 from Masks 100 & 101
};

// Type-erased descriptor for storage
Descriptor desc = lineage;

// Query helpers
bool is_source = isSource(desc);  // false
auto keys = getSourceKeys(desc);  // {"source_masks"}
std::string name = getLineageTypeName(desc);  // "OneToOneByTime"
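
Since the lineage strategies are stored in a std::variant, helpers such as getSourceKeys can be written with std::visit. The sketch below shows that dispatch pattern on a reduced, hypothetical set of alternatives (DemoSource, DemoOneToOne, DemoAllToOne, DemoMultiSource); the field names are assumptions and the real types in LineageTypes.hpp may differ.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Hypothetical, reduced stand-ins for the lineage alternatives.
struct DemoSource {};
struct DemoOneToOne { std::string source_key; };
struct DemoAllToOne { std::string source_key; };
struct DemoMultiSource { std::vector<std::string> source_keys; };

using DemoDescriptor = std::variant<DemoSource, DemoOneToOne, DemoAllToOne, DemoMultiSource>;

// Collects the parent keys of a descriptor, whatever alternative it holds.
inline std::vector<std::string> demoSourceKeys(DemoDescriptor const & desc) {
    return std::visit(
        [](auto const & lineage) -> std::vector<std::string> {
            using T = std::decay_t<decltype(lineage)>;
            if constexpr (std::is_same_v<T, DemoSource>) {
                return {};  // root data has no parent
            } else if constexpr (std::is_same_v<T, DemoMultiSource>) {
                return lineage.source_keys;
            } else {
                return {lineage.source_key};  // single-parent alternatives
            }
        },
        desc);
}

// True only for the root alternative.
inline bool demoIsSource(DemoDescriptor const & desc) {
    return std::holds_alternative<DemoSource>(desc);
}
```

With if constexpr, each branch is compiled only for the alternatives it applies to, so accessing source_key on a type that lacks it is never instantiated.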

LineageRegistry

Stores lineage metadata for data containers with staleness tracking:

#include "Entity/Lineage/LineageRegistry.hpp"
using namespace WhiskerToolbox::Entity::Lineage;

LineageRegistry registry;

// Register source data (no parent)
registry.setLineage("masks", Source{});

// Register derived data chain
registry.setLineage("mask_areas", OneToOneByTime{"masks"});
registry.setLineage("total_area", AllToOneByTime{"mask_areas"});

// Query lineage
if (registry.hasLineage("mask_areas")) {
    auto lineage = registry.getLineage("mask_areas");
    // Use std::visit to dispatch on lineage type
}

// Get full chain from derived to root
auto chain = registry.getLineageChain("total_area");
// Returns: ["total_area", "mask_areas", "masks"]

// Find dependents (reverse lookup)
auto dependents = registry.getDependentKeys("masks");
// Returns: ["mask_areas"]

// Check if source
bool is_src = registry.isSource("masks");  // true

// Staleness tracking for cache invalidation
registry.markStale("mask_areas");
bool stale = registry.isStale("mask_areas");  // true
registry.propagateStale("masks");  // Marks masks and all dependents stale
registry.markValid("mask_areas");  // After recomputation

Thread Safety: LineageRegistry is not thread-safe. Callers must synchronize access.
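
Staleness propagation amounts to a graph traversal over the source-to-dependent edges. The following sketch uses a breadth-first walk; DemoStaleTracker and addDependency are hypothetical names, and the real LineageRegistry derives its edges from the registered lineage descriptors rather than from explicit calls like these.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for illustration only; not the library's LineageRegistry.
class DemoStaleTracker {
public:
    // Records that `derived` is computed from `source`.
    void addDependency(std::string const & source, std::string const & derived) {
        _dependents[source].push_back(derived);
    }

    // Marks `key` and everything reachable through dependents as stale.
    void propagateStale(std::string const & key) {
        std::unordered_set<std::string> visited;  // guards against revisits and cycles
        std::queue<std::string> pending;
        pending.push(key);
        while (!pending.empty()) {
            std::string current = std::move(pending.front());
            pending.pop();
            if (!visited.insert(current).second) {
                continue;  // already handled in this traversal
            }
            _stale.insert(current);
            auto it = _dependents.find(current);
            if (it == _dependents.end()) {
                continue;
            }
            for (std::string const & child : it->second) {
                pending.push(child);
            }
        }
    }

    void markValid(std::string const & key) { _stale.erase(key); }

    bool isStale(std::string const & key) const { return _stale.count(key) > 0; }

private:
    std::unordered_map<std::string, std::vector<std::string>> _dependents;
    std::unordered_set<std::string> _stale;
};
```

The visited set is kept separate from the stale set on purpose: a key that was already stale before the call must still be traversed, otherwise its own dependents would be skipped.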

LineageResolver

Resolves lineage to find source EntityIds. Uses the abstract IEntityDataSource interface to decouple from specific data storage implementations:

#include "Entity/Lineage/LineageResolver.hpp"
using namespace WhiskerToolbox::Entity::Lineage;

// IEntityDataSource implementation provides data access
// (e.g., DataManagerEntityDataSource in DataManager library)
MyDataSource data_source;
LineageRegistry registry;

// Resolver takes non-owning pointers (caller must ensure lifetime)
LineageResolver resolver(&data_source, &registry);

// Resolve single step: derived → immediate source EntityIds
auto source_ids = resolver.resolveToSource("mask_areas", TimeFrameIndex(10), 0);

// Resolve full chain: derived → root source(s)
// Handles multi-level derivations like: masks → areas → peaks
auto root_ids = resolver.resolveToRoot("total_area", TimeFrameIndex(10));

// For EntityMappedLineage: resolve by derived EntityId
auto parent_ids = resolver.resolveByEntityId("derived_lines", EntityId(200));

// Query helpers
bool has_lin = resolver.hasLineage("mask_areas");  // true
bool is_src = resolver.isSource("masks");  // true
auto chain = resolver.getLineageChain("total_area");
auto all_sources = resolver.getAllSourceEntities("mask_areas");

Resolution Semantics by Lineage Type:

Lineage Type        | resolveToSource() Returns
Source              | EntityIds from the container itself
OneToOneByTime      | EntityId at same (time, index) in source
AllToOneByTime      | All EntityIds at that time in source
SubsetLineage       | Filtered EntityIds from source
EntityMappedLineage | Use resolveByEntityId() instead
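
The difference between the OneToOneByTime and AllToOneByTime rows can be illustrated with two small functions over an in-memory store. Everything here is a hypothetical sketch (DemoStore, resolveOneToOne, resolveAllToOne, and plain integers in place of EntityId); the real LineageResolver obtains the ids through IEntityDataSource.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical storage: (data_key, time) -> entity ids at that time.
using DemoIds = std::vector<std::uint64_t>;
using DemoStore = std::map<std::pair<std::string, std::int64_t>, DemoIds>;

// OneToOneByTime-style: the id at the same (time, index) in the source.
inline DemoIds resolveOneToOne(DemoStore const & store, std::string const & source_key,
                               std::int64_t time, std::size_t index) {
    auto it = store.find({source_key, time});
    if (it == store.end() || index >= it->second.size()) {
        return {};  // nothing at that position
    }
    return {it->second[index]};
}

// AllToOneByTime-style: every id the source holds at that time.
inline DemoIds resolveAllToOne(DemoStore const & store, std::string const & source_key,
                               std::int64_t time) {
    auto it = store.find({source_key, time});
    if (it == store.end()) {
        return {};
    }
    return it->second;
}
```

A derived area value at index 1 would trace back to a single mask under the one-to-one rule, whereas a per-frame total would trace back to every mask in that frame.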

IEntityDataSource Interface

Abstract interface for data source access, enabling lineage resolution without DataManager dependency:

class IEntityDataSource {
public:
    virtual ~IEntityDataSource() = default;
    
    // Get EntityId at specific time and index
    [[nodiscard]] virtual std::vector<EntityId> getEntityIds(
        std::string const& data_key,
        TimeFrameIndex time,
        std::size_t local_index) const = 0;
    
    // Get all EntityIds at a time point
    [[nodiscard]] virtual std::vector<EntityId> getAllEntityIdsAtTime(
        std::string const& data_key,
        TimeFrameIndex time) const = 0;
    
    // Get all EntityIds across all times
    [[nodiscard]] virtual std::unordered_set<EntityId> getAllEntityIds(
        std::string const& data_key) const = 0;
    
    // Get element count at time
    [[nodiscard]] virtual std::size_t getElementCount(
        std::string const& data_key,
        TimeFrameIndex time) const = 0;
};

Integration with DataManager

The DataManager library provides DataManagerEntityDataSource which implements IEntityDataSource, and EntityResolver as a convenience wrapper:

#include "DataManager.hpp"
#include "Lineage/EntityResolver.hpp"

DataManager dm;
// ... add data and register lineage via dm.getLineageRegistry() ...

// EntityResolver composes DataManagerEntityDataSource + LineageResolver internally
EntityResolver resolver(&dm);

// All LineageResolver methods are available
auto source_ids = resolver.resolveToSource("derived_data", TimeFrameIndex(10));
auto root_ids = resolver.resolveToRoot("derived_data", TimeFrameIndex(10));
auto chain = resolver.getLineageChain("derived_data");

The separation allows the Entity library to be tested and used without DataManager dependency.

Dependencies

The Entity library has minimal dependencies:

  • TimeFrame: For TimeFrameIndex type
  • Observer: For EntityGroupManager change notifications

graph LR
    Observer["Observer<br/>(no deps)"]
    TimeFrame["TimeFrame<br/>(no deps)"]
    Entity["Entity"]
    
    Observer --> Entity
    TimeFrame --> Entity

Thread Safety Summary

Component          | Thread Safety      | Notes
EntityRegistry     | ✅ Thread-safe     | Uses internal mutex
EntityGroupManager | ❌ Not thread-safe | Caller must synchronize
LineageRegistry    | ❌ Not thread-safe | Caller must synchronize
LineageResolver    | ✅ Thread-safe*    | *If data source is thread-safe
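
For the components marked "Caller must synchronize", one common pattern is to own the object together with its mutex and only expose it inside a locked callback. The sketch below shows that pattern with a hypothetical Synchronized wrapper and a hypothetical DemoStaleSet standing in for a non-thread-safe component; neither is part of the Entity library. Because LineageResolver holds a non-owning pointer to a LineageRegistry, concurrent mutation of that registry presumably needs similar guarding.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <unordered_set>
#include <utility>

// Hypothetical wrapper: the value is reachable only while the lock is held.
template <typename T>
class Synchronized {
public:
    // Runs fn(value) under the mutex and forwards its result.
    template <typename F>
    decltype(auto) withLock(F && fn) {
        std::lock_guard<std::mutex> lock(_mutex);
        return std::forward<F>(fn)(_value);
    }

private:
    std::mutex _mutex;
    T _value{};
};

// Hypothetical non-thread-safe component used only for this illustration.
struct DemoStaleSet {
    std::unordered_set<std::string> stale;

    void markStale(std::string const & key) { stale.insert(key); }
    bool isStale(std::string const & key) const { return stale.count(key) > 0; }
};
```

Each thread would then call shared.withLock(...) instead of touching the component directly, so the compiler helps enforce that no access happens outside the lock. Callbacks should return values rather than references into the guarded object, since a reference would outlive the lock.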

Testing

Unit tests are co-located with source files in src/Entity/:

  • EntityRegistry.test.cpp - ID allocation and lookup
  • EntityGroupManager.test.cpp - Group CRUD and queries
  • Lineage/LineageRegistry.test.cpp - Lineage storage and staleness
  • Lineage/LineageResolver.test.cpp - Resolution with mock data source

Integration tests are in tests/DataManager/Lineage/:

  • test_transform_lineage_integration.test.cpp - Full pipeline tests (700+ lines)
  • DataManagerEntityDataSource.test.cpp - Type dispatch tests

Run entity tests:

ctest --preset linux-clang-release -R Entity

Run all lineage-related tests:

ctest --preset linux-clang-release -R "[Ll]ineage"

Migration Guide

From DataManager/Lineage to Entity/Lineage

If you have code using the old locations (pre-January 2026):

// OLD (no longer available)
#include "DataManager/Lineage/LineageTypes.hpp"
#include "DataManager/Lineage/LineageRegistry.hpp"
using namespace WhiskerToolbox::Lineage;

// NEW
#include "Entity/Lineage/LineageTypes.hpp"
#include "Entity/Lineage/LineageRegistry.hpp"
using namespace WhiskerToolbox::Entity::Lineage;

Key API Changes

Old                               | New
WhiskerToolbox::Lineage namespace | WhiskerToolbox::Entity::Lineage
Lineage types in DataManager      | Lineage types in Entity
Direct DataManager coupling       | Abstract IEntityDataSource interface

EntityResolver Backward Compatibility

The EntityResolver class in DataManager/Lineage/ maintains its public API. Existing code using EntityResolver with DataManager should continue to work without changes.

Example: Custom Data Source

To use lineage resolution with a custom data backend, implement IEntityDataSource:

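#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

#include "Entity/Lineage/LineageResolver.hpp"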
using namespace WhiskerToolbox::Entity::Lineage;

class MyCustomDataSource : public IEntityDataSource {
public:
    std::vector<EntityId> getEntityIds(
        std::string const& data_key,
        TimeFrameIndex time,
        std::size_t local_index) const override 
    {
        // Return EntityId at specific position
        // Return empty vector if not found
        auto it = _data.find({data_key, time.getValue()});
        if (it == _data.end() || local_index >= it->second.size()) {
            return {};
        }
        return {it->second[local_index]};
    }
    
    std::vector<EntityId> getAllEntityIdsAtTime(
        std::string const& data_key,
        TimeFrameIndex time) const override 
    {
        auto it = _data.find({data_key, time.getValue()});
        if (it == _data.end()) return {};
        return it->second;
    }
    
    std::unordered_set<EntityId> getAllEntityIds(
        std::string const& data_key) const override 
    {
        std::unordered_set<EntityId> result;
        for (auto const& [key, ids] : _data) {
            if (key.first == data_key) {
                result.insert(ids.begin(), ids.end());
            }
        }
        return result;
    }
    
    std::size_t getElementCount(
        std::string const& data_key,
        TimeFrameIndex time) const override 
    {
        auto it = _data.find({data_key, time.getValue()});
        return it != _data.end() ? it->second.size() : 0;
    }

private:
    // Example storage: (data_key, time) → vector of EntityIds
    std::map<std::pair<std::string, int64_t>, std::vector<EntityId>> _data;
};

// Use with LineageResolver
MyCustomDataSource data_source;
LineageRegistry registry;
LineageResolver resolver(&data_source, &registry);

// Now resolution works with your custom backend
auto ids = resolver.resolveToSource("derived", TimeFrameIndex(10));

Performance Considerations

  • EntityRegistry: Average-case O(1) lookup and insertion via hash maps
  • EntityGroupManager: O(1) for membership queries; O(n) for batch operations on n entities
  • LineageRegistry: O(1) for single-key queries; O(n) for traversing a lineage chain of n links
  • LineageResolver: Resolution cost depends on lineage depth and data source implementation; resolveAllToOneToRoot is O(n × depth) for n elements at a time point

See Also