graph TB
subgraph Entity["Entity Library"]
ET["EntityTypes.hpp<br/>(EntityId, EntityKind, etc.)"]
ER["EntityRegistry<br/>(ID allocation)"]
EGM["EntityGroupManager<br/>(grouping)"]
subgraph Lineage["Lineage Subsystem"]
LT["LineageTypes.hpp<br/>(variant types)"]
LReg["LineageRegistry<br/>(metadata storage)"]
LRes["LineageResolver<br/>(resolution logic)"]
IDS["IEntityDataSource<br/>(abstract interface)"]
end
end
ET --> ER
ET --> EGM
LT --> LReg
IDS --> LRes
LReg --> LRes
Entity Library
The Entity library provides core functionality for entity identification, registration, grouping, and lineage tracking in WhiskerToolbox. It is designed as a DataManager-independent library, enabling reuse across different data management systems.
Architecture Overview
Core Components
EntityTypes
Located in Entity/EntityTypes.hpp, defines the fundamental types:
- EntityId: Strongly-typed identifier for entities (wraps uint64_t)
- EntityKind: Enumeration of entity categories (Point, Line, Event, Interval, Mask)
- EntityDescriptor: Combines data_key, kind, time_value, and local_index
- EntityTupleKey: Composite key for entity registry lookups
- DataEntry<T>: Template wrapper pairing data with EntityId
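To make the "strongly-typed, hashable" idea concrete, here is a minimal self-contained sketch of how such a wrapper could be written. `sketch::StrongId` and `countDistinct` are hypothetical names used only for illustration; the actual definition in Entity/EntityTypes.hpp may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>

namespace sketch {

// Hypothetical stand-in for EntityId: a thin wrapper over uint64_t.
struct StrongId {
    std::uint64_t value;

    // explicit: a bare integer never converts to an ID by accident
    explicit constexpr StrongId(std::uint64_t v) : value(v) {}

    friend constexpr bool operator==(StrongId a, StrongId b) { return a.value == b.value; }
    friend constexpr bool operator!=(StrongId a, StrongId b) { return !(a == b); }
};

} // namespace sketch

namespace std {
// Hash specialization so the wrapper can be used in unordered containers.
template <>
struct hash<sketch::StrongId> {
    std::size_t operator()(sketch::StrongId id) const noexcept {
        return std::hash<std::uint64_t>{}(id.value);
    }
};
} // namespace std

namespace sketch {

// Inserting the same ID twice leaves a single element in the set.
inline std::size_t countDistinct() {
    std::unordered_set<StrongId> ids;
    ids.insert(StrongId(42));
    ids.insert(StrongId(42)); // duplicate, ignored
    ids.insert(StrongId(7));
    assert(ids.count(StrongId(42)) == 1);
    return ids.size();
}

} // namespace sketch
```

The explicit constructor is the design point: a function taking `StrongId` cannot be called with a raw index or time value by mistake.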
#include "Entity/EntityTypes.hpp"
// EntityId requires explicit construction
EntityId id(42);
// EntityIds are comparable and hashable
if (id == EntityId(42)) { /* ... */ }
// Use in unordered containers (std::hash specialization provided)
std::unordered_set<EntityId> entity_set;
entity_set.insert(id);
// EntityKind enumeration
EntityKind kind = EntityKind::MaskEntity;
EntityRegistry
Thread-safe registry that maps (data_key, kind, time, local_index) tuples to unique EntityIds:
#include "Entity/EntityRegistry.hpp"
#include <iostream>
EntityRegistry registry;
// Get or create an EntityId for the tuple (idempotent)
EntityId id = registry.ensureId("masks", EntityKind::MaskEntity,
TimeFrameIndex(10), 0);
// Lookup descriptor for an EntityId
auto desc = registry.get(id);
if (desc) {
std::cout << "Data key: " << desc->data_key << "\n";
std::cout << "Time: " << desc->time_value << "\n";
}
// Clear all entities (session reset)
registry.clear();
Thread Safety: EntityRegistry uses std::mutex internally and is safe to use from multiple threads.
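The idempotent get-or-create behaviour described above can be sketched with two maps guarded by one mutex. This is an illustration under stated assumptions, not the library's implementation: `sketch::TupleRegistry`, the integer `kind`, and the ID counter reset in `clear()` are all hypothetical, and an ordered `std::map` is used only to avoid writing a tuple hash.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <optional>
#include <string>
#include <tuple>
#include <unordered_map>

namespace sketch {

// Hypothetical stand-ins; the real EntityTupleKey / EntityDescriptor may differ.
using Id = std::uint64_t;
using TupleKey = std::tuple<std::string, int, std::int64_t, int>; // key, kind, time, local index

class TupleRegistry {
public:
    // Returns the existing ID for the tuple, or allocates the next one.
    Id ensureId(std::string const & data_key, int kind, std::int64_t time, int local_index) {
        std::lock_guard<std::mutex> lock(_mutex);
        TupleKey key{data_key, kind, time, local_index};
        auto it = _tuple_to_id.find(key);
        if (it != _tuple_to_id.end()) return it->second;
        Id const id = _next_id++;
        _tuple_to_id.emplace(key, id);
        _id_to_tuple.emplace(id, key);
        return id;
    }

    // Reverse lookup; empty if the ID was never allocated.
    std::optional<TupleKey> get(Id id) const {
        std::lock_guard<std::mutex> lock(_mutex);
        auto it = _id_to_tuple.find(id);
        if (it == _id_to_tuple.end()) return std::nullopt;
        return it->second;
    }

    // Session reset. Restarting the counter is an assumption of this sketch.
    void clear() {
        std::lock_guard<std::mutex> lock(_mutex);
        _tuple_to_id.clear();
        _id_to_tuple.clear();
        _next_id = 1;
    }

private:
    mutable std::mutex _mutex;
    std::map<TupleKey, Id> _tuple_to_id;
    std::unordered_map<Id, TupleKey> _id_to_tuple;
    Id _next_id{1};
};

// Usage: the same tuple always maps to the same ID.
inline bool sameTupleSameId() {
    TupleRegistry r;
    Id const a = r.ensureId("masks", 4, 10, 0);
    Id const b = r.ensureId("masks", 4, 10, 0);
    assert(a == b);
    return a == b;
}

} // namespace sketch
```

Every public method takes the lock, which is what makes concurrent `ensureId` calls for the same tuple return one ID rather than racing to allocate two.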
EntityGroupManager
Manages user-defined groups of entities with observer notification support:
#include "Entity/EntityGroupManager.hpp"
#include <iostream>
#include <vector>
EntityGroupManager manager;
// Create a group (returns unique GroupId)
GroupId group = manager.createGroup("whisker_1", "First whisker entities");
// Add single entity
manager.addEntityToGroup(group, EntityId(100));
// Add multiple entities efficiently
std::vector<EntityId> ids = {EntityId(101), EntityId(102), EntityId(103)};
std::size_t added = manager.addEntitiesToGroup(group, ids);
// Query group contents
auto members = manager.getEntitiesInGroup(group);
bool in_group = manager.isEntityInGroup(group, EntityId(100));
// Reverse lookup: which groups contain an entity?
auto groups = manager.getGroupsContainingEntity(EntityId(100));
// Get group metadata
auto descriptor = manager.getGroupDescriptor(group);
if (descriptor) {
std::cout << "Name: " << descriptor->name << "\n";
std::cout << "Size: " << descriptor->entity_count << "\n";
}
// Observe changes (for UI updates)
manager.getGroupObservers().addObserver([]() {
// Refresh UI when groups change
});
manager.notifyGroupsChanged(); // Call after bulk updates
Thread Safety: EntityGroupManager is not thread-safe. Callers must synchronize access when using from multiple threads.
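Membership queries and the reverse lookup shown above are cheap when the manager keeps two indexes in step: group to members and entity to groups. The following is a hedged sketch of that idea only; `sketch::GroupIndex` and its method names are hypothetical and omit names, descriptions, and observers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

namespace sketch {

using Id = std::uint64_t;
using GroupId = std::uint64_t;

class GroupIndex {
public:
    GroupId createGroup() {
        GroupId const g = _next_group++;
        _members[g]; // creates an empty membership set
        return g;
    }

    // Returns true only if the entity was newly added to an existing group.
    bool add(GroupId g, Id e) {
        auto it = _members.find(g);
        if (it == _members.end()) return false;
        bool const inserted = it->second.insert(e).second;
        if (inserted) _groups_of[e].insert(g); // keep the reverse index in step
        return inserted;
    }

    // Batch add: returns how many entities were newly added.
    std::size_t addAll(GroupId g, std::vector<Id> const & es) {
        std::size_t n = 0;
        for (Id e : es) n += add(g, e) ? 1 : 0;
        return n;
    }

    bool contains(GroupId g, Id e) const {
        auto it = _members.find(g);
        return it != _members.end() && it->second.count(e) > 0;
    }

    // Reverse lookup: which groups contain this entity?
    std::vector<GroupId> groupsContaining(Id e) const {
        auto it = _groups_of.find(e);
        if (it == _groups_of.end()) return {};
        return std::vector<GroupId>(it->second.begin(), it->second.end());
    }

private:
    std::unordered_map<GroupId, std::unordered_set<Id>> _members;
    std::unordered_map<Id, std::unordered_set<GroupId>> _groups_of;
    GroupId _next_group{1};
};

// Usage: re-adding an existing member is not counted by the batch call.
inline std::size_t exampleBatchAdd() {
    GroupIndex index;
    GroupId const g = index.createGroup();
    index.add(g, 100);
    std::size_t const added = index.addAll(g, std::vector<Id>{100, 101, 102});
    assert(index.contains(g, 102));
    return added;
}

} // namespace sketch
```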
Lineage Subsystem
The Lineage subsystem tracks data provenance—how derived data relates to source data. This enables features like:
- Tracing computed values back to original entities
- Invalidating derived data when sources change
- Visualizing data relationships in the UI
Lineage Types
Located in Entity/Lineage/LineageTypes.hpp, defines 8 lineage strategies as a std::variant:
| Type | Description | Use Case |
|---|---|---|
| Source | Root data with no parent | Raw loaded data |
| OneToOneByTime | 1:1 mapping at each time point | MaskData → Area values |
| AllToOneByTime | N:1 aggregation at each time | Sum of all areas at time T |
| SubsetLineage | Filtered subset of source | Excluded/selected entities |
| MultiSourceLineage | Multiple sources combined | Merged datasets |
| ExplicitLineage | Manual per-element mapping | Custom transformations |
| EntityMappedLineage | EntityId → EntityId mapping | Persistent entity relationships |
| ImplicitEntityMapping | Cardinality-based inference | Broadcast/reduce operations |
#include "Entity/Lineage/LineageTypes.hpp"
using namespace WhiskerToolbox::Entity::Lineage;
// One-to-one relationship: derived[T, i] ← source[T, i]
OneToOneByTime lineage{"source_masks"};
// Aggregation: derived[T] ← ALL source[T, *]
AllToOneByTime agg_lineage{"mask_areas"};
// Subset with specific entities
SubsetLineage subset;
subset.source_key = "masks";
subset.included_entities = {EntityId(100), EntityId(102)};
// Entity-to-entity mapping for entity-bearing derived containers
EntityMappedLineage mapped;
mapped.source_key = "masks";
mapped.entity_mapping = {
{EntityId(200), {EntityId(100)}}, // Line 200 from Mask 100
{EntityId(201), {EntityId(100), EntityId(101)}} // Line 201 from Masks 100 & 101
};
// Type-erased descriptor for storage
Descriptor desc = lineage;
// Query helpers
bool is_source = isSource(desc); // false
auto keys = getSourceKeys(desc); // {"source_masks"}
std::string name = getLineageTypeName(desc); // "OneToOneByTime"
LineageRegistry
Stores lineage metadata for data containers with staleness tracking:
#include "Entity/Lineage/LineageRegistry.hpp"
using namespace WhiskerToolbox::Entity::Lineage;
LineageRegistry registry;
// Register source data (no parent)
registry.setLineage("masks", Source{});
// Register derived data chain
registry.setLineage("mask_areas", OneToOneByTime{"masks"});
registry.setLineage("total_area", AllToOneByTime{"mask_areas"});
// Query lineage
if (registry.hasLineage("mask_areas")) {
auto lineage = registry.getLineage("mask_areas");
// Use std::visit to dispatch on lineage type
}
// Get full chain from derived to root
auto chain = registry.getLineageChain("total_area");
// Returns: ["total_area", "mask_areas", "masks"]
// Find dependents (reverse lookup)
auto dependents = registry.getDependentKeys("masks");
// Returns: ["mask_areas"]
// Check if source
bool is_src = registry.isSource("masks"); // true
// Staleness tracking for cache invalidation
registry.markStale("mask_areas");
bool stale = registry.isStale("mask_areas"); // true
registry.propagateStale("masks"); // Marks masks and all dependents stale
registry.markValid("mask_areas"); // After recomputation
Thread Safety: LineageRegistry is not thread-safe. Callers must synchronize access.
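Chain traversal and stale propagation are both graph walks: the chain follows parent links upward, while staleness flows downward to dependents. A minimal sketch, assuming a single parent per key and each key registered once (the real registry also handles multi-source lineage); `sketch::LineageGraph` and its members are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

namespace sketch {

// Hypothetical single-parent lineage graph, for illustration only.
class LineageGraph {
public:
    // Record that `key` derives from `parent`; an empty parent marks a source.
    void setParent(std::string const & key, std::string const & parent) {
        _parent[key] = parent;
        if (!parent.empty()) _children[parent].push_back(key);
    }

    // Walk from `key` up to its root: [key, parent, grandparent, ...].
    std::vector<std::string> chain(std::string const & key) const {
        std::vector<std::string> out;
        std::string cur = key;
        // The size bound stops the walk if a cycle was registered by mistake.
        while (out.size() < _parent.size()) {
            auto it = _parent.find(cur);
            if (it == _parent.end()) break; // unregistered key ends the chain
            out.push_back(cur);
            if (it->second.empty()) break;  // reached a source
            cur = it->second;
        }
        return out;
    }

    // Mark `key` and everything derived from it (transitively) as stale.
    void propagateStale(std::string const & key) {
        std::vector<std::string> stack{key};
        std::unordered_set<std::string> visited;
        while (!stack.empty()) {
            std::string cur = stack.back();
            stack.pop_back();
            if (!visited.insert(cur).second) continue;
            _stale.insert(cur);
            auto it = _children.find(cur);
            if (it == _children.end()) continue;
            for (auto const & child : it->second) stack.push_back(child);
        }
    }

    void markValid(std::string const & key) { _stale.erase(key); }
    bool isStale(std::string const & key) const { return _stale.count(key) > 0; }

private:
    std::unordered_map<std::string, std::string> _parent;
    std::unordered_map<std::string, std::vector<std::string>> _children;
    std::unordered_set<std::string> _stale;
};

// Usage: the three-level chain from the example above.
inline std::size_t exampleChainLength() {
    LineageGraph g;
    g.setParent("masks", "");
    g.setParent("mask_areas", "masks");
    g.setParent("total_area", "mask_areas");
    std::vector<std::string> const c = g.chain("total_area");
    assert(c.front() == "total_area" && c.back() == "masks");
    return c.size();
}

} // namespace sketch
```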
LineageResolver
Resolves lineage to find source EntityIds. Uses the abstract IEntityDataSource interface to decouple from specific data storage implementations:
#include "Entity/Lineage/LineageResolver.hpp"
using namespace WhiskerToolbox::Entity::Lineage;
// IEntityDataSource implementation provides data access
// (e.g., DataManagerEntityDataSource in DataManager library)
MyDataSource data_source;
LineageRegistry registry;
// Resolver takes non-owning pointers (caller must ensure lifetime)
LineageResolver resolver(&data_source, &registry);
// Resolve single step: derived → immediate source EntityIds
auto source_ids = resolver.resolveToSource("mask_areas", TimeFrameIndex(10), 0);
// Resolve full chain: derived → root source(s)
// Handles multi-level derivations like: masks → areas → peaks
auto root_ids = resolver.resolveToRoot("total_area", TimeFrameIndex(10));
// For EntityMappedLineage: resolve by derived EntityId
auto parent_ids = resolver.resolveByEntityId("derived_lines", EntityId(200));
// Query helpers
bool has_lin = resolver.hasLineage("mask_areas"); // true
bool is_src = resolver.isSource("masks"); // true
auto chain = resolver.getLineageChain("total_area");
auto all_sources = resolver.getAllSourceEntities("mask_areas");
Resolution Semantics by Lineage Type:
| Lineage Type | resolveToSource() Returns |
|---|---|
| Source | EntityIds from the container itself |
| OneToOneByTime | EntityId at same (time, index) in source |
| AllToOneByTime | All EntityIds at that time in source |
| SubsetLineage | Filtered EntityIds from source |
| EntityMappedLineage | Use resolveByEntityId() instead |
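The table above amounts to a dispatch on the lineage variant. The sketch below shows that dispatch with std::visit for three of the alternatives, using a toy in-memory store in place of IEntityDataSource. All names in `namespace sketch` are hypothetical, and treating Source as "the ID at (time, index) in the container itself" is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>
#include <vector>

namespace sketch {

using Id = std::uint64_t;

// Reduced set of lineage alternatives (the library defines eight).
struct Source {};
struct OneToOneByTime { std::string source_key; };
struct AllToOneByTime { std::string source_key; };
using Descriptor = std::variant<Source, OneToOneByTime, AllToOneByTime>;

// Toy data source: (key, time) -> IDs in local-index order.
using Store = std::map<std::pair<std::string, std::int64_t>, std::vector<Id>>;

inline void put(Store & s, std::string key, std::int64_t t, std::vector<Id> ids) {
    s[std::make_pair(std::move(key), t)] = std::move(ids);
}

inline std::vector<Id> idsAt(Store const & s, std::string const & key, std::int64_t t) {
    auto it = s.find(std::make_pair(key, t));
    if (it == s.end()) return {};
    return it->second;
}

inline std::vector<Id> pick(std::vector<Id> const & ids, std::size_t index) {
    if (index >= ids.size()) return {};
    return {ids[index]};
}

// One resolution step: derived element at (time, index) -> source IDs.
inline std::vector<Id> resolveToSource(Descriptor const & d, Store const & store,
                                       std::string const & own_key,
                                       std::int64_t time, std::size_t index) {
    return std::visit([&](auto const & lin) -> std::vector<Id> {
        using T = std::decay_t<decltype(lin)>;
        if constexpr (std::is_same_v<T, Source>) {
            return pick(idsAt(store, own_key, time), index);        // container itself
        } else if constexpr (std::is_same_v<T, OneToOneByTime>) {
            return pick(idsAt(store, lin.source_key, time), index); // same (time, index)
        } else {
            return idsAt(store, lin.source_key, time);              // all IDs at that time
        }
    }, d);
}

// Usage: an aggregate at time 10 resolves to every mask ID at time 10.
inline std::size_t exampleAggregate() {
    Store store;
    put(store, "masks", 10, {100, 101});
    std::vector<Id> const ids =
        resolveToSource(Descriptor{AllToOneByTime{"masks"}}, store, "total_area", 10, 0);
    assert(ids.size() == 2);
    return ids.size();
}

} // namespace sketch
```

Using `if constexpr` inside a generic lambda keeps every alternative handled in one place; adding a new alternative to the variant without a matching branch falls into the final `else`, so a production version would more likely use an overload set that fails to compile on unhandled types.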
IEntityDataSource Interface
Abstract interface for data source access, enabling lineage resolution without DataManager dependency:
class IEntityDataSource {
public:
virtual ~IEntityDataSource() = default;
// Get EntityId at specific time and index
[[nodiscard]] virtual std::vector<EntityId> getEntityIds(
std::string const& data_key,
TimeFrameIndex time,
std::size_t local_index) const = 0;
// Get all EntityIds at a time point
[[nodiscard]] virtual std::vector<EntityId> getAllEntityIdsAtTime(
std::string const& data_key,
TimeFrameIndex time) const = 0;
// Get all EntityIds across all times
[[nodiscard]] virtual std::unordered_set<EntityId> getAllEntityIds(
std::string const& data_key) const = 0;
// Get element count at time
[[nodiscard]] virtual std::size_t getElementCount(
std::string const& data_key,
TimeFrameIndex time) const = 0;
};
Integration with DataManager
The DataManager library provides DataManagerEntityDataSource which implements IEntityDataSource, and EntityResolver as a convenience wrapper:
#include "DataManager.hpp"
#include "Lineage/EntityResolver.hpp"
DataManager dm;
// ... add data and register lineage via dm.getLineageRegistry() ...
// EntityResolver composes DataManagerEntityDataSource + LineageResolver internally
EntityResolver resolver(&dm);
// All LineageResolver methods are available
auto source_ids = resolver.resolveToSource("derived_data", TimeFrameIndex(10));
auto root_ids = resolver.resolveToRoot("derived_data", TimeFrameIndex(10));
auto chain = resolver.getLineageChain("derived_data");
The separation allows the Entity library to be tested and used without DataManager dependency.
Dependencies
The Entity library has minimal dependencies:
- TimeFrame: For TimeFrameIndex type
- Observer: For EntityGroupManager change notifications
graph LR
Observer["Observer<br/>(no deps)"]
TimeFrame["TimeFrame<br/>(no deps)"]
Entity["Entity"]
Observer --> Entity
TimeFrame --> Entity
Thread Safety Summary
| Component | Thread Safety | Notes |
|---|---|---|
| EntityRegistry | ✅ Thread-safe | Uses internal mutex |
| EntityGroupManager | ❌ Not thread-safe | Caller must synchronize |
| LineageRegistry | ❌ Not thread-safe | Caller must synchronize |
| LineageResolver | ✅ Thread-safe* | *If data source is thread-safe |
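For the components marked "Caller must synchronize", one common pattern is to keep the object behind a mutex and only expose it inside a locked callback. The wrapper below is a generic sketch, not part of the Entity library; `sketch::Synchronized` and the `GroupSet` stand-in are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <utility>

namespace sketch {

// Owns a value and only hands it out while the mutex is held.
template <typename T>
class Synchronized {
public:
    // Runs `f` on the guarded value under the lock and returns its result by value.
    template <typename F>
    auto withLock(F && f) {
        std::lock_guard<std::mutex> lock(_mutex);
        return std::forward<F>(f)(_value);
    }

private:
    std::mutex _mutex;
    T _value{};
};

// Hypothetical non-thread-safe component standing in for a group manager.
struct GroupSet {
    std::set<int> members;
    bool add(int id) { return members.insert(id).second; }
};

// Usage: every access goes through withLock, so callers cannot forget the mutex.
inline std::size_t exampleGuardedAccess() {
    Synchronized<GroupSet> shared;
    shared.withLock([](GroupSet & g) { g.add(100); });
    bool const added_again = shared.withLock([](GroupSet & g) { return g.add(100); });
    assert(!added_again);
    return shared.withLock([](GroupSet & g) { return g.members.size(); });
}

} // namespace sketch
```

Returning by value from `withLock` is deliberate: handing out a reference to the guarded object would let it escape the lock.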
Testing
Unit tests are co-located with source files in src/Entity/:
- EntityRegistry.test.cpp - ID allocation and lookup
- EntityGroupManager.test.cpp - Group CRUD and queries
- Lineage/LineageRegistry.test.cpp - Lineage storage and staleness
- Lineage/LineageResolver.test.cpp - Resolution with mock data source
Integration tests are in tests/DataManager/Lineage/:
- test_transform_lineage_integration.test.cpp - Full pipeline tests (700+ lines)
- DataManagerEntityDataSource.test.cpp - Type dispatch tests
Run entity tests:
ctest --preset linux-clang-release -R Entity
Run all lineage-related tests:
ctest --preset linux-clang-release -R "[Ll]ineage"
Migration Guide
From DataManager/Lineage to Entity/Lineage
If you have code using the old locations (pre-January 2026):
// OLD (no longer available)
#include "DataManager/Lineage/LineageTypes.hpp"
#include "DataManager/Lineage/LineageRegistry.hpp"
using namespace WhiskerToolbox::Lineage;
// NEW
#include "Entity/Lineage/LineageTypes.hpp"
#include "Entity/Lineage/LineageRegistry.hpp"
using namespace WhiskerToolbox::Entity::Lineage;
Key API Changes
| Old | New |
|---|---|
| WhiskerToolbox::Lineage namespace | WhiskerToolbox::Entity::Lineage |
| Lineage types in DataManager | Lineage types in Entity |
| Direct DataManager coupling | Abstract IEntityDataSource interface |
EntityResolver Backward Compatibility
The EntityResolver class in DataManager/Lineage/ maintains its public API. Existing code using EntityResolver with DataManager should continue to work without changes.
Example: Custom Data Source
To use lineage resolution with a custom data backend, implement IEntityDataSource:
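#include "Entity/Lineage/LineageResolver.hpp"
#include <cstdint>
#include <map>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>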
using namespace WhiskerToolbox::Entity::Lineage;
class MyCustomDataSource : public IEntityDataSource {
public:
std::vector<EntityId> getEntityIds(
std::string const& data_key,
TimeFrameIndex time,
std::size_t local_index) const override
{
// Return EntityId at specific position
// Return empty vector if not found
auto it = _data.find({data_key, time.getValue()});
if (it == _data.end() || local_index >= it->second.size()) {
return {};
}
return {it->second[local_index]};
}
std::vector<EntityId> getAllEntityIdsAtTime(
std::string const& data_key,
TimeFrameIndex time) const override
{
auto it = _data.find({data_key, time.getValue()});
if (it == _data.end()) return {};
return it->second;
}
std::unordered_set<EntityId> getAllEntityIds(
std::string const& data_key) const override
{
std::unordered_set<EntityId> result;
for (auto const& [key, ids] : _data) {
if (key.first == data_key) {
result.insert(ids.begin(), ids.end());
}
}
return result;
}
std::size_t getElementCount(
std::string const& data_key,
TimeFrameIndex time) const override
{
auto it = _data.find({data_key, time.getValue()});
return it != _data.end() ? it->second.size() : 0;
}
private:
// Example storage: (data_key, time) → vector of EntityIds
std::map<std::pair<std::string, int64_t>, std::vector<EntityId>> _data;
};
// Use with LineageResolver
MyCustomDataSource data_source;
LineageRegistry registry;
LineageResolver resolver(&data_source, &registry);
// Now resolution works with your custom backend
auto ids = resolver.resolveToSource("derived", TimeFrameIndex(10));
Performance Considerations
- EntityRegistry: O(1) lookup and insertion via hash maps
- EntityGroupManager: O(1) for membership queries; O(n) for batch operations
- LineageRegistry: O(1) for single-key queries; O(n) for chain traversal
- LineageResolver: Resolution cost depends on lineage depth and data source implementation; resolveAllToOneToRoot is O(n × depth) for n elements at a time point
See Also
- Transform Pipeline - Data transformation system
- DataManager - Central data storage