Rate Estimation Library — Redesign Proposal
Current Problems
1. RateProfile conflates binning metadata with general output
RateProfile stores bin_start and bin_width, which are binning-specific concepts. A Gaussian kernel estimator evaluates the rate at arbitrary time points — it doesn’t have “bins.” An adaptive method might produce irregularly spaced evaluation points. The output struct shouldn’t encode assumptions about how the rate was computed.
2. No first-class time axis
The consumer always needs a paired (time[], rate[]) for plotting, but RateProfile forces callers to reconstruct the time vector from bin_start + i * bin_width. This is fragile and doesn’t generalize to non-uniform evaluation grids.
3. Normalization is tangled with rate metadata
normalizeRateProfiles() takes RateProfile (which carries num_trials) and produces NormalizedRow (a separate struct with redundant bin_start/bin_width). This means: - Two nearly-identical structs exist (RateProfile → NormalizedRow → HeatmapRowData) - The normalization function can’t be applied to already-normalized data or arbitrary value vectors - Z-score internally calls toCountPerTrial() first — the normalization pipeline has hidden coupling to the raw count representation
4. No path to uncertainty quantification
The current pipeline throws away per-trial information immediately: binningEstimate() sums across all trials into a single histogram. There’s no way to compute SEM, confidence intervals, or bootstrap statistics without re-architecting how estimation works.
5. NormalizedRow duplicates HeatmapRowData
The roadmap acknowledges this: “This struct intentionally mirrors CorePlotting::HeatmapRowData.” Having two identical structs is a dependency-avoidance hack but also a maintenance burden.
6. Scaling enum is heatmap-specific
HeatmapScaling is used by the heatmap widget’s state, but the operations it represents (Hz conversion, z-score, min-max) are generic. A PSTH widget would need a near-identical PSTHNormalization enum as the roadmap already sketches.
Design Goals
- Method-agnostic output: The output of any estimation method should be a simple
(times, values)pair, optionally with uncertainty bands. - Per-trial access: The estimation layer should be able to produce per-trial rate curves (for variability/CI computation) without duplicating the estimation logic.
- Composable normalization: Normalization transforms should operate on any
(times, values)data, not justRateProfile. - Extensible estimation: Adding Gaussian kernel smoothing or adaptive methods should require adding one function, not modifying every downstream struct.
- Uncertainty as a first-class concept: Error bars / confidence bands should be representable in the output without bolting them on later.
Proposed Architecture
Layer Diagram
┌─────────────────────────────────────────────────────────┐
│ Widget Layer │
│ HeatmapOpenGLWidget / PSTHPlotOpenGLWidget │
│ Consumes RateEstimate, feeds mappers/renderers │
└────────────┬────────────────────────┬───────────────────┘
│ │
┌───────▼────────┐ ┌─────────▼──────────┐
│ Normalization │ │ Uncertainty (opt) │
│ normalize() │ │ computeCI() │
│ zScore() │ │ bootstrap() │
│ minMax01() │ │ │
└───────┬────────┘ └─────────┬──────────┘
│ │
┌───────▼────────────────────────▼──────────┐
│ RateEstimate (output) │
│ std::vector<double> times │
│ std::vector<double> values │
│ size_t num_trials │
│ std::optional<PerTrialData> per_trial │
└───────────────────┬───────────────────────┘
│
┌───────────────────▼───────────────────────┐
│ Estimator Functions │
│ estimateBinned(gathered, tf, window, p) │
│ estimateKernel(gathered, tf, window, p) │
│ estimateAdaptive(gathered, tf, window, p) │
└───────────────────┬───────────────────────┘
│
┌───────────────────▼───────────────────────┐
│ UnitGatherContext (unchanged) │
│ key, gathered, time_frame │
└───────────────────────────────────────────┘
Core Output: RateEstimate
/// The universal output of any rate estimation method.
/// Always a paired (times, values) suitable for direct plotting.
struct RateEstimate {
std::vector<double> times; ///< Evaluation time points (relative to alignment)
std::vector<double> values; ///< Estimated values at each time point
/// (raw count, rate, or whatever the estimator produces)
size_t num_trials = 0; ///< Trials that contributed to the estimate
};Why this is better: - times[i], values[i] is immediately plottable — no reconstruction needed. - For binning, times are bin centers (e.g., a 10 ms bin covering [-100, -90) reports times[i] = -95). Consumers that need rectangle geometry (like the heatmap mapper) can reconstruct left edges from the metadata’s sample_spacing. - For kernel smoothing, times would be a regular evaluation grid (e.g., every 1 ms). - For adaptive methods, times could be irregularly spaced. - No bin_width field — that’s a property of the BinningParams, not the output.
Per-Trial Data (for Uncertainty and Z-Score)
/// Optional per-trial breakdown, computed by estimators that support it.
/// Each row is one trial's rate curve, evaluated at the same `times` as the
/// aggregate RateEstimate.
struct PerTrialData {
/// trials × time_points matrix (row-major: per_trial_values[trial][time_idx])
std::vector<std::vector<double>> per_trial_values;
};
/// Extended output when per-trial data is requested.
struct RateEstimateWithTrials {
RateEstimate estimate; ///< Mean/aggregate estimate
PerTrialData trials; ///< Per-trial breakdown
};Key insight: Z-score, SEM, and confidence intervals all require knowing the distribution across trials. Currently, z-score is computed across time bins within a unit (which is a within-unit normalization). If you also want across-trial z-score or CI, you need the per-trial matrix. Having PerTrialData optionally available from the estimator is cleaner than trying to reconstruct it after the fact.
Whether you compute z-scores across time bins (current behavior) or across trials is a separate question — both are valid, and they answer different questions. The current within-unit z-score answers “which time bins have unusually high/low rates for this unit?” An across-trial z-score at each time bin would answer “how many standard deviations is the mean rate at this time bin from the overall mean across trials?” The per-trial data supports both.
Decision: Within-unit z-score only for now. Population-level z-score (across all units) can be added later if needed.
Estimation Methods
Keep the std::variant dispatch — it’s lightweight and appropriate here. But each method should produce the same RateEstimate output type:
// ── Parameter structs ──────────────────────────────────────────
struct BinningParams {
double bin_size = 10.0;
};
struct GaussianKernelParams {
double sigma = 20.0; ///< Kernel bandwidth (same units as window)
double eval_step = 1.0; ///< Spacing of evaluation points
};
struct CausalExponentialParams {
double tau = 50.0; ///< Decay time constant
double eval_step = 1.0;
};
using EstimationParams = std::variant<BinningParams, GaussianKernelParams, CausalExponentialParams>;
// ── Estimation functions ───────────────────────────────────────
/// Single-unit estimation (aggregate across trials)
[[nodiscard]] RateEstimate estimateRate(
GatherResult<DigitalEventSeries> const & gathered,
TimeFrame const * time_frame,
double window_size,
EstimationParams const & params = BinningParams{});
/// Single-unit with per-trial breakdown
[[nodiscard]] RateEstimateWithTrials estimateRateWithTrials(
GatherResult<DigitalEventSeries> const & gathered,
TimeFrame const * time_frame,
double window_size,
EstimationParams const & params = BinningParams{});
/// Multi-unit (heatmap use case)
[[nodiscard]] std::vector<RateEstimate> estimateRates(
std::vector<UnitGatherContext> const & units,
double window_size,
EstimationParams const & params = BinningParams{});For binning, estimateRateWithTrials would return per-trial histograms (each trial’s individual bin counts). For kernel smoothing, each trial’s individual smoothed curve. The aggregate estimate.values is always the trial-averaged result.
Normalization as Composable Transforms
Instead of a monolithic normalizeRateProfiles() that knows about RateProfile internals, normalization functions should operate on std::span<double>:
// ── Normalization primitives (operate on value vectors in-place) ─
/// Convert raw counts to firing rate: values[i] /= (num_trials × bin_width_s)
void toFiringRateHz(std::span<double> values, size_t num_trials,
double sample_spacing, double time_units_per_second);
/// Convert raw counts to count-per-trial: values[i] /= num_trials
void toCountPerTrial(std::span<double> values, size_t num_trials);
/// Per-unit z-score: values[i] = (values[i] - mean) / std
void zScoreNormalize(std::span<double> values);
/// Per-unit min-max to [0, 1]
void minMaxNormalize(std::span<double> values);These are simple, testable, composable functions. A caller can chain them:
auto est = estimateRate(gathered, tf, window_size, BinningParams{.bin_size = 10.0});
toCountPerTrial(est.values, est.num_trials); // raw counts → count/trial
zScoreNormalize(est.values); // count/trial → z-scoreFor the heatmap widget’s combo box, you’d still have a convenience enum + dispatch function, but it would delegate to these primitives:
/// Convenience: apply a named scaling to a RateEstimate in-place
enum class ScalingMode {
FiringRateHz,
ZScore,
Normalized01,
RawCount,
CountPerTrial,
};
/// Apply scaling in-place to a single RateEstimate.
/// sample_spacing is read from estimate.metadata.
void applyScaling(RateEstimate & estimate,
ScalingMode mode,
double time_units_per_second);
/// Apply scaling consistently to aggregate + all per-trial curves.
/// For z-score: uses aggregate mean/std so CI bounds are in the same space.
void applyScaling(RateEstimateWithTrials & data,
ScalingMode mode,
double time_units_per_second);This replaces HeatmapScaling and normalizeRateProfiles() with something reusable by both PSTH and Heatmap.
Uncertainty Quantification
With PerTrialData available, confidence intervals become straightforward:
struct ConfidenceBand {
std::vector<double> lower; ///< Lower bound at each time point
std::vector<double> upper; ///< Upper bound at each time point
};
/// Compute SEM-based confidence band: mean ± k × SEM
[[nodiscard]] ConfidenceBand computeSEM(
RateEstimateWithTrials const & data,
double k = 1.0); ///< k=1.96 for 95% CI
/// Compute percentile-based confidence band
[[nodiscard]] ConfidenceBand computePercentileCI(
RateEstimateWithTrials const & data,
double lower_pct = 2.5, ///< e.g. 2.5 for 95% CI
double upper_pct = 97.5);
/// Bootstrap confidence band (resamples trials)
[[nodiscard]] ConfidenceBand bootstrapCI(
RateEstimateWithTrials const & data,
size_t n_resamples = 1000,
double ci_level = 0.95);Interaction with normalization: The ConfidenceBand is computed on whatever values are in the RateEstimateWithTrials. So the workflow for “z-scored confidence intervals” is:
To avoid inconsistencies, applyScaling() has an overload for RateEstimateWithTrials that applies the same transform to both aggregate and per-trial data in one call:
/// Apply scaling to aggregate + all per-trial curves consistently.
/// For z-score: uses aggregate mean/std to normalize all trials (not per-trial stats).
void applyScaling(RateEstimateWithTrials & data,
ScalingMode mode,
double time_units_per_second);The z-score path uses the aggregate mean and std to normalize each trial, ensuring CI bounds are in the same z-score space as the mean:
auto data = estimateRateWithTrials(gathered, tf, window_size, params);
applyScaling(data, ScalingMode::ZScore, 1000.0); // normalizes aggregate + all trials
auto ci = computeSEM(data); // CI is in z-score space — consistentInternally, applyScaling for RateEstimateWithTrials delegates to a helper:
/// Z-score each trial using statistics from the aggregate estimate
void zScoreTrialsFromAggregate(RateEstimateWithTrials & data);Bridging to Rendering
The conversion from RateEstimate to rendering types remains the widget’s responsibility, but it’s now trivial:
// In HeatmapOpenGLWidget::rebuildScene():
auto estimates = estimateRates(contexts, window_size, params);
// Apply scaling
for (auto & est : estimates) {
applyScaling(est, scaling_mode, time_units_per_second);
}
// Convert to HeatmapRowData for the mapper
// times[] are bin centers; the mapper needs left edges and width.
std::vector<CorePlotting::HeatmapRowData> rows;
for (auto & est : estimates) {
double const spacing = est.metadata.sample_spacing;
double const left_edge = est.times.front() - spacing / 2.0;
rows.push_back({std::move(est.values), left_edge, spacing});
}For the PSTH (which plots bars or lines), the conversion is:
auto est = estimateRate(gathered, tf, window_size, params);
applyScaling(est, mode, tps);
// est.times and est.values are directly plottable as a line or bar chartsample_spacing in Metadata
The Hz conversion (toFiringRateHz) needs to know the sample spacing (bin width for binning, eval_step for kernels) to divide by time. This is stored in RateEstimate::Metadata so the output is self-describing and callers don’t need to thread EstimationParams through the normalization layer:
struct RateEstimate {
std::vector<double> times;
std::vector<double> values;
size_t num_trials = 0;
/// Metadata about how this estimate was produced
struct Metadata {
double sample_spacing = 1.0; ///< Time between evaluation points
/// (bin_size for binning, eval_step for kernels)
};
Metadata metadata;
};The estimator fills metadata.sample_spacing automatically. applyScaling() reads it from the estimate rather than requiring an extra parameter.
Summary of Proposed API
Files:
EventRateEstimation/
├── RateEstimate.hpp # RateEstimate, PerTrialData, RateEstimateWithTrials, ScalingMode
├── RateEstimation.hpp # EstimationParams, estimateRate, estimateRates, estimateRateWithTrials
├── RateNormalization.hpp # toFiringRateHz, toCountPerTrial, zScoreNormalize, minMaxNormalize,
│ # applyScaling (single + with-trials), zScoreTrialsFromAggregate
├── RateUncertainty.hpp # ConfidenceBand, computeSEM, computePercentileCI, bootstrapCI
├── GatherContext.hpp # UnitGatherContext, createUnitGatherContext[s] (unchanged)
└── RateEstimation.cpp # all implementations
Migration Checklist
| Current | Proposed | Notes |
|---|---|---|
RateProfile |
RateEstimate |
times + values instead of counts + bin_start + bin_width |
NormalizedRow |
removed | RateEstimate serves this role directly after in-place normalization |
HeatmapScaling |
ScalingMode |
Renamed, lives in RateEstimate.hpp (not heatmap-specific) |
normalizeRateProfiles() |
applyScaling() (in-place, two overloads) |
Single RateEstimate or RateEstimateWithTrials |
estimateRate() returns RateProfile |
Returns RateEstimate |
Same function signature otherwise |
| (not possible) | estimateRateWithTrials() |
New: enables CI/variability |
| (not possible) | ConfidenceBand |
New: returned by CI functions |
Resolved Design Decisions
Bin centers in
times:times[]always reports bin centers for binned estimation. Consumers that need rectangle geometry (heatmap mapper) reconstruct left edges viatimes[i] - metadata.sample_spacing / 2. Line-plot consumers use the centers directly.Within-unit z-score only: Z-score normalizes each unit independently across its own time bins. Population-level z-score (across all units) is deferred to a future phase if needed.
Opt-in per-trial data:
estimateRateWithTrials()is a separate function. The defaultestimateRate()does not allocate per-trial storage, keeping the common path lightweight.Consistent normalization via
applyScaling(RateEstimateWithTrials &): An overload ofapplyScalinghandles both aggregate and per-trial data in one call. For z-score, it uses the aggregate mean/std to normalize all trials, ensuring CI bounds computed afterward are in the same z-score space as the mean.