
Data

Bar aggregation (10 bar types), CUSUM filtering, sampling, ETF trick, and PCA weights.

Bar Aggregators

All bar aggregators share the same streaming interface:

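import pymlfinance as ml

agg = ml.data.TickBarAggregator(bar_size=100)

# Streaming: feed one tick at a time.
tick = ml.TickData(timestamp=1000, price=100.0, volume=50.0)
bar = agg.process_tick(tick)  # OhlcvBar if this tick completes a bar, otherwise None

# Batch: feed many ticks at once.
more = [ml.TickData(timestamp=1001 + i, price=100.0, volume=50.0) for i in range(200)]
bars = agg.process_ticks(more)  # list[OhlcvBar] of every bar completed in the batch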

Available aggregators:

  • Standard bars: TickBarAggregator, VolumeBarAggregator, DollarBarAggregator, TimeBarAggregator
  • Imbalance bars: TickImbalanceBarAggregator, VolumeImbalanceBarAggregator, DollarImbalanceBarAggregator
  • Runs bars: TickRunsBarAggregator, VolumeRunsBarAggregator, DollarRunsBarAggregator

data

DollarBarAggregator

Aggregate ticks into bars when cumulative dollar volume reaches a threshold.

Parameters:

  • dollar_threshold (float, required): Dollar volume threshold per bar.

process_tick

process_tick(tick)

Process a single tick and return a completed bar, if any.

Parameters:

  • tick (TickData, required): A single market tick.

Returns:

  • OhlcvBar or None: A completed bar if the threshold was reached, otherwise None.

process_ticks

process_ticks(ticks)

Process a batch of ticks and return all completed bars.

Parameters:

  • ticks (list[TickData], required): Sequence of market ticks.

Returns:

  • list[OhlcvBar]: All bars completed during the batch.
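The accumulation rule can be sketched in plain Python. Everything below is hypothetical: `Tick`, `Bar`, and `DollarBarSketch` are stand-ins for `TickData`, `OhlcvBar`, and the real aggregator, and the sketch assumes the bar closes on the tick that reaches the threshold and that any overshoot is discarded rather than carried into the next bar. The library's exact handling is not documented here.

```python
from dataclasses import dataclass


@dataclass
class Tick:  # hypothetical stand-in for TickData
    timestamp: int
    price: float
    volume: float


@dataclass
class Bar:  # hypothetical stand-in for OhlcvBar
    open: float
    high: float
    low: float
    close: float
    volume: float


class DollarBarSketch:
    """Emit a bar once cumulative price * volume reaches dollar_threshold."""

    def __init__(self, dollar_threshold):
        self.dollar_threshold = dollar_threshold
        self._ticks = []      # ticks in the bar currently being built
        self._dollars = 0.0   # cumulative dollar volume of those ticks

    def process_tick(self, tick):
        self._ticks.append(tick)
        self._dollars += tick.price * tick.volume
        if self._dollars < self.dollar_threshold:
            return None
        prices = [t.price for t in self._ticks]
        bar = Bar(prices[0], max(prices), min(prices), prices[-1],
                  sum(t.volume for t in self._ticks))
        self._ticks, self._dollars = [], 0.0  # reset; overshoot is not carried over
        return bar

    def process_ticks(self, ticks):
        bars = []
        for tick in ticks:
            bar = self.process_tick(tick)
            if bar is not None:
                bars.append(bar)
        return bars
```

Replacing `tick.price * tick.volume` with `tick.volume` gives a volume bar, and with `1` a tick bar with `bar_size` as the threshold.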

DollarImbalanceBarAggregator

Dollar imbalance bars (DIB) — sample when signed dollar volume imbalance exceeds an EWMA threshold.

Parameters:

  • initial_expected (float, required): Initial expected imbalance.
  • ewma_span (float, required): EWMA decay span for threshold adaptation.

process_tick

process_tick(tick)

Process a single tick and return a completed bar, if any.

process_ticks

process_ticks(ticks)

Process a batch of ticks and return all completed bars.
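A minimal sketch of the imbalance-bar idea from AFML Ch. 2 follows. It assumes the tick rule for trade signs, the pandas-style conversion alpha = 2 / (span + 1), and a next-bar threshold of E[T] * |E[b * p * v]|, where E[T] is an EWMA of ticks per bar and E[b * p * v] an EWMA of signed dollars per tick. `DollarImbalanceSketch` is hypothetical, returns `(ticks_in_bar, imbalance)` instead of an `OhlcvBar`, and its estimator is a simplification (AFML estimates the expectations from previous bars), so the library may differ in detail.

```python
class DollarImbalanceSketch:
    """Hypothetical dollar imbalance bar sampler; not pymlfinance's implementation."""

    def __init__(self, initial_expected, ewma_span):
        self.threshold = initial_expected      # |imbalance| needed to close the first bar
        self.alpha = 2.0 / (ewma_span + 1.0)   # assumed span-to-alpha convention
        self.expected_len = None               # EWMA of ticks per bar
        self.expected_flow = None              # EWMA of signed dollars per tick
        self.prev_price = None
        self.sign = 1                          # assumption: the first tick counts as a buy
        self.imbalance = 0.0
        self.n_ticks = 0

    def _ewma(self, old, x):
        return x if old is None else self.alpha * x + (1.0 - self.alpha) * old

    def process_tick(self, price, volume):
        # Tick rule: uptick -> +1, downtick -> -1, unchanged -> keep previous sign.
        if self.prev_price is not None and price != self.prev_price:
            self.sign = 1 if price > self.prev_price else -1
        self.prev_price = price
        flow = self.sign * price * volume      # signed dollar volume b_t * p_t * v_t
        self.imbalance += flow
        self.n_ticks += 1
        self.expected_flow = self._ewma(self.expected_flow, flow)
        if abs(self.imbalance) < self.threshold:
            return None
        closed = (self.n_ticks, self.imbalance)  # a real aggregator would build an OhlcvBar
        # Next threshold: E[T] * |E[b * p * v]|, both estimated by EWMA.
        self.expected_len = self._ewma(self.expected_len, self.n_ticks)
        self.threshold = self.expected_len * abs(self.expected_flow)
        self.imbalance, self.n_ticks = 0.0, 0
        return closed
```

If buy and sell flow roughly cancel, `expected_flow` and therefore the threshold can shrink toward zero, so bars fire on almost every tick; a production version would likely bound the threshold. Tick and volume imbalance bars follow by replacing `price * volume` with `1` or `volume`.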

DollarRunsBarAggregator

Dollar runs bars — sample when dollar volume of the dominant direction exceeds an EWMA threshold.

Parameters:

  • initial_expected (float, required): Initial expected run dollar volume.
  • ewma_span (float, required): EWMA decay span for threshold adaptation.

process_tick

process_tick(tick)

Process a single tick and return a completed bar, if any.

process_ticks

process_ticks(ticks)

Process a batch of ticks and return all completed bars.
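Unlike imbalance bars, runs bars do not let buys and sells cancel: the statistic is the larger of the buy-side and sell-side dollar volume accumulated in the current bar. The hypothetical helper below sketches this under stated assumptions (tick-rule signs, alpha = 2 / (span + 1), and a next-bar threshold of E[T] * max(E[buy dollars per tick], E[sell dollars per tick])). Tracking per-tick buy dollars with zeros on sell ticks estimates P[b=1] * E[p*v | b=1] directly. It returns the indices of bar-closing ticks rather than `OhlcvBar` objects.

```python
def dollar_runs_bar_ends(prices, volumes, initial_expected, ewma_span):
    """Return indices of ticks that close a dollar runs bar (hypothetical sketch)."""
    alpha = 2.0 / (ewma_span + 1.0)           # assumed span-to-alpha convention

    def ewma(old, x):
        return x if old is None else alpha * x + (1.0 - alpha) * old

    threshold = initial_expected
    exp_len = exp_buy = exp_sell = None       # EWMAs of bar length and per-tick buy/sell dollars
    buy = sell = 0.0                          # dollars traded on each side in the current bar
    n_ticks, sign, ends = 0, 1, []            # assumption: the first tick counts as a buy
    for i, (price, volume) in enumerate(zip(prices, volumes)):
        if i > 0 and price != prices[i - 1]:  # tick rule
            sign = 1 if price > prices[i - 1] else -1
        dollars = price * volume
        tick_buy = dollars if sign > 0 else 0.0
        tick_sell = dollars - tick_buy
        buy, sell, n_ticks = buy + tick_buy, sell + tick_sell, n_ticks + 1
        exp_buy, exp_sell = ewma(exp_buy, tick_buy), ewma(exp_sell, tick_sell)
        if max(buy, sell) >= threshold:       # dominant side reached the threshold
            ends.append(i)
            exp_len = ewma(exp_len, n_ticks)
            threshold = exp_len * max(exp_buy, exp_sell)
            buy = sell = 0.0
            n_ticks = 0
    return ends
```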

TickBarAggregator

Aggregate ticks into bars with a fixed number of ticks per bar.

Parameters:

  • bar_size (int, required): Number of ticks per bar.

process_tick

process_tick(tick)

Process a single tick and return a completed bar, if any.

Parameters:

  • tick (TickData, required): A single market tick.

Returns:

  • OhlcvBar or None: A completed bar if the threshold was reached, otherwise None.

process_ticks

process_ticks(ticks)

Process a batch of ticks and return all completed bars.

Parameters:

  • ticks (list[TickData], required): Sequence of market ticks.

Returns:

  • list[OhlcvBar]: All bars completed during the batch.

TickImbalanceBarAggregator

Tick imbalance bars (TIB) — sample when tick direction imbalance exceeds an EWMA threshold.

Parameters:

  • initial_expected (float, required): Initial expected imbalance.
  • ewma_span (float, required): EWMA decay span for threshold adaptation.

process_tick

process_tick(tick)

Process a single tick and return a completed bar, if any.

process_ticks

process_ticks(ticks)

Process a batch of ticks and return all completed bars.

TickRunsBarAggregator

Tick runs bars — sample when the longest run of same-sign ticks exceeds an EWMA threshold.

Parameters:

  • initial_expected (float, required): Initial expected run length.
  • ewma_span (float, required): EWMA decay span for threshold adaptation.

process_tick

process_tick(tick)

Process a single tick and return a completed bar, if any.

process_ticks

process_ticks(ticks)

Process a batch of ticks and return all completed bars.
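The phrase "longest run of same-sign ticks" admits two readings, and the description above does not say which one the aggregator uses. The hypothetical helpers below compute both on tick-rule signs: `longest_streak` counts consecutive same-sign ticks, while `afml_tick_run` follows the AFML Ch. 2 definition, where the runs statistic is the larger of the number of buy ticks and sell ticks in the bar regardless of order.

```python
def tick_rule(prices):
    """Trade signs by the tick rule: +1 uptick, -1 downtick, previous sign if unchanged."""
    signs, sign = [], 1  # assumption: the first tick counts as a buy
    for i, price in enumerate(prices):
        if i > 0 and price != prices[i - 1]:
            sign = 1 if price > prices[i - 1] else -1
        signs.append(sign)
    return signs


def longest_streak(signs):
    """Length of the longest stretch of consecutive equal signs."""
    best = current = 0
    for i, s in enumerate(signs):
        current = current + 1 if i > 0 and s == signs[i - 1] else 1
        best = max(best, current)
    return best


def afml_tick_run(signs):
    """AFML-style statistic: the larger of the buy-tick and sell-tick counts."""
    buys = sum(1 for s in signs if s > 0)
    return max(buys, len(signs) - buys)
```

The two readings diverge on choppy flow: for alternating signs `[1, -1, 1, -1, 1]` the streak is 1, while the AFML statistic is 3.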

TimeBarAggregator

Aggregate ticks into bars at fixed time intervals.

Parameters:

  • interval_seconds (int, required): Bar duration in seconds.

process_tick

process_tick(tick)

Process a single tick and return a completed bar, if any.

process_ticks

process_ticks(ticks)

Process a batch of ticks and return all completed bars.
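Time bars group ticks by the interval their timestamp falls into. This batch sketch assumes timestamps in seconds (the streaming example at the top uses `timestamp=1000` without stating units), aligns intervals to multiples of `interval_seconds`, and emits no bar for intervals without ticks. `time_bars` is a hypothetical helper; the aggregator's alignment and empty-interval behavior are not documented here.

```python
def time_bars(ticks, interval_seconds):
    """Group (timestamp, price, volume) ticks, sorted by time, into OHLCV tuples."""
    buckets = {}  # dicts keep insertion order, so bars come out in time order
    for ts, price, volume in ticks:
        start = ts - ts % interval_seconds  # start of the interval containing ts
        if start not in buckets:
            buckets[start] = [price, price, price, price, 0.0]  # open, high, low, close, volume
        bar = buckets[start]
        bar[1] = max(bar[1], price)
        bar[2] = min(bar[2], price)
        bar[3] = price
        bar[4] += volume
    return [(start, *ohlcv) for start, ohlcv in buckets.items()]
```

A streaming aggregator cannot know an interval is finished until a tick from a later interval arrives, so `process_tick` would typically return the previous interval's bar on that first later tick.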

VolumeBarAggregator

Aggregate ticks into bars when cumulative volume reaches a threshold.

Parameters:

  • volume_threshold (float, required): Volume threshold per bar.

process_tick

process_tick(tick)

Process a single tick and return a completed bar, if any.

Parameters:

  • tick (TickData, required): A single market tick.

Returns:

  • OhlcvBar or None: A completed bar if the threshold was reached, otherwise None.

process_ticks

process_ticks(ticks)

Process a batch of ticks and return all completed bars.

Parameters:

  • ticks (list[TickData], required): Sequence of market ticks.

Returns:

  • list[OhlcvBar]: All bars completed during the batch.

VolumeImbalanceBarAggregator

Volume imbalance bars (VIB) — sample when signed volume imbalance exceeds an EWMA threshold.

Parameters:

  • initial_expected (float, required): Initial expected imbalance.
  • ewma_span (float, required): EWMA decay span for threshold adaptation.

process_tick

process_tick(tick)

Process a single tick and return a completed bar, if any.

process_ticks

process_ticks(ticks)

Process a batch of ticks and return all completed bars.

VolumeRunsBarAggregator

Volume runs bars — sample when volume of the dominant direction exceeds an EWMA threshold.

Parameters:

  • initial_expected (float, required): Initial expected run volume.
  • ewma_span (float, required): EWMA decay span for threshold adaptation.

process_tick

process_tick(tick)

Process a single tick and return a completed bar, if any.

process_ticks

process_ticks(ticks)

Process a batch of ticks and return all completed bars.

cusum_filter

cusum_filter(values, threshold)

CUSUM event filter for detecting structural shifts (AFML Ch. 2).

Detects indices where the cumulative sum of deviations from the running mean exceeds a symmetric threshold.

Parameters:

  • values (ndarray, required): Input series (e.g. log returns or price differences).
  • threshold (float, required): Symmetric threshold for positive and negative CUSUM.

Returns:

  • list[int]: Indices where CUSUM events are detected.
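The symmetric CUSUM filter keeps one sum that accumulates upward deviations and one that accumulates downward deviations, each floored at zero, and records an event (then resets that sum) when either crosses the threshold. The hypothetical `cusum_events` below reads "running mean" as the mean of the values seen before each index; AFML's reference snippet instead applies the rule to raw price differences, i.e. with an expected value of zero. The library's exact convention is an assumption here.

```python
def cusum_events(values, threshold):
    """Symmetric CUSUM filter (hypothetical sketch, not pymlfinance's code)."""
    events = []
    s_pos = s_neg = 0.0
    total = 0.0
    for i, x in enumerate(values):
        # Assumption: "running mean" = mean of the values seen before index i.
        mean = total / i if i > 0 else 0.0
        deviation = x - mean
        s_pos = max(0.0, s_pos + deviation)   # accumulates upward drift
        s_neg = min(0.0, s_neg + deviation)   # accumulates downward drift
        if s_pos > threshold:
            events.append(i)
            s_pos = 0.0                       # reset the side that fired
        elif s_neg < -threshold:
            events.append(i)
            s_neg = 0.0
        total += x
    return events
```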

etf_trick

etf_trick(prices, weights)

ETF trick for combining multiple product series into a single tradeable index.

Parameters:

  • prices (list[list[float]], required): Per-product price series (products x time steps).
  • weights (list[list[float]], required): Per-product allocation weights (products x time steps).

Returns:

  • ndarray: Synthetic ETF price series.
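The ETF trick in AFML Ch. 2 tracks the value K_t of one dollar invested in a weighted basket, so that rebalancing shows up as a single tradeable series. The hypothetical sketch below is a simplified version: it assumes K_0 = 1, holdings reset each step from the previous step's weights, execution at the listed prices, and no currency conversion, dividends, carry, or costs (all of which the full AFML formulation handles). Holdings are h_i = w_i * K / (p_i * sum_j |w_j|), and K advances by the sum of h_i times each product's price change.

```python
def etf_trick_sketch(prices, weights, k0=1.0):
    """Value of k0 invested in the weighted basket (products x time steps inputs)."""
    n_products, n_steps = len(prices), len(prices[0])
    values = [k0]
    for t in range(1, n_steps):
        # Normalize by gross exposure so long and short weights share one budget.
        gross = sum(abs(weights[i][t - 1]) for i in range(n_products))
        pnl = 0.0
        for i in range(n_products):
            # Units held over (t-1, t], sized from the previous value and weights.
            units = weights[i][t - 1] * values[-1] / (prices[i][t - 1] * gross)
            pnl += units * (prices[i][t] - prices[i][t - 1])
        values.append(values[-1] + pnl)
    return values
```

With one product and weight 1 the series is simply price / first price; with a weight of -1 it tracks a short position.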

linspace_sample

linspace_sample(start, end, n)

Sample n evenly spaced indices from the range [start, end).

Parameters:

  • start (int, required): Start index (inclusive).
  • end (int, required): End index (exclusive).
  • n (int, required): Number of samples.

Returns:

  • list[int]: Evenly spaced indices.
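One way to produce n evenly spaced integer indices in [start, end) is pure integer arithmetic over the span, as in the hypothetical helper below. Whether `linspace_sample` starts exactly at `start`, includes `end - 1`, or rounds differently is not stated, so treat the exact spacing as an assumption.

```python
def linspace_indices(start, end, n):
    """n evenly spaced integer indices in [start, end), starting at start."""
    if n <= 0 or end <= start:
        return []
    span = end - start
    # i * span // n stays in [0, span) for i < n, so every index is < end.
    return [start + (i * span) // n for i in range(n)]
```

When n exceeds the span, some indices repeat (for example `linspace_indices(0, 2, 4)` gives `[0, 0, 1, 1]`).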

non_negative_rolled

non_negative_rolled(prices, roll_dates)

Build a non-negative rolled price series by adjusting for roll gaps.

Parameters:

  • prices (ndarray, required): Raw futures prices.
  • roll_dates (list[int], required): Indices where contract rolls occur.

Returns:

  • ndarray: Adjusted non-negative price series.
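Following the approach in AFML Ch. 2, a non-negative rolled series can be built from returns rather than from gap-adjusted price levels, which can turn negative. Each period's return is the gap-adjusted price change divided by the previous raw price, and the series is the cumulative product of (1 + return), starting at 1. The hypothetical sketch below assumes the whole price change at a roll index is roll gap (a single raw series carries no separate old and new contract quotes), so the return on roll days is zero.

```python
def non_negative_rolled_sketch(prices, roll_dates):
    """Return-based rolled series starting at 1.0 (hypothetical sketch)."""
    if not prices:
        return []
    rolls = set(roll_dates)
    series = [1.0]
    for t in range(1, len(prices)):
        # Assumption: the whole change at a roll index is roll gap, so it is dropped.
        change = 0.0 if t in rolls else prices[t] - prices[t - 1]
        # Return relative to the previous raw price, as in AFML's construction.
        series.append(series[-1] * (1.0 + change / prices[t - 1]))
    return series
```

The result stays positive as long as no single-period return falls to -100% or below.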

pca_weights

pca_weights(cov_matrix, risk_target=None)

PCA-based portfolio weights from a covariance matrix.

Allocates risk proportionally to principal components. Optionally targets a specific risk distribution.

Parameters:

  • cov_matrix (ndarray, required): Covariance matrix (n x n).
  • risk_target (float or None, default None): Target risk fraction for the first component. If None, risk is allocated equally across all components.

Returns:

  • ndarray: Portfolio weights (length n, summing to 1).
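PCA risk allocation in the style of AFML Ch. 2 eigen-decomposes the covariance matrix, chooses how much risk each principal component should carry, sets each component's exposure to the square root of (its risk share / its eigenvalue), and maps those exposures back to assets through the eigenvectors. The hypothetical `pca_weights_sketch` below defaults to equal risk per component to mirror the description above (AFML's own default puts all risk on the lowest-variance component). Unlike `pca_weights`, it scales weights so portfolio variance equals `risk_target ** 2` instead of normalizing them to sum to 1; how the library reconciles the two is not stated.

```python
import numpy as np


def pca_weights_sketch(cov, risk_dist=None, risk_target=1.0):
    """Risk-scaled PCA weights (hypothetical sketch, not pymlfinance's code)."""
    cov = np.asarray(cov, dtype=float)
    e_val, e_vec = np.linalg.eigh(cov)            # ascending eigenvalues, orthonormal columns
    order = e_val.argsort()[::-1]                 # largest-variance component first
    e_val, e_vec = e_val[order], e_vec[:, order]
    if risk_dist is None:
        # Assumption: equal risk share per component.
        risk_dist = np.full(len(e_val), 1.0 / len(e_val))
    risk_dist = np.asarray(risk_dist, dtype=float)
    loads = risk_target * np.sqrt(risk_dist / e_val)  # exposure to each component
    return e_vec @ loads                          # map component exposures back to assets
```

Because the eigenvectors are orthonormal, the portfolio variance is the sum of loads squared times eigenvalues, which equals `risk_target ** 2` whenever `risk_dist` sums to 1. Eigenvector signs are arbitrary, so individual weights can flip sign between implementations.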

roll_gaps

roll_gaps(prices, roll_dates)

Compute roll gaps for a single-future continuous series.

Parameters:

  • prices (ndarray, required): Raw futures prices.
  • roll_dates (list[int], required): Indices where contract rolls occur.

Returns:

  • ndarray: Cumulative roll gap adjustments.
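With only a single price series and roll indices, one simple convention (an assumption, not necessarily the library's) treats the entire price change at each roll index as the roll gap and accumulates those jumps. Subtracting the cumulative gaps from the raw prices removes the jumps while leaving pre-roll prices unchanged. AFML's version instead measures each gap from the old contract's close to the new contract's open and, by default, shifts the gaps so the adjusted series ends at the latest raw price. `roll_gaps_sketch` is hypothetical.

```python
def roll_gaps_sketch(prices, roll_dates):
    """Cumulative roll-gap adjustments (hypothetical sketch, forward convention)."""
    rolls = set(roll_dates)
    gaps, cumulative = [], 0.0
    for t, price in enumerate(prices):
        if t in rolls and t > 0:
            # Assumption: the whole jump at the roll index is roll gap.
            cumulative += price - prices[t - 1]
        gaps.append(cumulative)
    return gaps
```

For the backward convention, subtract the last gap from every entry (`[g - gaps[-1] for g in gaps]`) before adjusting, so the final adjusted price equals the final raw price.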

uniform_sample

uniform_sample(n, total, seed)

Sample n random indices from a range.

Parameters:

  • n (int, required): Number of samples.
  • total (int, required): Upper bound of the range (exclusive).
  • seed (int, required): Random seed for reproducibility.

Returns:

  • list[int]: Randomly sampled indices (sorted, with replacement).
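A seeded, sorted sample with replacement can be sketched with a private `random.Random` instance, which keeps results reproducible without touching the module-level generator. The hypothetical helper below assumes the range starts at 0 (only the exclusive upper bound is documented), and it will not reproduce the library's indices for the same seed, since the underlying generators differ.

```python
import random


def uniform_sample_sketch(n, total, seed):
    """n indices drawn uniformly from range(total), with replacement, sorted."""
    rng = random.Random(seed)  # private generator: reproducible, leaves global state alone
    return sorted(rng.randrange(total) for _ in range(n))
```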