Data¶
Bar aggregation (10 bar types), CUSUM filtering, index sampling, futures roll adjustment, the ETF trick, and PCA weights.
Bar Aggregators¶
All bar aggregators share the same streaming interface:
```python
import pymlfinance as ml

agg = ml.data.TickBarAggregator(bar_size=100)
tick = ml.TickData(timestamp=1000, price=100.0, volume=50.0)
bars = agg.process_tick(tick)  # returns list[OhlcvBar]
```
Available aggregators:
- Standard bars: `TickBarAggregator`, `VolumeBarAggregator`, `DollarBarAggregator`, `TimeBarAggregator`
- Imbalance bars: `TickImbalanceBarAggregator`, `VolumeImbalanceBarAggregator`, `DollarImbalanceBarAggregator`
- Runs bars: `TickRunsBarAggregator`, `VolumeRunsBarAggregator`, `DollarRunsBarAggregator`
data ¶
DollarBarAggregator ¶
Aggregate ticks into bars when cumulative dollar volume reaches a threshold.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dollar_threshold` | `float` | Dollar volume threshold per bar. | required |
process_tick ¶
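As a rough illustration of the sampling rule, the sketch below closes a bar whenever the running sum of `price * volume` reaches the threshold. `dollar_bar_boundaries` is a hypothetical helper written for this page, not part of `pymlfinance`, and discarding the overshoot on reset is an assumption; the library may carry it over.

```python
def dollar_bar_boundaries(prices, volumes, dollar_threshold):
    """Return the tick index at which each dollar bar closes (illustrative only)."""
    boundaries = []
    cumulative = 0.0
    for i, (price, volume) in enumerate(zip(prices, volumes)):
        cumulative += price * volume  # dollar value traded on this tick
        if cumulative >= dollar_threshold:
            boundaries.append(i)
            cumulative = 0.0  # assumption: overshoot is discarded, not carried over
    return boundaries


prices = [10.0, 10.0, 20.0, 20.0, 5.0]
volumes = [50.0, 60.0, 30.0, 10.0, 100.0]
closes = dollar_bar_boundaries(prices, volumes, dollar_threshold=1000.0)
print(closes)  # [1, 4]
```

The same loop with `volume` in place of `price * volume` gives the volume-bar rule.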
DollarImbalanceBarAggregator ¶
Dollar imbalance bars (DIB) — sample when signed dollar volume imbalance exceeds an EWMA threshold.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `initial_expected` | `float` | Initial expected imbalance. | required |
| `ewma_span` | `float` | EWMA decay span for threshold adaptation. | required |
DollarRunsBarAggregator ¶
Dollar runs bars — sample when dollar volume of the dominant direction exceeds an EWMA threshold.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `initial_expected` | `float` | Initial expected run dollar volume. | required |
| `ewma_span` | `float` | EWMA decay span for threshold adaptation. | required |
TickBarAggregator ¶
Aggregate ticks into bars with a fixed number of ticks per bar.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `bar_size` | `int` | Number of ticks per bar. | required |
process_tick ¶
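To make the streaming contract concrete, here is a minimal pure-Python sketch of a tick-bar aggregator whose `process_tick` returns an empty list while a bar is open and a one-element list when it closes. `Tick`, `Bar`, and `SketchTickBars` are hypothetical stand-ins defined only for this example; the field layout of the library's `TickData` and `OhlcvBar`, and the empty-list behaviour, are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Tick:
    timestamp: int
    price: float
    volume: float


@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float
    volume: float


class SketchTickBars:
    """Hypothetical stand-in for a tick-bar aggregator, not the library class."""

    def __init__(self, bar_size):
        self.bar_size = bar_size
        self._ticks = []

    def process_tick(self, tick):
        self._ticks.append(tick)
        if len(self._ticks) < self.bar_size:
            return []  # bar still open
        prices = [t.price for t in self._ticks]
        bar = Bar(prices[0], max(prices), min(prices), prices[-1],
                  sum(t.volume for t in self._ticks))
        self._ticks = []  # start a fresh bar
        return [bar]


agg = SketchTickBars(bar_size=3)
closed = []
for i, price in enumerate([100.0, 101.0, 99.5, 100.5]):
    closed += agg.process_tick(Tick(timestamp=1000 + i, price=price, volume=10.0))
print(closed)  # one bar built from the first three ticks; the fourth is pending
```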
TickImbalanceBarAggregator ¶
Tick imbalance bars (TIB) — sample when tick direction imbalance exceeds an EWMA threshold.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `initial_expected` | `float` | Initial expected imbalance. | required |
| `ewma_span` | `float` | EWMA decay span for threshold adaptation. | required |
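The following sketch shows one way an imbalance rule of this kind can work: each tick is signed with the tick rule, the signs are accumulated, and a bar closes when the absolute imbalance reaches an adaptive threshold. The threshold update used here (an EWMA of the absolute imbalance at each bar close, with `alpha = 2 / (ewma_span + 1)`) is an assumption made for illustration, and `tick_imbalance_boundaries` is a hypothetical helper; the library's actual update rule is not documented on this page.

```python
def tick_imbalance_boundaries(prices, initial_expected, ewma_span):
    """Return tick indices where a simplified tick imbalance bar closes."""
    alpha = 2.0 / (ewma_span + 1.0)  # assumption: standard span-to-alpha mapping
    threshold = float(initial_expected)
    boundaries = []
    imbalance = 0.0
    sign = 1.0  # assumption: used only if the first price change is zero
    for i in range(1, len(prices)):
        change = prices[i] - prices[i - 1]
        if change > 0:
            sign = 1.0
        elif change < 0:
            sign = -1.0
        # unchanged price: keep the previous sign (tick rule)
        imbalance += sign
        if abs(imbalance) >= threshold:
            boundaries.append(i)
            # assumption: adapt the threshold toward the imbalance seen at close
            threshold = alpha * abs(imbalance) + (1.0 - alpha) * threshold
            imbalance = 0.0
    return boundaries


prices = [100, 101, 102, 103, 102, 101, 100, 99]
tib = tick_imbalance_boundaries(prices, initial_expected=3, ewma_span=3)
print(tib)  # [3, 6]
```

Weighting each sign by volume or by dollar value instead of 1 gives the corresponding volume and dollar variants under the same assumptions.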
TickRunsBarAggregator ¶
Tick runs bars — sample when the longest run of same-sign ticks exceeds an EWMA threshold.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `initial_expected` | `float` | Initial expected run length. | required |
| `ewma_span` | `float` | EWMA decay span for threshold adaptation. | required |
TimeBarAggregator ¶
Aggregate ticks into bars at fixed time intervals.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `interval_seconds` | `int` | Bar duration in seconds. | required |
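A minimal sketch of fixed-interval bucketing, assuming timestamps are integer seconds and ticks arrive in time order (the timestamp unit used by the library is not stated here). `time_bar_closes` is a hypothetical helper that records only the closing price of each bucket, keyed by the bucket's start time.

```python
def time_bar_closes(ticks, interval_seconds):
    """Map each bucket start time to the last price seen in that bucket."""
    closes = {}
    for timestamp, price in ticks:
        bucket_start = (timestamp // interval_seconds) * interval_seconds
        closes[bucket_start] = price  # later ticks overwrite earlier ones
    return closes


ticks = [(0, 10.0), (30, 11.0), (60, 12.0), (119, 13.0), (120, 14.0)]
bar_closes = time_bar_closes(ticks, interval_seconds=60)
print(bar_closes)  # {0: 11.0, 60: 13.0, 120: 14.0}
```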
VolumeBarAggregator ¶
Aggregate ticks into bars when cumulative volume reaches a threshold.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `volume_threshold` | `float` | Volume threshold per bar. | required |
process_tick ¶
VolumeImbalanceBarAggregator ¶
Volume imbalance bars (VIB) — sample when signed volume imbalance exceeds an EWMA threshold.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `initial_expected` | `float` | Initial expected imbalance. | required |
| `ewma_span` | `float` | EWMA decay span for threshold adaptation. | required |
VolumeRunsBarAggregator ¶
Volume runs bars — sample when volume of the dominant direction exceeds an EWMA threshold.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `initial_expected` | `float` | Initial expected run volume. | required |
| `ewma_span` | `float` | EWMA decay span for threshold adaptation. | required |
cusum_filter ¶
CUSUM event filter for detecting structural shifts (AFML Ch. 2).
Detects indices where the cumulative sum of deviations from the running mean exceeds a symmetric threshold.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `values` | `ndarray` | Input series (e.g. log returns or price differences). | required |
| `threshold` | `float` | Symmetric threshold for positive and negative CUSUM. | required |

Returns:
| Type | Description |
|---|---|
| `list[int]` | Indices where CUSUM events are detected. |
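For orientation, here is a sketch of the textbook symmetric CUSUM filter from AFML Ch. 2, which treats the inputs as increments with zero expected value and resets the triggered accumulator after each event. `cusum_events` is a hypothetical helper; it does not subtract a running mean, so it may not match `cusum_filter` output exactly.

```python
def cusum_events(values, threshold):
    """Return indices where the positive or negative CUSUM crosses the threshold."""
    events = []
    s_pos = 0.0
    s_neg = 0.0
    for i, x in enumerate(values):
        s_pos = max(0.0, s_pos + x)  # accumulates upward drift, floored at 0
        s_neg = min(0.0, s_neg + x)  # accumulates downward drift, capped at 0
        if s_pos > threshold:
            events.append(i)
            s_pos = 0.0  # reset after an upward event
        elif s_neg < -threshold:
            events.append(i)
            s_neg = 0.0  # reset after a downward event
    return events


events = cusum_events([0.5, 0.6, -0.2, -1.0, -0.5, 0.1], threshold=1.0)
print(events)  # [1, 3]
```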
etf_trick ¶
ETF trick for combining multiple product series into a single tradeable index.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `prices` | `list[list[float]]` | Per-product price series (products x time steps). | required |
| `weights` | `list[list[float]]` | Per-product allocation weights (products x time steps). | required |

Returns:
| Type | Description |
|---|---|
| `ndarray` | Synthetic ETF price series. |
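The core idea can be sketched as a portfolio value that is rebalanced to the target weights at every step and then marked to market on the next price change. The version below is a deliberate simplification: it ignores dividends, FX rates, point values, and transaction costs, rebalances at every step, and starts at an assumed value of 1.0. `etf_trick_sketch` is a hypothetical helper and is not expected to reproduce `etf_trick` output.

```python
def etf_trick_sketch(prices, weights, initial_value=1.0):
    """Simplified ETF-trick value series (products x time steps inputs)."""
    n_products = len(prices)
    n_steps = len(prices[0])
    value = initial_value
    series = [value]
    for t in range(1, n_steps):
        # Gross exposure normalises the weights so holdings are de-levered.
        gross = sum(abs(weights[i][t - 1]) for i in range(n_products))
        pnl = 0.0
        for i in range(n_products):
            # Units held over (t-1, t], sized from the previous portfolio value.
            holding = weights[i][t - 1] * value / (prices[i][t - 1] * gross)
            pnl += holding * (prices[i][t] - prices[i][t - 1])
        value += pnl
        series.append(value)
    return series


prices = [[100.0, 110.0, 121.0], [50.0, 50.0, 45.0]]
weights = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]
etf = etf_trick_sketch(prices, weights)
print(etf)  # approximately [1.0, 1.05, 1.05]
```

In the example, the second step's 10% gain on the first product is offset by a 10% loss on the second, so the value stays flat.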
linspace_sample ¶
Sample n evenly spaced indices in a range.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `start` | `int` | Start index (inclusive). | required |
| `end` | `int` | End index (exclusive). | required |
| `n` | `int` | Number of samples. | required |

Returns:
| Type | Description |
|---|---|
| `list[int]` | Evenly spaced indices. |
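One plausible reading of "evenly spaced" over a half-open range is sketched below, using integer division so every index stays below `end`. `linspace_indices` is a hypothetical helper, and the rounding convention is an assumption; the library may place the samples differently (for example, by ending on `end - 1`).

```python
def linspace_indices(start, end, n):
    """Return n indices spread evenly over [start, end), rounding down."""
    span = end - start
    return [start + (i * span) // n for i in range(n)]


print(linspace_indices(0, 10, 5))  # [0, 2, 4, 6, 8]
print(linspace_indices(0, 10, 3))  # [0, 3, 6]
```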
non_negative_rolled ¶
Build a non-negative rolled price series by adjusting for roll gaps.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `prices` | `ndarray` | Raw futures prices. | required |
| `roll_dates` | `list[int]` | Indices where contract rolls occur. | required |

Returns:
| Type | Description |
|---|---|
| `ndarray` | Adjusted non-negative price series. |
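A sketch of the usual construction: remove the price jump at each roll, convert the remaining gap-adjusted changes into returns relative to the previous raw price, and compound those returns so the series can never go negative. `non_negative_rolled_sketch` is a hypothetical helper. It assumes the gap at a roll index is the whole price change at that index and that the series starts at 1.0; the library's scaling may differ.

```python
def non_negative_rolled_sketch(prices, roll_dates):
    """Compound gap-adjusted returns into a strictly positive price series."""
    rolls = set(roll_dates)
    series = [1.0]  # assumption: normalised starting value
    for t in range(1, len(prices)):
        change = prices[t] - prices[t - 1]
        if t in rolls:
            change = 0.0  # the jump at a roll is a contract switch, not a return
        series.append(series[-1] * (1.0 + change / prices[t - 1]))
    return series


rolled = non_negative_rolled_sketch([100.0, 102.0, 90.0, 99.0], roll_dates=[2])
print(rolled)  # approximately [1.0, 1.02, 1.02, 1.122]
```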
pca_weights ¶
PCA-based portfolio weights from a covariance matrix.
Allocates risk proportionally to principal components. Optionally targets a specific risk distribution.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `cov_matrix` | `ndarray` | Covariance matrix (n x n). | required |
| `risk_target` | `float` | Target risk fraction for the first component. If None, uses equal risk allocation across all components. | `None` |

Returns:
| Type | Description |
|---|---|
| `ndarray` | Portfolio weights (length n, sums to 1). |
roll_gaps ¶
Compute roll gaps for a single-future continuous series.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `prices` | `ndarray` | Raw futures prices. | required |
| `roll_dates` | `list[int]` | Indices where contract rolls occur. | required |

Returns:
| Type | Description |
|---|---|
| `ndarray` | Cumulative roll gap adjustments. |
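As an illustration of what a cumulative roll gap series looks like, the sketch below treats the price change at each roll index as the gap and carries the running total forward. `roll_gaps_sketch` is a hypothetical helper, and defining the gap from a single price series in this way is an assumption; with separate open and close data the gap would normally be the new contract's open minus the old contract's last close.

```python
def roll_gaps_sketch(prices, roll_dates):
    """Return the cumulative roll gap at every index."""
    rolls = set(roll_dates)
    gaps = []
    cumulative = 0.0
    for t in range(len(prices)):
        if t in rolls and t > 0:
            cumulative += prices[t] - prices[t - 1]  # jump caused by the roll
        gaps.append(cumulative)
    return gaps


raw = [100.0, 102.0, 90.0, 91.0, 95.0]
gaps = roll_gaps_sketch(raw, roll_dates=[2, 4])
print(gaps)  # [0.0, 0.0, -12.0, -12.0, -8.0]

# Subtracting the gaps gives a continuous (but possibly negative) series.
print([p - g for p, g in zip(raw, gaps)])  # [100.0, 102.0, 102.0, 103.0, 103.0]
```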
uniform_sample ¶
Sample n random indices from a range.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `n` | `int` | Number of samples. | required |
| `total` | `int` | Upper bound of the range (exclusive). | required |
| `seed` | `int` | Random seed for reproducibility. | required |

Returns:
| Type | Description |
|---|---|
| `list[int]` | Randomly sampled indices (sorted, with replacement). |
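The described behaviour (seeded, with replacement, sorted output) can be sketched with the standard library as follows. `uniform_indices` is a hypothetical helper; the library presumably uses its own random number generator, so the same seed will not yield the same indices.

```python
import random


def uniform_indices(n, total, seed):
    """Draw n indices from [0, total) with replacement and return them sorted."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    return sorted(rng.randrange(total) for _ in range(n))


sample = uniform_indices(n=5, total=100, seed=42)
print(sample)
```

Because sampling is with replacement, `n` may exceed `total` and the result can contain duplicates.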