Sampling

Fractional differentiation, sequential bootstrap, concurrency analysis, and sample weights (AFML Ch. 4–5).

sampling

average_uniqueness

average_uniqueness(events, num_bars)

Compute average uniqueness of each event (AFML Ch. 4).

The uniqueness of event i at bar j is 1 / c_j, where c_j is the number of events active at bar j. Average uniqueness is the mean of this value over all bars the event spans.

Parameters:

Name Type Description Default
events list[tuple[int, int]]

List of (entry_idx, exit_idx) pairs.

required
num_bars int

Total number of bars.

required

Returns:

Type Description
ndarray

Average uniqueness per event.
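
The definition above can be sketched in a few lines of NumPy. This is an illustration, not the library's implementation: the name `average_uniqueness_sketch` is hypothetical, and the sketch assumes `exit_idx` is inclusive and that every event spans at least one bar, neither of which this page states.

```python
import numpy as np

def average_uniqueness_sketch(events, num_bars):
    """Mean of 1/concurrency over the bars each event spans (exit assumed inclusive)."""
    concurrency = np.zeros(num_bars)
    for entry, exit_ in events:
        concurrency[entry:exit_ + 1] += 1  # count events active at each bar
    out = np.empty(len(events))
    for i, (entry, exit_) in enumerate(events):
        out[i] = np.mean(1.0 / concurrency[entry:exit_ + 1])
    return out

# Events 0 and 1 overlap on bars 1-2; event 2 overlaps with nothing.
events = [(0, 2), (1, 3), (4, 5)]
u = average_uniqueness_sketch(events, num_bars=6)
```

In this example the two overlapping events each get an average uniqueness of 2/3, while the isolated event gets 1.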

balanced_class_weights

balanced_class_weights(labels)

Compute balanced class weights inversely proportional to class frequency.

Parameters:

Name Type Description Default
labels list[int]

Label vector (e.g. [-1, 0, 1]).

required

Returns:

Type Description
dict[int, float]

Mapping from label value to weight.
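
This page does not state how the weights are normalized. A minimal sketch, assuming the common heuristic weight = n_samples / (n_classes * class_count), which is the convention used by scikit-learn's "balanced" mode; the function name is hypothetical.

```python
from collections import Counter

def balanced_class_weights_sketch(labels):
    """Weight each class by n_samples / (n_classes * class_count) (assumed convention)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}

# Three +1 labels, two 0 labels, one -1 label: the rarest class gets the largest weight.
w = balanced_class_weights_sketch([1, 1, 1, -1, 0, 0])
```

Under this convention a perfectly balanced label vector yields a weight of 1.0 for every class.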

compare_bootstraps

compare_bootstraps(ind_matrix, num_samples, num_trials, seed)

Monte Carlo comparison of sequential vs. standard bootstrap uniqueness.

Parameters:

Name Type Description Default
ind_matrix ndarray

Indicator matrix (num_bars x n_events).

required
num_samples int

Samples per trial.

required
num_trials int

Number of Monte Carlo repetitions.

required
seed int

Random seed.

required

Returns:

Type Description
BootstrapComparison

Average uniqueness for sequential and standard methods.

find_min_d

find_min_d(series, max_d, step_size, threshold)

Find the minimum fractional differentiation order that makes a series stationary.

Performs a grid search over d values from low to high, applying FFD at each step and testing the result for stationarity with the augmented Dickey-Fuller (ADF) test (AFML Ch. 5).

Parameters:

Name Type Description Default
series ndarray

Input time series.

required
max_d float

Maximum d to test.

required
step_size float

Increment between d values.

required
threshold float

Weight truncation threshold for FFD.

required

Returns:

Type Description
float

Minimum d for stationarity (returns max_d if none found).

frac_diff_expanding

frac_diff_expanding(series, d, threshold)

Expanding-window fractional differentiation.

Uses all available history at each point instead of a fixed-width window, which is more accurate than FFD but slower.

Parameters:

Name Type Description Default
series ndarray

Input time series.

required
d float

Differentiation order.

required
threshold float

Minimum weight for inclusion.

required

Returns:

Type Description
ndarray

Fractionally differenced series.

frac_diff_ffd

frac_diff_ffd(series, d, threshold)

Apply FFD (fixed-width window fractional differentiation) to a series.

For a suitable d, produces a stationary series that still retains memory of the original levels, using a weight kernel truncated at threshold (AFML Ch. 5).

Parameters:

Name Type Description Default
series ndarray

Input time series (e.g. log prices).

required
d float

Differentiation order (typically 0.3-0.7).

required
threshold float

Weight truncation threshold (e.g. 1e-4).

required

Returns:

Type Description
ndarray

Fractionally differenced series.
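
To make the truncated kernel concrete, here is a minimal sketch that builds the weights with the standard recursion w_0 = 1, w_k = -w_{k-1} (d - k + 1) / k and applies them as a fixed-width window. The names `ffd_kernel` and `frac_diff_ffd_sketch` are hypothetical. The sketch assumes d > 0 and returns only the bars that have a full window of history; whether the library pads the start with NaN instead is not stated here.

```python
import numpy as np

def ffd_kernel(d, threshold):
    """Weights w_0 = 1, w_k = -w_{k-1} * (d - k + 1) / k, cut once |w_k| < threshold."""
    if threshold <= 0:
        raise ValueError("threshold must be positive or the kernel never ends")
    w = [1.0]
    k = 1
    while True:
        w_k = -w[-1] * (d - k + 1) / k
        if abs(w_k) < threshold:
            break
        w.append(w_k)
        k += 1
    return np.array(w)  # w[0] multiplies the current observation

def frac_diff_ffd_sketch(series, d, threshold):
    series = np.asarray(series, dtype=float)
    w = ffd_kernel(d, threshold)
    if len(series) < len(w):
        raise ValueError("series is shorter than the kernel")
    # valid-mode convolution computes sum_k w[k] * series[t - k] for every
    # t that has a full window of history behind it
    return np.convolve(series, w, mode="valid")

# d = 1 collapses to the ordinary first difference (kernel [1, -1]).
first_diff = frac_diff_ffd_sketch([1.0, 3.0, 6.0, 10.0], d=1.0, threshold=1e-4)
# A coarse threshold keeps only the first four weights for d = 0.5.
half = ffd_kernel(0.5, threshold=0.05)
```

A smaller threshold keeps more lags, so the output loses more leading observations but preserves more memory.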

get_indicator_matrix

get_indicator_matrix(events, num_bars)

Build an indicator matrix mapping events to bars.

Parameters:

Name Type Description Default
events list[tuple[int, int]]

List of (entry_idx, exit_idx) pairs.

required
num_bars int

Total number of bars.

required

Returns:

Type Description
ndarray

Binary matrix of shape (num_bars, n_events) where entry (t, i) = 1 if event i is active at bar t.
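
A minimal sketch of the matrix described above, assuming `exit_idx` is inclusive (not stated on this page); the function name is hypothetical.

```python
import numpy as np

def indicator_matrix_sketch(events, num_bars):
    """Rows are bars, columns are events; 1 where the event is active."""
    ind = np.zeros((num_bars, len(events)), dtype=int)
    for i, (entry, exit_) in enumerate(events):
        ind[entry:exit_ + 1, i] = 1  # exit bar assumed inclusive
    return ind

ind = indicator_matrix_sketch([(0, 2), (1, 3), (4, 5)], num_bars=6)
concurrency = ind.sum(axis=1)  # row sums give the number of active events per bar
```

Summing each row recovers the per-bar concurrency, which is why this matrix is the natural input for the bootstrap routines on this page.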

get_weights

get_weights(d, size)

Compute fractional differentiation weights (AFML Ch. 5).

Returns the weight vector for differentiation order d. Weight magnitudes decay with lag; size sets how many lags are kept.

Parameters:

Name Type Description Default
d float

Fractional differentiation order (typically 0 < d < 1; d = 1 gives the ordinary first difference).

required
size int

Number of weights to compute.

required

Returns:

Type Description
ndarray

Weight vector of length size.
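
The weights follow the standard binomial-series recursion w_0 = 1, w_k = -w_{k-1} (d - k + 1) / k. A minimal sketch, assuming index 0 holds the weight for the current observation (the ordering the library returns is not stated here) and size >= 1; the function name is hypothetical.

```python
import numpy as np

def get_weights_sketch(d, size):
    """First `size` fractional-differentiation weights, lag 0 first (assumed order)."""
    w = np.empty(size)
    w[0] = 1.0
    for k in range(1, size):
        w[k] = -w[k - 1] * (d - k + 1) / k
    return w

w_half = get_weights_sketch(0.5, 4)  # slowly decaying weights for d = 0.5
w_one = get_weights_sketch(1.0, 4)   # d = 1 leaves only the first difference
```

For d = 1 every weight after lag 1 is zero, which shows how integer differencing is a special case of the same formula.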

get_weights_ffd

get_weights_ffd(d, threshold)

Compute FFD (Fixed-width window Fractional Differentiation) weights.

Truncates weights below a threshold to create a fixed-width kernel.

Parameters:

Name Type Description Default
d float

Fractional differentiation order.

required
threshold float

Minimum absolute weight to keep (e.g. 1e-4).

required

Returns:

Type Description
ndarray

Truncated weight vector.

num_co_events

num_co_events(events, num_bars)

Count concurrent events at each bar (AFML Ch. 4).

Parameters:

Name Type Description Default
events list[tuple[int, int]]

List of (entry_idx, exit_idx) pairs.

required
num_bars int

Total number of bars in the series.

required

Returns:

Type Description
list[int]

Number of active events at each bar index.
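
One way to count concurrency in a single pass is a difference array: add 1 at each entry, subtract 1 just past each exit, then take a running sum. This is a sketch under the assumption that `exit_idx` is inclusive and less than `num_bars`; the function name is hypothetical and the library may compute the counts differently.

```python
def num_co_events_sketch(events, num_bars):
    """Number of active events per bar via a difference array (exit assumed inclusive)."""
    delta = [0] * (num_bars + 1)
    for entry, exit_ in events:
        delta[entry] += 1      # event becomes active
        delta[exit_ + 1] -= 1  # event is inactive from the bar after its exit
    counts, running = [], 0
    for t in range(num_bars):
        running += delta[t]
        counts.append(running)
    return counts

counts = num_co_events_sketch([(0, 2), (1, 3), (4, 5)], num_bars=6)
```

The cost is proportional to the number of events plus the number of bars, regardless of how long each event lasts.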

return_attribution_weights

return_attribution_weights(events, returns, num_bars)

Compute return-attribution sample weights (AFML Ch. 4).

Weights each event in proportion to the absolute return attributed to it, where each bar's return is divided by the number of events concurrent at that bar.

Parameters:

Name Type Description Default
events list[tuple[int, int]]

List of (entry_idx, exit_idx) pairs.

required
returns ndarray

Per-bar return series.

required
num_bars int

Total number of bars.

required

Returns:

Type Description
ndarray

Sample weights (one per event).
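
A minimal sketch of the AFML Ch. 4 formula: the raw weight of event i is |sum over its bars of r_t / c_t|, where c_t is the concurrency at bar t. The sketch rescales the weights to sum to the number of events, as in the book; whether this library applies the same normalization is an assumption, as are the inclusive `exit_idx` and the hypothetical function name.

```python
import numpy as np

def return_attribution_sketch(events, returns, num_bars):
    """|sum of concurrency-adjusted returns| per event, scaled to sum to len(events)."""
    returns = np.asarray(returns, dtype=float)
    concurrency = np.zeros(num_bars)
    for entry, exit_ in events:
        concurrency[entry:exit_ + 1] += 1
    raw = np.array([
        abs(np.sum(returns[entry:exit_ + 1] / concurrency[entry:exit_ + 1]))
        for entry, exit_ in events
    ])
    return raw * len(events) / raw.sum()  # assumes at least one nonzero raw weight

# Bar 1 is shared by both events, so its return is split between them.
w = return_attribution_sketch([(0, 1), (1, 2)], [0.02, 0.04, -0.01], num_bars=3)
```

Here the first event is attributed 0.02 + 0.04/2 = 0.04 and the second |0.04/2 - 0.01| = 0.01, so the first receives four times the weight of the second.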

seq_bootstrap

seq_bootstrap(ind_matrix, num_samples, seed)

Sequential bootstrap with uniqueness-aware sampling (AFML Ch. 4).

Draws events one at a time, each with probability proportional to its average uniqueness given the events already drawn, which reduces redundancy from overlapping labels.

Parameters:

Name Type Description Default
ind_matrix ndarray

Indicator matrix (num_bars x n_events) from get_indicator_matrix.

required
num_samples int

Number of samples to draw.

required
seed int

Random seed.

required

Returns:

Type Description
list[int]

Sampled event indices.
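
The procedure can be sketched as follows. Before each draw, every candidate's average uniqueness is recomputed as if it were added to the events already drawn, and the draw probabilities are those values normalized to sum to 1. This is an illustrative sketch of the AFML algorithm, not the library's code: the helper names are hypothetical, and it assumes every event is active on at least one bar.

```python
import numpy as np

def draw_probabilities(ind_matrix, drawn_load):
    """Selection probability of each event given the bar-level load of prior draws."""
    n_events = ind_matrix.shape[1]
    avg_u = np.empty(n_events)
    for i in range(n_events):
        active = ind_matrix[:, i] > 0
        # uniqueness of candidate i at each of its bars if it were added now
        avg_u[i] = np.mean(1.0 / (drawn_load[active] + 1.0))
    return avg_u / avg_u.sum()

def seq_bootstrap_sketch(ind_matrix, num_samples, seed):
    rng = np.random.default_rng(seed)
    drawn_load = np.zeros(ind_matrix.shape[0])
    sample = []
    for _ in range(num_samples):
        prob = draw_probabilities(ind_matrix, drawn_load)
        choice = int(rng.choice(len(prob), p=prob))
        sample.append(choice)
        drawn_load += ind_matrix[:, choice]  # overlapping events become less likely
    return sample

# Three events over six bars: bars 0-2, 2-3, and 4-5.
ind = np.array([[1, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]])
first = draw_probabilities(ind, np.zeros(6))               # uniform before any draw
second = draw_probabilities(ind, ind[:, 1].astype(float))  # after drawing event 1
sample = seq_bootstrap_sketch(ind, num_samples=4, seed=0)
```

After event 1 has been drawn, the probabilities become 5/14, 3/14, and 6/14: the event just drawn is least likely to be repeated, and the event with no overlap is most likely to be picked next.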

standard_bootstrap

standard_bootstrap(num_observations, num_samples, seed)

Standard IID bootstrap: draws indices uniformly at random, with replacement.

Parameters:

Name Type Description Default
num_observations int

Total number of observations to sample from.

required
num_samples int

Number of bootstrap samples to draw.

required
seed int

Random seed.

required

Returns:

Type Description
list[int]

Sampled indices (with replacement).
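
For comparison with the sequential version, a minimal sketch of uniform sampling with replacement using only the standard library. The function name is hypothetical, and the library's random number generator (and therefore the exact indices for a given seed) may differ.

```python
import random

def standard_bootstrap_sketch(num_observations, num_samples, seed):
    """Draw num_samples indices uniformly from range(num_observations), with replacement."""
    rng = random.Random(seed)  # local generator so the global state is untouched
    return [rng.randrange(num_observations) for _ in range(num_samples)]

draws = standard_bootstrap_sketch(num_observations=10, num_samples=25, seed=42)
```

Because every index is equally likely on every draw, overlapping events are resampled as often as unique ones, which is the redundancy the sequential bootstrap is meant to reduce.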

time_decay

time_decay(weights, oldest_weight)

Apply time-decay to sample weights (AFML Ch. 4).

Linearly decays weights from 1.0 (most recent) to oldest_weight (least recent).

Parameters:

Name Type Description Default
weights ndarray

Input weights (typically from return attribution).

required
oldest_weight float

Weight for the oldest observation. Use 0 for full linear decay, 1 for no decay.

required

Returns:

Type Description
ndarray

Time-decayed weights.
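
A minimal sketch of the linear decay described above. It makes two assumptions that this page does not confirm: `weights` is ordered oldest first, and the decay factor is linear in position and multiplied elementwise into the input. AFML itself defines the decay over cumulative uniqueness, not position, so the library may differ. The function name is hypothetical.

```python
import numpy as np

def time_decay_sketch(weights, oldest_weight):
    """Scale weights by a factor rising linearly from oldest_weight to 1.0."""
    weights = np.asarray(weights, dtype=float)
    # assumed ordering: index 0 is the oldest observation, index -1 the newest
    factors = np.linspace(oldest_weight, 1.0, len(weights))
    return weights * factors

full_decay = time_decay_sketch([1.0, 1.0, 1.0], oldest_weight=0.0)
no_decay = time_decay_sketch([0.4, 1.6], oldest_weight=1.0)
```

With `oldest_weight=0` the oldest observation is dropped entirely, and with `oldest_weight=1` the input is returned unchanged.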