Sampling
Fractional differentiation, sequential bootstrap, concurrency analysis, and sample weights (AFML Ch. 4–5).
sampling
average_uniqueness
Compute average uniqueness of each event (AFML Ch. 4).
Uniqueness at bar j for event i is 1 / (number of concurrent events at j). Average uniqueness is the mean across all bars spanned by the event.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `events` | `list[tuple[int, int]]` | List of (entry_idx, exit_idx) pairs. | required |
| `num_bars` | `int` | Total number of bars. | required |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Average uniqueness per event. |
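
The definition above can be illustrated with a minimal NumPy sketch. It is not this library's implementation: the name `average_uniqueness_sketch` is hypothetical, and treating `exit_idx` as inclusive is an assumption.

```python
import numpy as np

def average_uniqueness_sketch(events, num_bars):
    """Mean of 1 / concurrency over the bars each event spans."""
    concurrency = np.zeros(num_bars)
    for entry_idx, exit_idx in events:
        # Assumption: exit_idx is inclusive.
        concurrency[entry_idx:exit_idx + 1] += 1
    return np.array([
        np.mean(1.0 / concurrency[entry_idx:exit_idx + 1])
        for entry_idx, exit_idx in events
    ])

events = [(0, 2), (1, 3), (4, 4)]
avg_u = average_uniqueness_sketch(events, num_bars=6)
```

In this example the first two events overlap on bars 1 and 2, so each gets an average uniqueness of 2/3, while the isolated third event gets 1.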
balanced_class_weights
Compute balanced class weights inversely proportional to class frequency.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `labels` | `list[int]` | Label vector (e.g. [-1, 0, 1]). | required |
Returns:
| Type | Description |
|---|---|
| `dict[int, float]` | Mapping from label value to weight. |
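
One common convention for "balanced" weights is `n_samples / (n_classes * class_count)`, the formula scikit-learn uses for `class_weight="balanced"`. The sketch below assumes that convention; whether this function normalizes the same way is not stated here, and the name `balanced_class_weights_sketch` is hypothetical.

```python
from collections import Counter

def balanced_class_weights_sketch(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}

weights = balanced_class_weights_sketch([-1, 0, 0, 1, 1, 1])
```

With one, two, and three observations per class, the rarest class (-1) receives weight 2.0, class 0 receives 1.0, and the most frequent class (1) receives 2/3.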
compare_bootstraps
Monte Carlo comparison of sequential vs. standard bootstrap uniqueness.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `ind_matrix` | `ndarray` | Indicator matrix (num_bars x n_events). | required |
| `num_samples` | `int` | Samples per trial. | required |
| `num_trials` | `int` | Number of Monte Carlo repetitions. | required |
| `seed` | `int` | Random seed. | required |
Returns:
| Type | Description |
|---|---|
| `BootstrapComparison` | Average uniqueness for sequential and standard methods. |
find_min_d
Find the minimum fractional differentiation order that makes a series stationary.
Performs a grid search over d values, applying FFD and testing stationarity with ADF at each step (AFML Ch. 5).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `series` | `ndarray` | Input time series. | required |
| `max_d` | `float` | Maximum d to test. | required |
| `step_size` | `float` | Increment between d values. | required |
| `threshold` | `float` | Weight truncation threshold for FFD. | required |
Returns:
| Type | Description |
|---|---|
| `float` | Minimum d for stationarity (returns max_d if none found). |
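
The grid search can be sketched as follows. This is an illustration under several assumptions, not this library's code: the `_sketch` names are hypothetical, the grid is assumed to start at d = 0, and the stationarity test is passed in as a hypothetical `is_stationary` callable so the example runs without extra dependencies. In practice that callable would typically wrap an ADF test such as `statsmodels.tsa.stattools.adfuller` and compare its p-value to a significance level.

```python
import numpy as np

def ffd_weights_sketch(d, threshold):
    """FFD weights, lag 0 first, cut at the first |weight| < threshold (> 0)."""
    w = [1.0]
    k = 1
    while True:
        next_w = -w[-1] * (d - k + 1) / k
        if abs(next_w) < threshold:
            break
        w.append(next_w)
        k += 1
    return np.array(w)

def frac_diff_ffd_sketch(series, d, threshold):
    """Fixed-width fractional differencing; warm-up entries are NaN."""
    series = np.asarray(series, dtype=float)
    w = ffd_weights_sketch(d, threshold)
    width = len(w)
    out = np.full(len(series), np.nan)
    for t in range(width - 1, len(series)):
        window = series[t - width + 1:t + 1][::-1]  # most recent value first
        out[t] = np.dot(w, window)
    return out

def find_min_d_sketch(series, max_d, step_size, threshold, is_stationary):
    """Return the smallest grid value of d whose FFD output passes the test."""
    n_steps = int(round(max_d / step_size))
    for step in range(n_steps + 1):
        d = step * step_size  # assumption: the grid starts at d = 0
        diffed = frac_diff_ffd_sketch(series, d, threshold)
        valid = diffed[~np.isnan(diffed)]
        if valid.size > 0 and is_stationary(valid):
            return d
    return max_d  # documented fallback when no d passes

# Toy stand-in for an ADF test: call a series stationary when it is constant.
is_constant = lambda x: np.ptp(x) < 1e-9
trend = np.arange(50, dtype=float)
min_d = find_min_d_sketch(trend, max_d=1.0, step_size=0.5,
                          threshold=1e-2, is_stationary=is_constant)
```

For a pure linear trend only full differencing (d = 1) yields a constant series, so the toy check selects the last grid point.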
frac_diff_expanding
Expanding-window fractional differentiation.
Uses all available history at each point (no truncation), producing a more accurate but slower computation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `series` | `ndarray` | Input time series. | required |
| `d` | `float` | Differentiation order. | required |
| `threshold` | `float` | Minimum weight for inclusion. | required |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Fractionally differenced series. |
frac_diff_ffd
Apply FFD (fixed-width window fractional differentiation) to a series.
Produces a stationary series that retains memory, using a truncated weight kernel (AFML Ch. 5).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `series` | `ndarray` | Input time series (e.g. log prices). | required |
| `d` | `float` | Differentiation order (typically 0.3-0.7). | required |
| `threshold` | `float` | Weight truncation threshold (e.g. 1e-4). | required |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Fractionally differenced series. |
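
A minimal sketch of the fixed-width scheme is shown below. It is not this library's implementation: the `_sketch` names are hypothetical, and filling the warm-up period with NaN (rather than dropping it) is an assumption about the output shape.

```python
import numpy as np

def ffd_weights_sketch(d, threshold):
    """FFD weights, lag 0 first, cut at the first |weight| < threshold (> 0)."""
    w = [1.0]
    k = 1
    while True:
        next_w = -w[-1] * (d - k + 1) / k
        if abs(next_w) < threshold:
            break
        w.append(next_w)
        k += 1
    return np.array(w)

def frac_diff_ffd_sketch(series, d, threshold):
    """Dot the same truncated kernel with a sliding window of the series."""
    series = np.asarray(series, dtype=float)
    w = ffd_weights_sketch(d, threshold)
    width = len(w)
    out = np.full(len(series), np.nan)  # assumption: warm-up entries are NaN
    for t in range(width - 1, len(series)):
        window = series[t - width + 1:t + 1][::-1]  # most recent value first
        out[t] = np.dot(w, window)
    return out

# Sanity check: d = 1 reduces to ordinary first differences.
first_diff = frac_diff_ffd_sketch([1.0, 3.0, 6.0, 10.0], d=1.0, threshold=1e-4)
```

With d = 1 the kernel collapses to `[1, -1]`, so the output is the first difference of the input with one NaN warm-up entry.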
get_indicator_matrix
Build an indicator matrix mapping events to bars.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `events` | `list[tuple[int, int]]` | List of (entry_idx, exit_idx) pairs. | required |
| `num_bars` | `int` | Total number of bars. | required |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Binary matrix of shape (num_bars, n_events) where entry (t, i) = 1 if event i is active at bar t. |
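
The layout of the matrix can be illustrated with a short sketch. The name `indicator_matrix_sketch` is hypothetical, and the inclusive `exit_idx` is an assumption.

```python
import numpy as np

def indicator_matrix_sketch(events, num_bars):
    """Rows are bars, columns are events; 1 marks an active event."""
    ind = np.zeros((num_bars, len(events)), dtype=int)
    for i, (entry_idx, exit_idx) in enumerate(events):
        # Assumption: exit_idx is inclusive.
        ind[entry_idx:exit_idx + 1, i] = 1
    return ind

ind = indicator_matrix_sketch([(0, 2), (1, 3), (4, 4)], num_bars=6)
```

Row sums of this matrix give the number of concurrent events per bar, and column sums give each event's length in bars.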
get_weights
Compute fractional differentiation weights (AFML Ch. 5).
Returns the weight vector for a given differentiation order d. Weights decay with lag; the series size controls truncation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `d` | `float` | Fractional differentiation order (0 < d < 1 for stationarity). | required |
| `size` | `int` | Number of weights to compute. | required |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Weight vector of length `size`. |
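
The weights follow the binomial-series recursion from AFML Ch. 5: w_0 = 1 and w_k = -w_{k-1} (d - k + 1) / k. A minimal sketch, with the hypothetical name `frac_diff_weights_sketch` and the assumption that the vector is ordered with lag 0 first (the ordering used by this function is not stated here):

```python
import numpy as np

def frac_diff_weights_sketch(d, size):
    """First `size` fractional-differencing weights, lag 0 first."""
    w = [1.0]
    for k in range(1, size):
        w.append(-w[-1] * (d - k + 1) / k)
    return np.array(w)

w_half = frac_diff_weights_sketch(d=0.5, size=4)
w_one = frac_diff_weights_sketch(d=1.0, size=4)
```

For d = 0.5 the weights are 1, -0.5, -0.125, -0.0625, shrinking in magnitude with lag. For d = 1 they collapse to 1, -1, 0, 0, which is ordinary first differencing.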
get_weights_ffd
Compute FFD (Fixed-width window Fractional Differentiation) weights.
Truncates weights below a threshold to create a fixed-width kernel.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `d` | `float` | Fractional differentiation order. | required |
| `threshold` | `float` | Minimum absolute weight to keep (e.g. 1e-4). | required |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Truncated weight vector. |
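
The truncation rule can be sketched by running the weight recursion until the next weight falls below the threshold in absolute value. The name `ffd_weights_sketch` and the lag-0-first ordering are assumptions, and the sketch requires `threshold > 0` to terminate.

```python
import numpy as np

def ffd_weights_sketch(d, threshold):
    """Generate weights until the next one is smaller than threshold."""
    w = [1.0]
    k = 1
    while True:
        next_w = -w[-1] * (d - k + 1) / k
        if abs(next_w) < threshold:  # requires threshold > 0
            break
        w.append(next_w)
        k += 1
    return np.array(w)

kernel = ffd_weights_sketch(d=0.5, threshold=0.1)
```

With d = 0.5 and a deliberately coarse threshold of 0.1, the fourth weight (-0.0625) is dropped, leaving a kernel of width 3. A smaller threshold such as 1e-4 keeps many more lags.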
num_co_events
Count concurrent events at each bar (AFML Ch. 4).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `events` | `list[tuple[int, int]]` | List of (entry_idx, exit_idx) pairs. | required |
| `num_bars` | `int` | Total number of bars in the series. | required |
Returns:
| Type | Description |
|---|---|
| `list[int]` | Number of active events at each bar index. |
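
Counting concurrency amounts to incrementing every bar an event covers. A minimal sketch, with the hypothetical name `num_co_events_sketch` and assuming `exit_idx` is inclusive:

```python
import numpy as np

def num_co_events_sketch(events, num_bars):
    """Number of events active at each bar."""
    counts = np.zeros(num_bars, dtype=int)
    for entry_idx, exit_idx in events:
        # Assumption: exit_idx is inclusive.
        counts[entry_idx:exit_idx + 1] += 1
    return counts.tolist()

counts = num_co_events_sketch([(0, 2), (1, 3), (4, 4)], num_bars=6)
```

Bars 1 and 2 are covered by two events each, and the final bar is covered by none.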
return_attribution_weights
Compute return-attribution sample weights (AFML Ch. 4).
Weights each event proportionally to its return contribution, adjusted for concurrency.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `events` | `list[tuple[int, int]]` | List of (entry_idx, exit_idx) pairs. | required |
| `returns` | `ndarray` | Per-bar return series. | required |
| `num_bars` | `int` | Total number of bars. | required |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Sample weights (one per event). |
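
In AFML Ch. 4 the raw weight of an event is the absolute value of the sum, over its bars, of the bar return divided by the concurrency at that bar. The sketch below follows that definition. The name `return_attribution_sketch`, the inclusive `exit_idx`, and the rescaling so that weights sum to the number of events are assumptions; the normalization used by this function is not stated here.

```python
import numpy as np

def return_attribution_sketch(events, returns, num_bars):
    """|sum of concurrency-adjusted returns| per event, rescaled."""
    returns = np.asarray(returns, dtype=float)
    concurrency = np.zeros(num_bars)
    for entry_idx, exit_idx in events:
        concurrency[entry_idx:exit_idx + 1] += 1  # exit_idx assumed inclusive
    raw = np.array([
        abs(np.sum(returns[a:b + 1] / concurrency[a:b + 1]))
        for a, b in events
    ])
    # Assumption: rescale so the weights sum to the number of events.
    # This fails if every raw weight is zero.
    return raw * len(events) / raw.sum()

attr_w = return_attribution_sketch(
    events=[(0, 1), (1, 2)], returns=[0.02, 0.04, -0.01], num_bars=3)
```

Here the shared bar 1 contributes half of its return to each event, giving raw weights of 0.04 and 0.01 and rescaled weights of 1.6 and 0.4.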
seq_bootstrap
Sequential bootstrap with uniqueness-aware sampling (AFML Ch. 4).
Draws samples with probability proportional to their average uniqueness, reducing redundancy from overlapping labels.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `ind_matrix` | `ndarray` | Indicator matrix (num_bars x n_events) from `get_indicator_matrix`. | required |
| `num_samples` | `int` | Number of samples to draw. | required |
| `seed` | `int` | Random seed. | required |
Returns:
| Type | Description |
|---|---|
| `list[int]` | Sampled event indices. |
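
The sampling loop can be sketched as follows: before each draw, every candidate's average uniqueness is recomputed as if it were added to the events already drawn, and the draw uses those values as probabilities. This is an illustration of the AFML Ch. 4 procedure, not this library's code. The name `seq_bootstrap_sketch` is hypothetical, and the sketch assumes every event spans at least one bar.

```python
import numpy as np

def seq_bootstrap_sketch(ind_matrix, num_samples, seed):
    """Draw event indices with probability proportional to uniqueness."""
    rng = np.random.default_rng(seed)
    ind_matrix = np.asarray(ind_matrix, dtype=float)
    num_bars, n_events = ind_matrix.shape
    concurrency = np.zeros(num_bars)  # per-bar counts from draws so far
    chosen = []
    for _ in range(num_samples):
        avg_u = np.empty(n_events)
        for i in range(n_events):
            active = ind_matrix[:, i] > 0
            # Uniqueness of candidate i if it joined the current sample.
            avg_u[i] = np.mean(1.0 / (concurrency[active] + 1.0))
        prob = avg_u / avg_u.sum()
        pick = int(rng.choice(n_events, p=prob))
        chosen.append(pick)
        concurrency += ind_matrix[:, pick]
    return chosen

ind_demo = np.array([[1, 0, 0],
                     [1, 0, 0],
                     [1, 1, 0],
                     [0, 1, 0],
                     [0, 0, 1],
                     [0, 0, 1]])
sample = seq_bootstrap_sketch(ind_demo, num_samples=5, seed=0)
```

The first draw is uniform because nothing has been sampled yet. After an event is drawn, it and any event overlapping it become less likely on the next draw.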
standard_bootstrap
Standard IID bootstrap sampling.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `num_observations` | `int` | Total number of observations to sample from. | required |
| `num_samples` | `int` | Number of bootstrap samples to draw. | required |
| `seed` | `int` | Random seed. | required |
Returns:
| Type | Description |
|---|---|
| `list[int]` | Sampled indices (with replacement). |
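
For comparison with the sequential variant, an IID bootstrap is a uniform draw of indices with replacement. A minimal sketch using NumPy's `Generator.integers`; the name `standard_bootstrap_sketch` is hypothetical, and the random stream will not match this library's output for the same seed.

```python
import numpy as np

def standard_bootstrap_sketch(num_observations, num_samples, seed):
    """Uniform draws from 0..num_observations-1, with replacement."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, num_observations, size=num_samples).tolist()

iid_sample = standard_bootstrap_sketch(num_observations=10, num_samples=20, seed=42)
```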
time_decay
Apply time-decay to sample weights (AFML Ch. 4).
Linearly decays weights from 1.0 (most recent) to oldest_weight (least recent).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `weights` | `ndarray` | Input weights (typically from return attribution). | required |
| `oldest_weight` | `float` | Weight for the oldest observation. Use 0 for full linear decay, 1 for no decay. | required |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Time-decayed weights. |
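
One simple reading of the description is a multiplier that rises linearly with position from `oldest_weight` to 1.0. The sketch below assumes exactly that: weights are ordered oldest to newest, the decay is linear in position, and it is applied multiplicatively. AFML's own version decays along cumulative uniqueness instead, so this function may behave differently. The name `time_decay_sketch` is hypothetical.

```python
import numpy as np

def time_decay_sketch(weights, oldest_weight):
    """Scale weights by a factor rising linearly from oldest_weight to 1."""
    weights = np.asarray(weights, dtype=float)
    # Assumption: index 0 is the oldest observation, the last is the newest.
    factor = np.linspace(oldest_weight, 1.0, len(weights))
    return weights * factor

decayed = time_decay_sketch(np.ones(5), oldest_weight=0.5)
```

With five unit weights and `oldest_weight=0.5`, the result steps evenly from 0.5 for the oldest observation up to 1.0 for the newest. Passing `oldest_weight=1.0` leaves the weights unchanged.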