Features
Feature engineering: structural breaks, entropy, microstructure, RMT denoising, portfolio allocation, clustering, and codependence measures.
Structural Breaks
SADF, GSADF, and CUSUM tests for detecting regime changes (AFML Ch. 17).
Entropy
Shannon, Lempel-Ziv, Kontoyiannis, and Gaussian entropy estimators (AFML Ch. 18).
Microstructure
Market microstructure features: Amihud lambda, Kyle lambda, Hasbrouck lambda, Roll spread, Corwin-Schultz spread, VPIN (AFML Ch. 19).
Denoising
Random Matrix Theory (RMT) denoising and detoning of correlation/covariance matrices.
Allocation
Portfolio allocation: HRP, CLA (min-variance and max-Sharpe), and Inverse Variance (AFML Ch. 16).
Clustering
K-means clustering and the Optimal Number of Clusters (ONC) algorithm (Machine Learning for Asset Managers, Ch. 4).
Codependence
Pairwise dependence measures: Spearman, distance correlation, mutual information, variation of information, optimal transport, angular/GPR/GNPR distances.
features
adf_test
Augmented Dickey-Fuller unit root test.
Tests whether a time series is stationary by fitting an autoregressive model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `series` | `ndarray` | Time series to test. | required |
| `max_lags` | `int` | Maximum number of autoregressive lags. | required |
Returns:
| Type | Description |
|---|---|
| `tuple[float, ndarray]` | (adf_statistic, regression_coefficients). |
amihud_lambda
Amihud illiquidity measure (AFML Ch. 19).
Measures price impact as the average ratio of absolute return to dollar volume.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `returns` | `ndarray` | Return series. | required |
| `dollar_volumes` | `ndarray` | Dollar volume series (same length as returns). | required |
Returns:
| Type | Description |
|---|---|
| `float` | Amihud lambda (higher = less liquid). |
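
The "average ratio of absolute return to dollar volume" can be sketched in a few lines of NumPy. This is an illustration of the formula only, not the library's implementation; the name `amihud_lambda_sketch` and the choice to skip zero-volume bars are assumptions.

```python
import numpy as np

def amihud_lambda_sketch(returns, dollar_volumes):
    """Hypothetical helper: mean of |return| / dollar volume."""
    returns = np.asarray(returns, dtype=float)
    dollar_volumes = np.asarray(dollar_volumes, dtype=float)
    if returns.shape != dollar_volumes.shape:
        raise ValueError("returns and dollar_volumes must have the same length")
    mask = dollar_volumes > 0  # assumption: ignore bars with no traded volume
    return float(np.mean(np.abs(returns[mask]) / dollar_volumes[mask]))

# Every bar moves 1e-8 per dollar traded, so the average ratio is 1e-8.
lam = amihud_lambda_sketch([0.01, -0.02, 0.005], [1e6, 2e6, 5e5])
```

A larger value means a given dollar volume moves the price more, which is why higher readings indicate lower liquidity.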
amihud_lambda_rolling
Rolling Amihud lambda over a sliding window.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `returns` | `ndarray` | Return series. | required |
| `dollar_volumes` | `ndarray` | Dollar volume series. | required |
| `window` | `int` | Rolling window size. | required |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Rolling Amihud lambda values. |
angular_distance
Angular distance derived from Pearson correlation.
`d = sqrt(0.5 * (1 - rho))`
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `ndarray` | First variable. | required |
| `y` | `ndarray` | Second variable. | required |
Returns:
| Type | Description |
|---|---|
| `float` | Angular distance in [0, 1]. |
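
The formula above maps a correlation of +1 to distance 0 and a correlation of -1 to distance 1. A minimal sketch, assuming Pearson correlation computed with `np.corrcoef` (the helper name `angular_distance_sketch` is hypothetical):

```python
import numpy as np

def angular_distance_sketch(x, y):
    """Hypothetical helper: d = sqrt(0.5 * (1 - rho))."""
    rho = np.corrcoef(x, y)[0, 1]
    # Clip guards against tiny out-of-range values from floating-point error.
    return float(np.sqrt(np.clip(0.5 * (1.0 - rho), 0.0, 1.0)))

x = np.array([1.0, 2.0, 3.0, 4.0])
d_same = angular_distance_sketch(x, 2.0 * x + 1.0)  # rho = +1
d_opposite = angular_distance_sketch(x, -x)         # rho = -1
```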
binary_encode
Binary encode a real-valued series (above/below median).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `values` | `ndarray` | Input series. | required |
Returns:
| Type | Description |
|---|---|
| `list[bool]` | True where value >= median, False otherwise. |
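
The encoding rule stated in the return description is a single comparison against the sample median. A sketch under that reading (the helper name is hypothetical):

```python
import numpy as np

def binary_encode_sketch(values):
    """Hypothetical helper: True where value >= median, else False."""
    values = np.asarray(values, dtype=float)
    return (values >= np.median(values)).tolist()

# The median of this sample is 0.1, which itself encodes as True.
encoded = binary_encode_sketch([0.3, -1.2, 0.8, 0.1, -0.4])
```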
brown_durbin_evans
Brown-Durbin-Evans CUSUM test for parameter instability.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `residuals` | `ndarray` | OLS regression residuals. | required |
Returns:
| Type | Description |
|---|---|
| `tuple[ndarray, float]` | (cusum_series, critical_value): the CUSUM path and the 5% significance boundary. |
chu_stinchcombe_white
Chu-Stinchcombe-White CUSUM test for structural breaks in log prices.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `log_prices` | `ndarray` | Log price series. | required |
| `critical_value` | `float` | Significance threshold. | required |
Returns:
| Type | Description |
|---|---|
| `ndarray` | CUSUM statistic series. |
cla_max_sharpe
CLA maximum Sharpe ratio portfolio.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `expected_returns` | `ndarray` | Expected return per asset (n,). | required |
| `cov` | `ndarray` | Covariance matrix (n x n). | required |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Max-Sharpe weights (n,). |
cla_min_variance
Critical Line Algorithm (CLA) minimum-variance portfolio.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `cov` | `ndarray` | Covariance matrix (n x n). | required |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Minimum-variance weights (n,). |
cluster_kmeans_base
Base-level K-means clustering over a range of k values.
Tries multiple cluster counts and selects the one with the best silhouette score.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `corr` | `ndarray` | Correlation matrix. | required |
| `max_clusters` | `int` | Maximum k to try. | `None` |
| `min_clusters` | `int` | Minimum k to try. | `None` |
| `n_init` | `int` | Initializations per k. | `None` |
| `seed` | `int` | Random seed. | `None` |
Returns:
| Type | Description |
|---|---|
| `OncResult` | Labels, silhouette score, and optimal cluster count. |
cluster_kmeans_top
Top-level ONC (Optimal Number of Clusters) algorithm (Machine Learning for Asset Managers, Ch. 4).
Two-step approach: first clusters, then re-clusters to find the optimal grouping.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `corr` | `ndarray` | Correlation matrix. | required |
| `max_clusters` | `int` | Maximum k. | `None` |
| `min_clusters` | `int` | Minimum k. | `None` |
| `n_init` | `int` | Initializations per k. | `None` |
| `seed` | `int` | Random seed. | `None` |
Returns:
| Type | Description |
|---|---|
| `OncResult` | Labels, silhouette score, and optimal cluster count. |
compare_allocations
Monte Carlo comparison of HRP, CLA, and IVP allocation methods.
Simulates random correlation matrices and compares out-of-sample Sharpe ratios and variances.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `returns` | `ndarray` | Return matrix (n_periods, n_assets). | required |
| `n_simulations` | `int` | Number of Monte Carlo trials. | required |
| `seed` | `int` | Random seed. | required |
Returns:
| Type | Description |
|---|---|
| `AllocationComparison` | Sharpe ratios and variances for each method. |
corr_to_cov
Convert a correlation matrix + standard deviations back to a covariance matrix.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `corr` | `ndarray` | Correlation matrix (n x n). | required |
| `std` | `ndarray` | Standard deviations (n,). | required |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Covariance matrix (n x n). |
corwin_schultz_spread
Corwin-Schultz spread estimator from high-low prices.
Estimates the bid-ask spread from consecutive high-low price pairs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `highs` | `ndarray` | High price series. | required |
| `lows` | `ndarray` | Low price series. | required |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Estimated spread series. |
cov_to_corr
Convert a covariance matrix to a correlation matrix + standard deviations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `cov` | `ndarray` | Covariance matrix (n x n). | required |
Returns:
| Type | Description |
|---|---|
| `tuple[ndarray, ndarray]` | (correlation_matrix, std_devs). |
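
The covariance-to-correlation conversion and its inverse are both element-wise scalings by the outer product of the standard deviations. The sketch below illustrates the round trip; the `_sketch` function names are hypothetical and stand in for the two documented conversions.

```python
import numpy as np

def cov_to_corr_sketch(cov):
    """Hypothetical helper: split a covariance matrix into (corr, std)."""
    cov = np.asarray(cov, dtype=float)
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)
    return corr, std

def corr_to_cov_sketch(corr, std):
    """Hypothetical helper: rebuild the covariance from (corr, std)."""
    return np.asarray(corr, dtype=float) * np.outer(std, std)

cov = np.array([[0.04, 0.006],
                [0.006, 0.09]])
corr, std = cov_to_corr_sketch(cov)      # std = [0.2, 0.3], corr[0, 1] = 0.1
cov_back = corr_to_cov_sketch(corr, std)  # recovers the original matrix
```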
denoise_corr
Denoise a correlation matrix using Random Matrix Theory (Machine Learning for Asset Managers, Ch. 2).
Shrinks eigenvalues below the Marcenko-Pastur bound toward their average, removing noise while preserving the signal.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `corr` | `ndarray` | Empirical correlation matrix. | required |
| `q` | `float` | Ratio T/N. | required |
| `bandwidth` | `float` | KDE bandwidth for eigenvalue fitting. Auto-selected if None. | `None` |
| `shrinkage` | `bool` | Use shrinkage-based denoising instead of constant residual eigenvalue. | `False` |
| `alpha` | `float` | Shrinkage intensity (0 to 1). Only used if | `None` |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Denoised correlation matrix. |
denoise_cov
Denoise a covariance matrix using RMT.
Converts to correlation, denoises, then converts back.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `cov` | `ndarray` | Empirical covariance matrix. | required |
| `q` | `float` | Ratio T/N. | required |
| `bandwidth` | `float` | KDE bandwidth. | `None` |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Denoised covariance matrix. |
dependence_matrix
Compute a pairwise dependence matrix using the specified method.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `ndarray` | Data matrix (n_observations, n_variables). | required |
| `method` | `str` | One of: | required |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Symmetric dependence matrix (n_variables x n_variables). |
detone_corr
Remove the market component from a correlation matrix (detoning).
Subtracts the first `n_components` principal components to remove common factors (e.g. the market mode).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `corr` | `ndarray` | Correlation matrix. | required |
| `n_components` | `int` | Number of leading eigenvectors to remove (usually 1 for market mode). | required |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Detoned correlation matrix. |
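
Detoning as described above can be sketched with an eigendecomposition: subtract the contribution of the leading eigenvectors, then rescale so the diagonal is 1 again. The helper name and the rescaling step are assumptions about one reasonable way to do this, not a statement of the library's internals.

```python
import numpy as np

def detone_corr_sketch(corr, n_components=1):
    """Hypothetical helper: remove the leading principal components."""
    corr = np.asarray(corr, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]         # reorder to largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    lead_vals = eigvals[:n_components]
    lead_vecs = eigvecs[:, :n_components]
    # Subtract sum_k lambda_k * v_k v_k^T for the leading components.
    residual = corr - (lead_vecs * lead_vals) @ lead_vecs.T
    # Assumption: rescale so the result has a unit diagonal again.
    scale = np.sqrt(np.diag(residual))
    return residual / np.outer(scale, scale)

# Three assets with a common pairwise correlation of 0.5 (one "market" factor).
corr = np.full((3, 3), 0.5)
np.fill_diagonal(corr, 1.0)
detoned = detone_corr_sketch(corr, n_components=1)
```

In this toy case the market mode accounts for all of the positive correlation, so the off-diagonal entries turn negative once it is removed. The detoned matrix is singular by construction, which matters if it is later inverted.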
distance_correlation
Distance correlation: a measure of dependence for non-linear relationships.
Unlike Pearson correlation, distance correlation is zero if and only if the variables are independent.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `ndarray` | First variable. | required |
| `y` | `ndarray` | Second variable. | required |
Returns:
| Type | Description |
|---|---|
| `float` | Distance correlation in [0, 1]. |
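
To see why distance correlation picks up dependence that Pearson correlation misses, the sketch below computes the standard sample statistic from double-centered pairwise distance matrices. The helper names are hypothetical, and the O(n^2) memory use makes this suitable only for small samples.

```python
import numpy as np

def _double_center(d):
    # Subtract column and row means, then add back the grand mean.
    return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()

def distance_correlation_sketch(x, y):
    """Hypothetical helper: sample distance correlation of two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    A = _double_center(np.abs(x[:, None] - x[None, :]))
    B = _double_center(np.abs(y[:, None] - y[None, :]))
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    if denom == 0.0:
        return 0.0  # assumption: a constant input is reported as 0
    return float(np.sqrt(max(dcov2, 0.0) / denom))

x = np.linspace(-1.0, 1.0, 21)
pearson = float(np.corrcoef(x, x**2)[0, 1])          # essentially zero
dcor_quadratic = distance_correlation_sketch(x, x**2)  # clearly positive
dcor_linear = distance_correlation_sketch(x, 2.0 * x + 1.0)
```

On the symmetric grid, `y = x**2` is fully determined by `x` yet linearly uncorrelated with it, so Pearson correlation vanishes while the distance correlation does not.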
distance_matrix
Convert a correlation matrix to a distance matrix.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `corr` | `ndarray` | Correlation matrix. | required |
| `metric` | `str` | One of: | required |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Distance matrix (n x n). |
entropy_implied_vol
Implied volatility from Gaussian entropy.
Inverts the Gaussian entropy formula to recover the standard deviation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `entropy` | `float` | Gaussian entropy value. | required |
Returns:
| Type | Description |
|---|---|
| `float` | Implied volatility (standard deviation). |
fit_kde
Kernel Density Estimation (KDE) for eigenvalue distribution fitting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `observations` | `ndarray` | Observed eigenvalues. | required |
| `bandwidth` | `float` | Gaussian kernel bandwidth. | required |
| `eval_points` | `ndarray` | Points at which to evaluate the KDE. | required |
Returns:
| Type | Description |
|---|---|
| `ndarray` | KDE density values at the evaluation points. |
gaussian_entropy
Gaussian entropy for a given variance.
`H = 0.5 * log2(2 * pi * e * variance)`
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `variance` | `float` | Variance of the Gaussian distribution. | required |
Returns:
| Type | Description |
|---|---|
| `float` | Differential entropy in bits. |
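
The Gaussian entropy formula and its inversion (the idea behind `entropy_implied_vol`) form a simple round trip. The sketch below assumes the entropy passed to the inverse is also measured in bits; both helper names are hypothetical.

```python
import math

def gaussian_entropy_sketch(variance):
    """Hypothetical helper: H = 0.5 * log2(2 * pi * e * variance), in bits."""
    return 0.5 * math.log2(2.0 * math.pi * math.e * variance)

def entropy_implied_vol_sketch(entropy):
    """Hypothetical helper: solve H for sigma.

    2^(2H) = 2 * pi * e * sigma^2  =>  sigma = sqrt(2^(2H) / (2 * pi * e))
    Assumes the entropy is expressed in bits.
    """
    return math.sqrt(2.0 ** (2.0 * entropy) / (2.0 * math.pi * math.e))

h = gaussian_entropy_sketch(0.04)       # variance 0.04 means sigma = 0.2
sigma = entropy_implied_vol_sketch(h)   # recovers 0.2
```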
get_feature_clusters
Cluster features using ONC on their correlation structure.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `ndarray` | Data matrix (n_samples, n_features). | required |
| `max_clusters` | `int` | Maximum number of feature clusters. | `None` |
| `seed` | `int` | Random seed. | `None` |
Returns:
| Type | Description |
|---|---|
| `OncResult` | Feature cluster labels and quality metrics. |
gnpr_distance
GNPR (Generalized Non-Parametric Rank) distance.
Combines rank correlation with an information-theoretic component.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `ndarray` | First variable. | required |
| `y` | `ndarray` | Second variable. | required |
| `theta` | `float` | Co-movement threshold. | required |
| `n_bins` | `int` | Number of bins for the information component. | `None` |
Returns:
| Type | Description |
|---|---|
| `float` | GNPR distance. |
gpr_distance
GPR (Gerber-Podolskij-Reisenhofer) distance with threshold.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `ndarray` | First variable. | required |
| `y` | `ndarray` | Second variable. | required |
| `theta` | `float` | Co-movement threshold. | required |
Returns:
| Type | Description |
|---|---|
| `float` | GPR distance. |
gsadf
Generalized SADF (GSADF) test series (AFML Ch. 17).
Tests for explosive behavior using flexible start/end windows, providing higher power than SADF for detecting multiple bubbles.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `series` | `ndarray` | Time series. | required |
| `min_window` | `int` | Minimum regression window. | required |
| `max_lags` | `int` | Maximum lags. | required |
Returns:
| Type | Description |
|---|---|
| `ndarray` | GSADF statistic series. |
gsadf_stat
Generalized SADF scalar statistic.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `series` | `ndarray` | Time series. | required |
| `min_window` | `int` | Minimum regression window. | required |
| `max_lags` | `int` | Maximum lags. | required |
Returns:
| Type | Description |
|---|---|
| `float` | GSADF test statistic. |
hasbrouck_lambda
Hasbrouck's lambda via Gibbs sampling (AFML Ch. 19).
Estimates permanent price impact accounting for trade sign uncertainty using a Bayesian approach.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `returns` | `ndarray` | Return series. | required |
| `trade_signs` | `ndarray` | Signed trade indicators (+1 or -1). | required |
| `n_iterations` | `int` | Number of Gibbs sampling iterations. | required |
| `seed` | `int` | Random seed. | required |
Returns:
| Type | Description |
|---|---|
| `float` | Hasbrouck lambda estimate. |
hrp_weights
Hierarchical Risk Parity (HRP) portfolio weights (AFML Ch. 16).
Uses hierarchical clustering on the correlation matrix to build a diversified portfolio that is more stable than mean-variance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `returns` | `ndarray` | Return matrix (n_periods, n_assets). | required |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Portfolio weights (n_assets,), sums to 1. |
inverse_variance_weights
Inverse Variance Portfolio (IVP) weights.
Weights each asset inversely proportional to its variance (diagonal of cov).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `cov` | `ndarray` | Covariance matrix (n x n). | required |
Returns:
| Type | Description |
|---|---|
| `ndarray` | IVP weights (n,), sums to 1. |
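
Inverse-variance weighting uses only the diagonal of the covariance matrix and ignores correlations entirely. A minimal sketch of that rule (the helper name is hypothetical):

```python
import numpy as np

def ivp_sketch(cov):
    """Hypothetical helper: weights proportional to 1 / variance."""
    inv_var = 1.0 / np.diag(np.asarray(cov, dtype=float))
    return inv_var / inv_var.sum()

# Variances 0.01 and 0.04 give inverse variances 100 and 25,
# so the weights are 100/125 and 25/125.
w = ivp_sketch(np.diag([0.01, 0.04]))
```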
kmeans
K-means clustering with multiple initializations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `ndarray` | Data matrix (n_samples, n_features). | required |
| `k` | `int` | Number of clusters. | required |
| `max_iter` | `int` | Maximum iterations per run. | `300` |
| `n_init` | `int` | Number of random initializations (best is kept). | `10` |
| `seed` | `int` | Random seed. | `42` |
Returns:
| Type | Description |
|---|---|
| `KMeansResult` | Cluster labels, centroids, and iteration count. |
kontoyiannis_entropy
Kontoyiannis entropy estimator using longest-match lengths (AFML Ch. 18).
A non-parametric entropy estimator based on how far back one must look to find a match for each substring.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `sequence` | `list[int]` | Discrete symbol sequence. | required |
| `window` | `int` | Maximum look-back window. | required |
Returns:
| Type | Description |
|---|---|
| `float` | Estimated entropy rate. |
kyle_lambda
Kyle's lambda: price impact from signed order flow (AFML Ch. 19).
Regresses returns on signed volume to estimate the permanent price impact of trades.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `returns` | `ndarray` | Return series. | required |
| `signed_volume` | `ndarray` | Net signed volume (buy - sell). | required |
Returns:
| Type | Description |
|---|---|
| `float` | Kyle lambda coefficient. |
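
The regression described above can be sketched as an ordinary least-squares fit whose slope is the lambda estimate. Whether the library includes an intercept is not stated, so the intercept here is an assumption, as is the helper name.

```python
import numpy as np

def kyle_lambda_sketch(returns, signed_volume):
    """Hypothetical helper: OLS slope of returns on signed volume."""
    signed_volume = np.asarray(signed_volume, dtype=float)
    returns = np.asarray(returns, dtype=float)
    # Assumption: fit an intercept alongside the slope.
    X = np.column_stack([np.ones(len(signed_volume)), signed_volume])
    beta, *_ = np.linalg.lstsq(X, returns, rcond=None)
    return float(beta[1])

signed_volume = np.array([100.0, -50.0, 200.0, -150.0, 75.0])
returns = 2e-6 * signed_volume  # synthetic data with an exact linear impact
lam = kyle_lambda_sketch(returns, signed_volume)  # recovers the 2e-6 slope
```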
lempel_ziv_complexity
Lempel-Ziv complexity of a binary string (AFML Ch. 18).
Counts the number of distinct substrings encountered during a sequential parse, a measure of randomness/compressibility.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `binary_string` | `list[bool]` | Binary sequence. | required |
Returns:
| Type | Description |
|---|---|
| `int` | Number of distinct patterns (Lempel-Ziv complexity). |
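
One common way to realise the sequential parse is a dictionary-based scheme: at each position, extend the current substring until it is one not seen before, record it, and continue after it. The sketch below follows that scheme. Several Lempel-Ziv variants exist and give different counts, so treat this as an illustration rather than the exact rule the library applies; the helper name is hypothetical.

```python
def lempel_ziv_complexity_sketch(bits):
    """Hypothetical helper: count new phrases in a dictionary-based parse."""
    seen = set()
    i, n = 0, len(bits)
    while i < n:
        j = i + 1
        # Grow the candidate phrase until it is not yet in the dictionary.
        while j <= n and tuple(bits[i:j]) in seen:
            j += 1
        if j <= n:
            seen.add(tuple(bits[i:j]))
        # If j > n, the remaining tail was already known and adds nothing.
        i = j
    return len(seen)

# 0 1 0 1 1 0 parses as: "0" | "1" | "01" | "10"
mixed = lempel_ziv_complexity_sketch([False, True, False, True, True, False])
# 1 1 1 1 1 1 parses as: "1" | "11" | "111"
constant = lempel_ziv_complexity_sketch([True] * 6)
```

A repetitive sequence yields few long phrases, while an irregular one yields many short phrases, which is the sense in which the count measures compressibility.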
marcenko_pastur_pdf
Marcenko-Pastur probability density function.
Theoretical distribution of eigenvalues for a random correlation matrix with ratio q = T/N.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `var` | `float` | Variance of the random matrix entries. | required |
| `q` | `float` | Ratio T/N (observations / variables). | required |
| `pts` | `int` | Number of evaluation points. | required |
Returns:
| Type | Description |
|---|---|
| `tuple[ndarray, ndarray]` | (x_values, pdf_values). |
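
For reference, the Marcenko-Pastur density with q = T/N has support between `var * (1 - sqrt(1/q))**2` and `var * (1 + sqrt(1/q))**2`. The sketch below evaluates the standard closed form on an evenly spaced grid. The helper name and grid choice are assumptions, and the example uses q > 1 so the lower edge stays away from zero.

```python
import numpy as np

def marcenko_pastur_pdf_sketch(var, q, pts):
    """Hypothetical helper: Marcenko-Pastur density on its support."""
    lam_min = var * (1.0 - np.sqrt(1.0 / q)) ** 2
    lam_max = var * (1.0 + np.sqrt(1.0 / q)) ** 2
    x = np.linspace(lam_min, lam_max, pts)
    # The product is non-negative on the support; clip away rounding noise.
    inside = np.clip((lam_max - x) * (x - lam_min), 0.0, None)
    pdf = q / (2.0 * np.pi * var * x) * np.sqrt(inside)
    return x, pdf

x, pdf = marcenko_pastur_pdf_sketch(var=1.0, q=10.0, pts=2000)
# Trapezoidal rule: the density should integrate to roughly 1.
area = float(np.sum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(x)))
```

Eigenvalues of an empirical correlation matrix that fall above the upper edge are the ones usually treated as signal when denoising.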
mutual_information
Mutual information between two continuous variables.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `ndarray` | First variable. | required |
| `y` | `ndarray` | Second variable. | required |
| `n_bins` | `int` | Number of histogram bins. Auto-selected if None. | `None` |
| `normalize` | `bool` | If True, normalize to [0, 1] range. | `False` |
Returns:
| Type | Description |
|---|---|
| `float` | Mutual information (non-negative). |
optimal_portfolio
Optimal portfolio weights from a (denoised) covariance matrix.
Computes the minimum-variance or max-Sharpe portfolio.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `cov` | `ndarray` | Covariance matrix. | required |
| `mu` | `ndarray` | Expected returns. If None, computes minimum-variance portfolio. | `None` |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Portfolio weights (sums to 1). |
optimal_transport_dependence
Optimal transport dependence measure.
Based on the Wasserstein distance between joint and product marginal distributions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `ndarray` | First variable. | required |
| `y` | `ndarray` | Second variable. | required |
Returns:
| Type | Description |
|---|---|
| `float` | Optimal transport dependence (non-negative). |
plugin_entropy
Plug-in (maximum likelihood) entropy estimator.
Estimates entropy from a discrete symbol sequence using empirical frequencies.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `sequence` | `list[int]` | Discrete symbol sequence. | required |
| `num_symbols` | `int` | Number of distinct symbols in the alphabet. | required |
Returns:
| Type | Description |
|---|---|
| `float` | Estimated entropy. |
quantile_encode
Quantile-based discretization of a continuous series.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `values` | `ndarray` | Input series. | required |
| `num_bins` | `int` | Number of quantile bins. | required |
Returns:
| Type | Description |
|---|---|
| `list[int]` | Bin index (0 to num_bins-1) for each value. |
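
Quantile encoding and the plug-in estimator are natural to chain: discretize a continuous series, then estimate entropy from the symbol frequencies. The sketch below shows that pipeline in plain NumPy. The helper names, the tie-breaking at bin edges, and the use of bits as the entropy unit are all assumptions.

```python
import numpy as np

def quantile_encode_sketch(values, num_bins):
    """Hypothetical helper: bin index 0..num_bins-1 from sample quantiles."""
    values = np.asarray(values, dtype=float)
    # Interior quantile edges split the sample into roughly equal groups.
    edges = np.quantile(values, np.linspace(0.0, 1.0, num_bins + 1)[1:-1])
    return np.searchsorted(edges, values, side="right").tolist()

def plugin_entropy_sketch(sequence, num_symbols):
    """Hypothetical helper: entropy of empirical symbol frequencies."""
    counts = np.bincount(sequence, minlength=num_symbols)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))  # assumption: measured in bits

values = np.arange(8, dtype=float)
codes = quantile_encode_sketch(values, num_bins=4)   # two values per bin
h = plugin_entropy_sketch(codes, num_symbols=4)      # uniform over 4 symbols
```

Because quantile bins are equally populated by construction, the first-order plug-in entropy of the codes sits at its maximum of `log2(num_bins)`; informative differences only show up once longer words or match-length estimators are used.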
roll_spread
Roll model bid-ask spread estimator.
Estimates the effective spread from the autocovariance of price changes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `prices` | `ndarray` | Price series. | required |
Returns:
| Type | Description |
|---|---|
| `float` | Estimated bid-ask spread. |
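
In the Roll model, trades bounce randomly between bid and ask, which makes successive price changes negatively autocorrelated; the spread estimate is `2 * sqrt(-cov(dp_t, dp_{t-1}))`. The sketch below applies that formula to simulated bounce data. The helper name and the choice to return 0 for non-negative autocovariance are assumptions.

```python
import numpy as np

def roll_spread_sketch(prices):
    """Hypothetical helper: Roll spread from first-order autocovariance."""
    dp = np.diff(np.asarray(prices, dtype=float))
    cov = np.cov(dp[1:], dp[:-1])[0, 1]
    # Assumption: the estimator is undefined for cov >= 0, so report 0.
    return float(2.0 * np.sqrt(-cov)) if cov < 0 else 0.0

# Simulated trades at bid or ask around a fixed mid of 100, true spread 0.10.
rng = np.random.default_rng(0)
side = rng.choice([-1, 1], size=20_000)   # -1 = trade at bid, +1 = at ask
prices = 100.0 + 0.05 * side
spread = roll_spread_sketch(prices)        # close to 0.10
```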
roll_spread_rolling
Rolling Roll model spread estimate.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `prices` | `ndarray` | Price series. | required |
| `window` | `int` | Rolling window size. | required |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Rolling spread estimates. |
sadf
Supremum Augmented Dickey-Fuller (SADF) test series (AFML Ch. 17).
Computes a sequence of ADF statistics with expanding windows starting from `min_window`.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `series` | `ndarray` | Time series (e.g. log prices). | required |
| `min_window` | `int` | Minimum regression window. | required |
| `max_lags` | `int` | Maximum lags per ADF regression. | required |
Returns:
| Type | Description |
|---|---|
| `ndarray` | SADF statistic series. |
sadf_stat
Supremum ADF scalar statistic: the maximum of the SADF series.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `series` | `ndarray` | Time series. | required |
| `min_window` | `int` | Minimum regression window. | required |
| `max_lags` | `int` | Maximum lags. | required |
Returns:
| Type | Description |
|---|---|
| `float` | SADF test statistic. |
shannon_entropy
Shannon entropy from a probability distribution.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `probs` | `ndarray` | Probability vector (should sum to 1). | required |
Returns:
| Type | Description |
|---|---|
| `float` | Shannon entropy in bits (log base 2). |
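
Shannon entropy in bits is `-sum(p * log2(p))`, with zero-probability outcomes contributing nothing. A minimal sketch (the helper name is hypothetical, and no renormalization of the input is attempted):

```python
import numpy as np

def shannon_entropy_sketch(probs):
    """Hypothetical helper: -sum(p * log2(p)) over non-zero entries."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]  # convention: 0 * log(0) is treated as 0
    return float(-np.sum(p * np.log2(p)))

h_fair = shannon_entropy_sketch([0.5, 0.5])      # a fair coin carries 1 bit
h_certain = shannon_entropy_sketch([1.0, 0.0])   # a certain outcome carries 0
```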
sigma_encode
Sigma-based encoding using standard deviation bands.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `values` | `ndarray` | Input series. | required |
| `num_bands` | `int` | Number of sigma bands on each side of the mean. | required |
Returns:
| Type | Description |
|---|---|
| `list[int]` | Band index for each value. |
silhouette_score
Silhouette score measuring clustering quality.
Ranges from -1 (poor) to +1 (excellent).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `ndarray` | Data matrix (n_samples, n_features). | required |
| `labels` | `list[int]` | Cluster labels for each sample. | required |
Returns:
| Type | Description |
|---|---|
| `float` | Mean silhouette score. |
sm_exp
Sub/super-martingale test with exponential kernel.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `series` | `ndarray` | Time series. | required |
Returns:
| Type | Description |
|---|---|
| `float` | Test statistic. |
sm_poly
Sub/super-martingale test with polynomial kernel.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `series` | `ndarray` | Time series. | required |
| `degree` | `int` | Polynomial degree for the test. | required |
Returns:
| Type | Description |
|---|---|
| `float` | Test statistic. |
sm_power
Sub/super-martingale test with power kernel.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `series` | `ndarray` | Time series. | required |
| `power` | `float` | Power exponent for the kernel. | required |
Returns:
| Type | Description |
|---|---|
| `float` | Test statistic. |
spearmans_rho
Spearman's rank correlation coefficient.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `ndarray` | First variable. | required |
| `y` | `ndarray` | Second variable. | required |
Returns:
| Type | Description |
|---|---|
| `float` | Spearman's rho in [-1, 1]. |
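
Spearman's rho is the Pearson correlation of the ranks, so any strictly monotone relationship scores exactly +1 or -1. The sketch below uses ordinal ranks and therefore assumes the inputs contain no ties; the helper name is hypothetical.

```python
import numpy as np

def spearman_sketch(x, y):
    """Hypothetical helper: Pearson correlation of ordinal ranks (no ties)."""
    rx = np.argsort(np.argsort(x))  # rank of each element, starting at 0
    ry = np.argsort(np.argsort(y))
    return float(np.corrcoef(rx, ry)[0, 1])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
rho_monotone = spearman_sketch(x, np.exp(x))  # increasing but non-linear
rho_reversed = spearman_sketch(x, -x**3)      # strictly decreasing
```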
tick_rule_classify
Classify trades using the tick rule.
Assigns +1 (uptick), -1 (downtick), or 0 (no change) to each trade.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `prices` | `ndarray` | Trade price series. | required |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Trade sign series (+1, -1, or 0). |
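
The rule as described is the sign of each price change. The sketch below assumes the output has the same length as the input and that the first trade, which has no predecessor, is labelled 0; neither detail is stated in the reference, and the helper name is hypothetical.

```python
import numpy as np

def tick_rule_sketch(prices):
    """Hypothetical helper: sign of the price change for each trade."""
    prices = np.asarray(prices, dtype=float)
    signs = np.zeros(len(prices))
    # Assumption: the first trade has no prior price, so it stays at 0.
    signs[1:] = np.sign(np.diff(prices))
    return signs

# up, unchanged, down, up
signs = tick_rule_sketch([10.0, 10.1, 10.1, 9.9, 10.0])
```

Note that some formulations of the tick rule carry the previous sign forward on unchanged prices instead of emitting 0, which matters when the output feeds an estimator expecting only +1 or -1.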
variation_of_information
Variation of information: a metric-space distance based on entropy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `ndarray` | First variable. | required |
| `y` | `ndarray` | Second variable. | required |
| `n_bins` | `int` | Number of histogram bins. | `None` |
| `normalize` | `bool` | If True, normalize to [0, 1] range. | `False` |
Returns:
| Type | Description |
|---|---|
| `float` | Variation of information (non-negative). |
vpin
Volume-Synchronized Probability of Informed Trading (VPIN).
Estimates the probability of informed trading from volume-bucketed data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `volumes` | `ndarray` | Volume series. | required |
| `prices` | `ndarray` | Price series. | required |
| `bucket_size` | `float` | Volume per bucket. | required |
| `n_buckets` | `int` | Number of buckets for the rolling VPIN estimate. | required |
Returns:
| Type | Description |
|---|---|
| `ndarray` | VPIN estimates at each bucket boundary. |