Chapter 5: Fractional Differentiation
AFML Ch. 5 -- Achieving stationarity while preserving memory.
Standard integer differencing (d=1, i.e., returns) makes a price series stationary
but destroys all long-range memory. The original price series (d=0) retains full
memory but is non-stationary, violating most ML model assumptions. Fractional
differentiation finds the minimum d in (0, 1) that achieves stationarity while
preserving the maximum amount of memory.
Topics covered:
- FFD (Fixed-width window Fractionally Differentiated) weights
- Fractional differentiation at various d values
- Expanding window vs FFD comparison
- Finding minimum d for stationarity (ADF test)
- Correlation preservation analysis
- Polars expression API for fractional differentiation
import numpy as np
import polars as pl
import matplotlib.pyplot as plt
import pymlfinance
%matplotlib inline
plt.rcParams['figure.figsize'] = (14, 6)
plt.rcParams['figure.dpi'] = 150
plt.rcParams['font.size'] = 15
plt.rcParams['axes.titlesize'] = 18
plt.rcParams['axes.labelsize'] = 15
plt.rcParams['xtick.labelsize'] = 13
plt.rcParams['ytick.labelsize'] = 13
plt.rcParams['legend.fontsize'] = 13
Generate Synthetic Non-Stationary Price Series
We create a random walk with upward drift -- a classic non-stationary (unit-root) process. We work with log prices throughout, so that integer differencing (d=1) yields log returns.
np.random.seed(42)
n = 500
# Random walk with drift: cumulative sum of Gaussian increments
trend = np.cumsum(np.random.randn(n) * 0.02 + 0.001) # upward drift
prices = 100.0 * np.exp(trend)
log_prices = np.log(prices)
print(f"Generated {n} log prices")
print(f" Start: {log_prices[0]:.4f}, End: {log_prices[-1]:.4f}")
Generated 500 log prices
 Start: 4.6161, End: 5.1736
# Plot the raw price and log price series
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
ax1.plot(prices, color='steelblue', linewidth=0.8)
ax1.set_xlabel('Observation')
ax1.set_ylabel('Price')
ax1.set_title('Raw Price Series (non-stationary)')
ax1.grid(True, alpha=0.3)
ax2.plot(log_prices, color='#DD8452', linewidth=0.8)
ax2.set_xlabel('Observation')
ax2.set_ylabel('Log Price')
ax2.set_title('Log Price Series (non-stationary)')
ax2.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
FFD Weights
The FFD (Fixed-width window Fractionally Differentiated) method computes a set of
weights that are applied as a convolution filter to the series. Higher d values
produce faster-decaying weights (more differencing, less memory). A threshold
parameter truncates negligibly small weights.
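For reference, the filter can be written out as follows. This is the binomial-series recursion from AFML Ch. 5; the symbols τ (the threshold) and l* (the last lag kept) are notation chosen here rather than names from the library, and the cutoff rule is an assumption that happens to agree with the weight counts printed in this section.

```latex
\tilde{X}_t = \sum_{k=0}^{l^*} w_k \, X_{t-k},
\qquad w_0 = 1, \quad w_k = -\,w_{k-1}\,\frac{d - k + 1}{k},
\qquad l^* = \max\{\, k : |w_k| \ge \tau \,\}
```

For d=1 the recursion gives w = (1, -1, 0, 0, ...), i.e. ordinary first differences; for d=0 only w_0 = 1 survives and the series is left unchanged.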
print("--- FFD Weights ---")
for d in [0.3, 0.5, 0.7, 1.0]:
    weights = pymlfinance.sampling.get_weights_ffd(d, threshold=1e-4)
    print(f" d={d:.1f}: {len(weights)} weights, sum={np.sum(weights):.4f}, "
          f"first={weights[0]:.4f}, last={weights[-1]:.6f}")
--- FFD Weights ---
 d=0.3: 388 weights, sum=0.1289, first=1.0000, last=-0.000100
 d=0.5: 200 weights, sum=0.0400, first=1.0000, last=-0.000101
 d=0.7: 97 weights, sum=0.0137, first=1.0000, last=-0.000100
 d=1.0: 2 weights, sum=0.0000, first=1.0000, last=-1.000000
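To make the mechanics concrete, here is a minimal dependency-free sketch of what a fixed-width filter of this kind might look like. It is not the library's implementation: `ffd_weights` and `frac_diff_fixed` are names made up for illustration, and the rule "stop at the first weight whose magnitude falls below the threshold" is an assumption. Under that assumption the sketch keeps 200 weights for d=0.5, the same count printed above.

```python
import math


def ffd_weights(d, threshold=1e-4):
    """Weights w_0, w_1, ... of (1 - B)^d, cut off once |w_k| < threshold.

    Assumes threshold > 0 so the loop terminates.
    """
    weights = [1.0]
    k = 1
    while True:
        w = -weights[-1] * (d - k + 1) / k
        if abs(w) < threshold:
            break
        weights.append(w)
        k += 1
    return weights


def frac_diff_fixed(series, d, threshold=1e-4):
    """Apply the fixed-width filter; positions without a full window are NaN."""
    w = ffd_weights(d, threshold)
    width = len(w)
    out = [math.nan] * len(series)
    for t in range(width - 1, len(series)):
        # w[0] multiplies the current value, w[k] the value k steps back
        out[t] = sum(w[k] * series[t - k] for k in range(width))
    return out


print(len(ffd_weights(0.5)))  # number of weights kept for d=0.5
print(frac_diff_fixed([1.0, 3.0, 6.0, 10.0], d=1.0))  # d=1 gives first differences
```

Because every output value uses the same weight vector, the leading `width - 1` positions have no complete window, which is why the FFD output starts with a run of NaN.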
# Plot FFD weight vectors for different d values
fig, ax = plt.subplots(figsize=(10, 5))
colors = ['#4C72B0', '#DD8452', '#55A868', '#C44E52']
for d, color in zip([0.3, 0.5, 0.7, 1.0], colors):
    weights = pymlfinance.sampling.get_weights_ffd(d, threshold=1e-4)
    ax.plot(range(len(weights)), weights, 'o-', color=color, markersize=3,
            linewidth=1, label=f'd={d:.1f} ({len(weights)} weights)')
ax.set_xlabel('Weight Index (lag)')
ax.set_ylabel('Weight Value')
ax.set_title('FFD Weight Vectors for Different d Values')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Fractional Differentiation at Various d Values
As d increases from 0 to 1:
- Correlation with original decreases (memory loss)
- ADF statistic becomes more negative (more stationary)
The goal is to find the smallest d where ADF rejects the unit root null hypothesis
(at the 5% level, a statistic below about -2.86, the asymptotic critical value for the test with a constant).
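The statistic itself is the t-ratio on the lagged level in a regression of the first difference on that level. The sketch below implements the simplest variant (a constant, no lagged-difference terms) in plain Python. `df_stat` is a name invented here, and `pymlfinance.features.adf_test` with `max_lags=1` presumably adds a lagged difference, so its numbers will differ somewhat.

```python
import math
import random


def df_stat(y):
    """t-ratio of beta in  dy_t = alpha + beta * y_{t-1} + e_t  (no lag terms)."""
    x = y[:-1]                                  # lagged level
    z = [b - a for a, b in zip(y[:-1], y[1:])]  # first difference
    n = len(x)
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    beta = sxz / sxx
    alpha = z_bar - beta * x_bar
    rss = sum((zi - alpha - beta * xi) ** 2 for xi, zi in zip(x, z))
    se_beta = math.sqrt(rss / (n - 2) / sxx)
    return beta / se_beta


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]

# Cumulative sum of the same shocks: a unit-root process
walk = []
level = 0.0
for e in noise:
    level += e
    walk.append(level)

print(f"white noise: {df_stat(noise):.2f}")  # strongly negative
print(f"random walk: {df_stat(walk):.2f}")   # much closer to zero
```

The statistic is compared against Dickey-Fuller critical values rather than the usual t table, which is where the -2.86 cutoff comes from.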
print("--- FFD at Various d Values ---")
d_values = [0.2, 0.4, 0.6, 0.8, 1.0]
correlations = []
adf_stats = []
for d in d_values:
    ffd = pymlfinance.sampling.frac_diff_ffd(log_prices, d=d, threshold=1e-4)
    # FFD pads leading entries with NaN; strip them before computing stats
    valid = ~np.isnan(ffd)
    ffd_valid = ffd[valid]
    if len(ffd_valid) > 0:
        corr = np.corrcoef(log_prices[valid], ffd_valid)[0, 1]
        adf_stat, _ = pymlfinance.features.adf_test(ffd_valid, max_lags=1)
    else:
        # No valid observations (filter longer than the series): record NaN, not 0.0
        corr = np.nan
        adf_stat = np.nan
    correlations.append(corr)
    adf_stats.append(adf_stat)
    print(f" d={d:.1f}: len={len(ffd_valid)}, corr_with_original={corr:.4f}, ADF={adf_stat:.4f}")
# Correlation vs d and ADF statistic vs d
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
ax1.plot(d_values, correlations, 'o-', color='steelblue', linewidth=2, markersize=8)
ax1.set_xlabel('d (fractional order)')
ax1.set_ylabel('Correlation with Original')
ax1.set_title('Memory Preservation: Correlation vs d')
ax1.set_ylim(0, 1.05)
ax1.grid(True, alpha=0.3)
ax2.plot(d_values, adf_stats, 'o-', color='#C44E52', linewidth=2, markersize=8)
ax2.axhline(y=-2.86, color='green', linestyle='--', linewidth=1.5,
label='5% critical value (-2.86)')
ax2.set_xlabel('d (fractional order)')
ax2.set_ylabel('ADF Statistic')
ax2.set_title('Stationarity: ADF Statistic vs d')
ax2.legend()
ax2.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Expanding Window vs FFD
Two implementations of fractional differentiation:
- FFD (fixed-width window): truncates weights below a threshold, uses a fixed-length filter
- Expanding window: uses all available history, expanding the weight vector at each step
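Once the FFD window is full, both should produce similar results. The expanding window drops fewer weights but does more work per step, and AFML notes that its growing weight vector introduces a negative drift, which is why FFD is usually preferred.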
print("--- Expanding Window vs FFD (d=0.5) ---")
ffd_result = pymlfinance.sampling.frac_diff_ffd(log_prices, d=0.5, threshold=1e-4)
exp_result = pymlfinance.sampling.frac_diff_expanding(log_prices, d=0.5, threshold=1e-4)
print(f" FFD length: {len(ffd_result)}")
print(f" Expanding length: {len(exp_result)}")
if len(ffd_result) > 0 and len(exp_result) > 0:
    min_len = min(len(ffd_result), len(exp_result))
    diff = np.abs(ffd_result[:min_len] - exp_result[:min_len])
    print(f" Mean absolute difference: {np.nanmean(diff):.6f}")
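A minimal pure-Python sketch of the expanding-window variant, assuming it simply applies every weight available at each step. `frac_diff_expanding_sketch` is an illustrative name, and how the library uses its `threshold` argument here is not shown in this chapter. Fed the first five log prices from the Polars output later in the chapter, the sketch reproduces the `exp_0.5` column to rounding.

```python
def frac_diff_expanding_sketch(series, d):
    """At step t, use all t + 1 weights of (1 - B)^d: no warm-up, no truncation."""
    weights = [1.0]
    out = []
    for t in range(len(series)):
        if t > 0:
            # extend the weight vector by one lag: w_t = -w_{t-1} * (d - t + 1) / t
            weights.append(-weights[-1] * (d - t + 1) / t)
        out.append(sum(weights[k] * series[t - k] for k in range(t + 1)))
    return out


# First five log prices, copied from the Polars table in this chapter
head = [4.616104, 4.614339, 4.628293, 4.659754, 4.65607]
print([round(v, 5) for v in frac_diff_expanding_sketch(head, d=0.5)])
```

The first output equals the first log price because only w_0 = 1 is available; each later value subtracts a growing share of history, so the early outputs sit on a very different scale from the FFD values.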
Finding Minimum d for Stationarity
The find_min_d function searches for the smallest fractional order d that makes
the series stationary (passes the ADF test). This is the "sweet spot" that preserves
maximum memory while achieving stationarity.
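The search can be pictured as a simple grid scan. The sketch below assumes that is roughly what `find_min_d` does; the source does not show its internals. `find_min_d_sketch`, `transform`, and `stat_fn` are hypothetical names, and the stand-in statistic in the usage lines is a toy, not a real ADF test.

```python
def find_min_d_sketch(series, transform, stat_fn, max_d=1.0, step_size=0.1,
                      critical=-2.86):
    """Return the first d on the grid whose transformed series passes stat_fn.

    transform(series, d) -> differenced values with NaNs already removed
    stat_fn(values)      -> unit-root statistic (more negative = more stationary)
    """
    steps = int(round(max_d / step_size))
    for i in range(steps + 1):
        d = round(i * step_size, 10)  # keep grid points free of float noise
        values = transform(series, d)
        if len(values) > 2 and stat_fn(values) < critical:
            return d
    return max_d  # nothing passed: fall back to the largest d on the grid


# Toy stand-ins so the scan runs without dependencies: the "statistic"
# falls linearly with d and crosses -2.86 between d=0.2 and d=0.3.
def toy_transform(series, d):
    return [d] * len(series)


def toy_stat(values):
    return -10.0 * values[0]


print(find_min_d_sketch([0.0] * 10, toy_transform, toy_stat))
```

A coarse `step_size` makes the scan cheap but can overshoot the true minimum by up to one step; a finer grid trades extra ADF evaluations for a tighter estimate.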
min_d = pymlfinance.sampling.find_min_d(log_prices, max_d=1.0, step_size=0.1, threshold=1e-4)
print(f"--- Minimum d for Stationarity ---")
print(f" min_d = {min_d:.2f}")
# Verify: recompute the ADF statistic and correlation at min_d
ffd_min = pymlfinance.sampling.frac_diff_ffd(log_prices, d=min_d, threshold=1e-4)
if len(ffd_min) > 0:
    valid = ~np.isnan(ffd_min)
    ffd_valid = ffd_min[valid]
    adf_stat, _ = pymlfinance.features.adf_test(ffd_valid, max_lags=1)
    corr = np.corrcoef(log_prices[valid], ffd_valid)[0, 1]
    print(f" ADF statistic at d={min_d:.2f}: {adf_stat:.4f}")
    print(f" Correlation with original: {corr:.4f}")
    print(" (Compare: integer differencing d=1.0 destroys all memory)")
# Plot original vs fractionally differentiated series
fig, axes = plt.subplots(3, 1, figsize=(14, 10), sharex=True)
# Original log prices
axes[0].plot(log_prices, color='steelblue', linewidth=0.8)
axes[0].set_ylabel('Log Price')
axes[0].set_title(f'Original Log Prices (d=0, non-stationary)')
axes[0].grid(True, alpha=0.3)
# Fractionally differentiated at min_d
ffd_opt = pymlfinance.sampling.frac_diff_ffd(log_prices, d=min_d, threshold=1e-4)
valid_opt = ~np.isnan(ffd_opt)
corr_opt = np.corrcoef(log_prices[valid_opt], ffd_opt[valid_opt])[0, 1]
axes[1].plot(ffd_opt, color='#55A868', linewidth=0.8)
axes[1].set_ylabel(f'FFD (d={min_d:.1f})')
axes[1].set_title(f'Fractionally Differentiated (d={min_d:.1f}, corr={corr_opt:.4f})')
axes[1].grid(True, alpha=0.3)
# Fully differentiated (returns)
ffd_full = pymlfinance.sampling.frac_diff_ffd(log_prices, d=1.0, threshold=1e-4)
valid_full = ~np.isnan(ffd_full)
corr_full = np.corrcoef(log_prices[valid_full], ffd_full[valid_full])[0, 1]
axes[2].plot(ffd_full, color='#C44E52', linewidth=0.8)
axes[2].set_ylabel('Returns (d=1.0)')
axes[2].set_title(f'Integer Differentiated (d=1.0, corr={corr_full:.4f})')
axes[2].set_xlabel('Observation')
axes[2].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Weight Vectors
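The raw weight vectors (without FFD truncation) show how fractional differentiation assigns slowly decaying weights to past observations. For non-integer d the decay follows a power law in the lag rather than an exponential, which is why distant observations still contribute.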
print("--- Weight Vectors ---")
for d in [0.3, 0.5, 0.7]:
    w = pymlfinance.sampling.get_weights(d, size=10)
    print(f" d={d:.1f}: weights = [{', '.join(f'{x:.4f}' for x in w[:5])}...]")
--- Weight Vectors ---
 d=0.3: weights = [1.0000, -0.3000, -0.1050, -0.0595, -0.0402...]
 d=0.5: weights = [1.0000, -0.5000, -0.1250, -0.0625, -0.0391...]
 d=0.7: weights = [1.0000, -0.7000, -0.1050, -0.0455, -0.0262...]
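To see how fast these weights shrink, compare consecutive terms: from the recursion, w_k / w_(k-1) = (k - 1 - d) / k, which climbs toward 1 as the lag grows. The sketch below (plain Python, with `raw_weights` as an illustrative stand-in for `get_weights`, not the library function) prints that ratio at a few lags. A geometric sequence would show a constant ratio; a ratio approaching 1 means the tail thins like a power of the lag, roughly k^-(1+d).

```python
def raw_weights(d, size):
    """First `size` weights of (1 - B)^d, lag 0 first, no truncation."""
    weights = [1.0]
    for k in range(1, size):
        weights.append(-weights[-1] * (d - k + 1) / k)
    return weights


w = raw_weights(0.5, size=1001)
print([round(x, 4) for x in w[:5]])  # same leading values as the d=0.5 row above
for k in (10, 100, 1000):
    print(f"k={k:4d}  w_k/w_(k-1) = {w[k] / w[k - 1]:.4f}")
```

This slow tail is what lets a fractionally differenced series keep a measurable dependence on observations hundreds of steps back.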
Polars Expression API
The .ml namespace on Polars expressions provides fractional differentiation,
find_min_d, and ADF test functions in a DataFrame-native API.
import pymlfinance.polars
df = pl.DataFrame({"log_price": log_prices})
result = df.with_columns(
pl.col("log_price").ml.frac_diff_ffd(d=0.5, threshold=1e-4).alias("ffd_0.5"),
pl.col("log_price").ml.frac_diff_expanding(d=0.5, threshold=1e-4).alias("exp_0.5"),
)
print(f" DataFrame shape: {result.shape}")
print(result.head(5))
 DataFrame shape: (500, 3)
shape: (5, 3)
┌───────────┬─────────┬──────────┐
│ log_price ┆ ffd_0.5 ┆ exp_0.5  │
│ ---       ┆ ---     ┆ ---      │
│ f64       ┆ f64     ┆ f64      │
╞═══════════╪═════════╪══════════╡
│ 4.616104  ┆ NaN     ┆ 4.616104 │
│ 4.614339  ┆ NaN     ┆ 2.306287 │
│ 4.628293  ┆ NaN     ┆ 1.74411  │
│ 4.659754  ┆ NaN     ┆ 1.480308 │
│ 4.65607   ┆ NaN     ┆ 1.278944 │
└───────────┴─────────┴──────────┘
# Find min d via Polars
min_d_pl = df.select(
pl.col("log_price").ml.find_min_d(max_d=1.0, step_size=0.1, threshold=1e-4)
).item()
print(f" Polars find_min_d: {min_d_pl:.2f}")
# ADF test via Polars
adf_pl = df.select(
pl.col("log_price").ml.adf_test(max_lags=1)
).item()
print(f" Polars ADF on raw log prices: {adf_pl:.4f}")
 Polars find_min_d: 0.40
 Polars ADF on raw log prices: -0.4993
Exercises
Correlation vs d trade-off: Plot correlation vs d and ADF statistic vs d on the same figure to visually identify the sweet spot where stationarity is achieved with maximum memory.
Threshold sensitivity: Try different thresholds (1e-3, 1e-4, 1e-5) and compare FFD output length. Smaller thresholds use more weights (longer memory) at the cost of more computation.
Stationary input: Generate a stationary series (e.g., returns) and verify that find_min_d returns approximately 0, confirming no additional differencing is needed.