Chapter 5: Fractional Differentiation

AFML (Advances in Financial Machine Learning) Ch. 5 -- Achieving stationarity while preserving memory.

Standard integer differencing (d=1, i.e., returns) makes a price series stationary but discards the long-range memory of the price level. The original price series (d=0) retains full memory but is non-stationary, which violates the assumptions of most ML models. Fractional differentiation allows non-integer orders, so we can look for the minimum d in (0, 1) that achieves stationarity while preserving as much memory as possible.

Topics covered:

FFD (Fixed-width window Fractionally Differentiated) weights
Fractional differentiation at various d values
Expanding window vs FFD comparison
Finding minimum d for stationarity (ADF test)
Correlation preservation analysis
Polars expression API for fractional differentiation

In [1]:

import numpy as np
import polars as pl
import matplotlib.pyplot as plt
import pymlfinance

%matplotlib inline
plt.rcParams['figure.figsize'] = (14, 6)
plt.rcParams['figure.dpi'] = 150
plt.rcParams['font.size'] = 15
plt.rcParams['axes.titlesize'] = 18
plt.rcParams['axes.labelsize'] = 15
plt.rcParams['xtick.labelsize'] = 13
plt.rcParams['ytick.labelsize'] = 13
plt.rcParams['legend.fontsize'] = 13

Generate Synthetic Non-Stationary Price Series

We create a random walk with upward drift, a classic non-stationary process. Log prices are used throughout, because fractional differentiation is applied to log prices.

In [2]:

np.random.seed(42)

n = 500
# Random walk with drift: cumulative sum of Gaussian increments (std 0.02, mean 0.001)
trend = np.cumsum(np.random.randn(n) * 0.02 + 0.001)  # upward drift
prices = 100.0 * np.exp(trend)
log_prices = np.log(prices)

print(f"Generated {n} log prices")
print(f"  Start: {log_prices[0]:.4f}, End: {log_prices[-1]:.4f}")

Generated 500 log prices
  Start: 4.6161, End: 5.1736

In [3]:

# Plot the raw price and log price series
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

ax1.plot(prices, color='steelblue', linewidth=0.8)
ax1.set_xlabel('Observation')
ax1.set_ylabel('Price')
ax1.set_title('Raw Price Series (non-stationary)')
ax1.grid(True, alpha=0.3)

ax2.plot(log_prices, color='#DD8452', linewidth=0.8)
ax2.set_xlabel('Observation')
ax2.set_ylabel('Log Price')
ax2.set_title('Log Price Series (non-stationary)')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

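FFD Weights

The FFD (Fixed-width window Fractionally Differentiated) method computes a fixed set of weights and applies them to the series as a convolution filter. Higher d values produce faster-decaying weights (more differencing, less memory). A threshold parameter drops weights whose magnitude is negligible, which fixes the window width. Outputs that do not yet have a full window of history are left as NaN, so a long window costs observations: a filter with 388 weights applied to the 500-point series here leaves only about 113 valid values.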
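
The weight computation can be sketched in a few lines of NumPy. This is an illustrative reimplementation, not the pymlfinance source: the function name ffd_weights, the max_size cap, and the rule "stop at the first weight whose magnitude falls below the threshold" are assumptions. Under that rule, d=0.5 with a threshold of 1e-4 keeps 200 weights and d=1.0 keeps exactly two.

```python
import numpy as np

def ffd_weights(d, threshold=1e-4, max_size=10_000):
    """Weights of (1 - B)^d, cut off at the first weight with |w_k| < threshold.

    Hypothetical helper for illustration; the truncation rule is an assumption.
    """
    weights = [1.0]  # w_0 is always 1
    for k in range(1, max_size):
        # Recursion: w_k = -w_{k-1} * (d - k + 1) / k
        w = -weights[-1] * (d - k + 1) / k
        if abs(w) < threshold:
            break
        weights.append(w)
    return np.array(weights)

for d in (0.3, 0.5, 0.7, 1.0):
    w = ffd_weights(d)
    print(f"d={d:.1f}: {len(w)} weights, first three = {np.round(w[:3], 4)}")
```

For d=1.0 the third weight is exactly zero, so the filter collapses to [1, -1], which is ordinary first differencing.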

In [4]:

print(f"--- FFD Weights ---")
for d in [0.3, 0.5, 0.7, 1.0]:
    weights = pymlfinance.sampling.get_weights_ffd(d, threshold=1e-4)
    print(f"  d={d:.1f}: {len(weights)} weights, sum={np.sum(weights):.4f}, "
          f"first={weights[0]:.4f}, last={weights[-1]:.6f}")

--- FFD Weights ---
  d=0.3: 388 weights, sum=0.1289, first=1.0000, last=-0.000100
  d=0.5: 200 weights, sum=0.0400, first=1.0000, last=-0.000101
  d=0.7: 97 weights, sum=0.0137, first=1.0000, last=-0.000100
  d=1.0: 2 weights, sum=0.0000, first=1.0000, last=-1.000000

In [5]:

# Plot FFD weight vectors for different d values
fig, ax = plt.subplots(figsize=(10, 5))
colors = ['#4C72B0', '#DD8452', '#55A868', '#C44E52']
for d, color in zip([0.3, 0.5, 0.7, 1.0], colors):
    weights = pymlfinance.sampling.get_weights_ffd(d, threshold=1e-4)
    ax.plot(range(len(weights)), weights, 'o-', color=color, markersize=3,
            linewidth=1, label=f'd={d:.1f} ({len(weights)} weights)')
ax.set_xlabel('Weight Index (lag)')
ax.set_ylabel('Weight Value')
ax.set_title('FFD Weight Vectors for Different d Values')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Fractional Differentiation at Various d Values

As d increases from 0 to 1:

Correlation with original decreases (memory loss)
ADF statistic becomes more negative (more stationary)

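The goal is to find the smallest d at which the ADF test rejects the unit-root null hypothesis. At the 5% level this means an ADF statistic below roughly -2.86, the asymptotic critical value for a test regression with a constant and no trend.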
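
To make the ADF statistic less of a black box, here is a minimal NumPy sketch of the regression behind it. It regresses the first difference on a constant, the lagged level, and lagged differences, then returns the t-statistic of the lagged-level coefficient. The function name adf_stat is hypothetical, and this is not claimed to match how pymlfinance.features.adf_test is implemented or which deterministic terms it includes.

```python
import numpy as np

def adf_stat(y, lags=1):
    """t-statistic on the lagged level in an ADF regression with a constant.

    Illustrative helper only: dy_t = a + g * y_{t-1} + sum_i b_i * dy_{t-i} + e_t.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    target = dy[lags:]                           # dy_t
    cols = [np.ones_like(target), y[lags:-1]]    # constant, y_{t-1}
    for i in range(1, lags + 1):
        cols.append(dy[lags - i:-i])             # dy_{t-i}
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2 = resid @ resid / (len(target) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(0)
noise = rng.standard_normal(500)   # stationary
walk = np.cumsum(noise)            # unit root
print(f"white noise: {adf_stat(noise):.2f}, random walk: {adf_stat(walk):.2f}")
```

A stationary input gives a strongly negative statistic, while a unit-root input usually stays above the critical value.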

In [ ]:

print(f"--- FFD at Various d Values ---")
d_values = [0.2, 0.4, 0.6, 0.8, 1.0]
correlations = []
adf_stats = []
for d in d_values:
    ffd = pymlfinance.sampling.frac_diff_ffd(log_prices, d=d, threshold=1e-4)
    # FFD pads leading entries with NaN; strip them before computing stats
    valid = ~np.isnan(ffd)
    ffd_valid = ffd[valid]
    if len(ffd_valid) > 0:
        corr = np.corrcoef(log_prices[valid], ffd_valid)[0, 1]
        adf_stat, _ = pymlfinance.features.adf_test(ffd_valid, max_lags=1)
    else:
        # No valid outputs: record NaN so later plots skip this d instead of showing a fake 0
        corr = np.nan
        adf_stat = np.nan
    correlations.append(corr)
    adf_stats.append(adf_stat)
    print(f"  d={d:.1f}: len={len(ffd_valid)}, corr_with_original={corr:.4f}, ADF={adf_stat:.4f}")

In [7]:

# Correlation vs d and ADF statistic vs d
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

ax1.plot(d_values, correlations, 'o-', color='steelblue', linewidth=2, markersize=8)
ax1.set_xlabel('d (fractional order)')
ax1.set_ylabel('Correlation with Original')
ax1.set_title('Memory Preservation: Correlation vs d')
ax1.set_ylim(0, 1.05)
ax1.grid(True, alpha=0.3)

ax2.plot(d_values, adf_stats, 'o-', color='#C44E52', linewidth=2, markersize=8)
ax2.axhline(y=-2.86, color='green', linestyle='--', linewidth=1.5,
            label='5% critical value (-2.86)')
ax2.set_xlabel('d (fractional order)')
ax2.set_ylabel('ADF Statistic')
ax2.set_title('Stationarity: ADF Statistic vs d')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

Expanding Window vs FFD

Two implementations of fractional differentiation:

FFD (fixed-width window): truncates weights below a threshold, uses a fixed-length filter
Expanding window: uses all available history, expanding the weight vector at each step

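The two should agree closely once the expanding window has accumulated enough history. Early expanding-window values are built from only a few weights, so they sit far from the steady state, whereas FFD leaves those positions as NaN. The expanding window uses more history per point but is slower, and because the number of weights changes from one observation to the next it can introduce a drift in the output, which is the motivation AFML gives for preferring FFD.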
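
The difference between the two schemes is easiest to see on a handful of points. The sketch below is a plain NumPy illustration with hypothetical function names; it ignores any thresholding the library may apply in the expanding case. The input is five rounded log prices starting near 4.616, and the FFD threshold is set unusually high (0.05) so the window is only four weights wide.

```python
import numpy as np

def frac_weights(d, size):
    """First `size` weights of (1 - B)^d via w_k = -w_{k-1} * (d - k + 1) / k."""
    w = np.empty(size)
    w[0] = 1.0
    for k in range(1, size):
        w[k] = -w[k - 1] * (d - k + 1) / k
    return w

def expanding_sketch(x, d):
    """Output t uses weights w_0..w_t against x_t, x_{t-1}, ..., x_0."""
    x = np.asarray(x, dtype=float)
    w = frac_weights(d, len(x))
    return np.array([w[:t + 1] @ x[t::-1] for t in range(len(x))])

def ffd_sketch(x, d, threshold):
    """Fixed window: keep weights until the first |w_k| < threshold, NaN before a full window."""
    x = np.asarray(x, dtype=float)
    w = frac_weights(d, len(x))
    small = np.flatnonzero(np.abs(w) < threshold)
    width = small[0] if small.size else len(w)
    w = w[:width]
    out = np.full(len(x), np.nan)
    for t in range(width - 1, len(x)):
        out[t] = w @ x[t - width + 1:t + 1][::-1]
    return out

x = np.array([4.616104, 4.614339, 4.628293, 4.659754, 4.65607])
exp_out = expanding_sketch(x, d=0.5)
ffd_out = ffd_sketch(x, d=0.5, threshold=0.05)
print("expanding:", np.round(exp_out, 6))
print("ffd      :", np.round(ffd_out, 6))
```

The first expanding value equals the input itself and later values shrink as more negative weights enter, while FFD emits nothing until a full four-point window is available. At the first full window both schemes use the same four weights and agree.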

In [ ]:

print(f"--- Expanding Window vs FFD (d=0.5) ---")
ffd_result = pymlfinance.sampling.frac_diff_ffd(log_prices, d=0.5, threshold=1e-4)
exp_result = pymlfinance.sampling.frac_diff_expanding(log_prices, d=0.5, threshold=1e-4)
print(f"  FFD length:       {len(ffd_result)}")
print(f"  Expanding length: {len(exp_result)}")
if len(ffd_result) > 0 and len(exp_result) > 0:
    min_len = min(len(ffd_result), len(exp_result))
    diff = np.abs(ffd_result[:min_len] - exp_result[:min_len])
    print(f"  Mean absolute difference: {np.nanmean(diff):.6f}")

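Finding Minimum d for Stationarity

The find_min_d function searches for the smallest fractional order d at which the differentiated series rejects the ADF unit-root null. This is the "sweet spot" that preserves the most memory while achieving stationarity. The search runs over a grid of d values (here in steps of 0.1 up to 1.0), so the result is only resolved to the step size.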
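
The search itself is a simple loop, sketched below under stated assumptions. All names (ffd_valid, adf_stat, find_min_d_sketch) and the min_obs and critical parameters are hypothetical, the ADF regression uses a constant and one lag, and candidates with too few valid observations are skipped. The library's find_min_d may differ on each of these points.

```python
import numpy as np

def ffd_valid(x, d, threshold):
    """FFD output restricted to positions with a full window (no NaN padding)."""
    w = [1.0]
    while len(w) < len(x):
        k = len(w)
        nxt = -w[-1] * (d - k + 1) / k
        if abs(nxt) < threshold:
            break
        w.append(nxt)
    # "valid" convolution gives sum_k w[k] * x[t - k] for every full window
    return np.convolve(x, np.array(w), mode="valid")

def adf_stat(y, lags=1):
    """t-statistic on the lagged level in an ADF regression with a constant."""
    dy = np.diff(y)
    target = dy[lags:]
    cols = [np.ones_like(target), y[lags:-1]]
    cols += [dy[lags - i:-i] for i in range(1, lags + 1)]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2 = resid @ resid / (len(target) - X.shape[1])
    return beta[1] / np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])

def find_min_d_sketch(x, max_d=1.0, step=0.1, threshold=1e-3,
                      critical=-2.86, min_obs=50):
    """Smallest grid value of d whose FFD series falls below the critical value."""
    x = np.asarray(x, dtype=float)
    for i in range(int(round(max_d / step)) + 1):
        d = round(i * step, 10)
        series = ffd_valid(x, d, threshold)
        if len(series) < min_obs:
            continue  # window too wide for this sample; skip the candidate
        if adf_stat(series) < critical:
            return d
    return None  # nothing on the grid passed

rng = np.random.default_rng(0)
increments = rng.standard_normal(1000)
min_d_noise = find_min_d_sketch(increments)            # already stationary
min_d_walk = find_min_d_sketch(np.cumsum(increments))  # unit-root series
print(f"white noise: {min_d_noise}, random walk: {min_d_walk}")
```

A stationary input passes at d=0, so the search returns immediately. For the random walk, d=1 recovers the white-noise increments, so some value on the grid always passes.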

In [ ]:

min_d = pymlfinance.sampling.find_min_d(log_prices, max_d=1.0, step_size=0.1, threshold=1e-4)
print(f"--- Minimum d for Stationarity ---")
print(f"  min_d = {min_d:.2f}")

# Verify
ffd_min = pymlfinance.sampling.frac_diff_ffd(log_prices, d=min_d, threshold=1e-4)
if len(ffd_min) > 0:
    valid = ~np.isnan(ffd_min)
    ffd_valid = ffd_min[valid]
    adf_stat, _ = pymlfinance.features.adf_test(ffd_valid, max_lags=1)
    corr = np.corrcoef(log_prices[valid], ffd_valid)[0, 1]
    print(f"  ADF statistic at d={min_d:.2f}: {adf_stat:.4f}")
    print(f"  Correlation with original: {corr:.4f}")
    print(f"  (Compare: integer differencing d=1.0 destroys all memory)")

In [ ]:

# Plot original vs fractionally differentiated series
fig, axes = plt.subplots(3, 1, figsize=(14, 10), sharex=True)

# Original log prices
axes[0].plot(log_prices, color='steelblue', linewidth=0.8)
axes[0].set_ylabel('Log Price')
axes[0].set_title(f'Original Log Prices (d=0, non-stationary)')
axes[0].grid(True, alpha=0.3)

# Fractionally differentiated at min_d
ffd_opt = pymlfinance.sampling.frac_diff_ffd(log_prices, d=min_d, threshold=1e-4)
valid_opt = ~np.isnan(ffd_opt)
corr_opt = np.corrcoef(log_prices[valid_opt], ffd_opt[valid_opt])[0, 1]
axes[1].plot(ffd_opt, color='#55A868', linewidth=0.8)
axes[1].set_ylabel(f'FFD (d={min_d:.1f})')
axes[1].set_title(f'Fractionally Differentiated (d={min_d:.1f}, corr={corr_opt:.4f})')
axes[1].grid(True, alpha=0.3)

# Fully differentiated (returns)
ffd_full = pymlfinance.sampling.frac_diff_ffd(log_prices, d=1.0, threshold=1e-4)
valid_full = ~np.isnan(ffd_full)
corr_full = np.corrcoef(log_prices[valid_full], ffd_full[valid_full])[0, 1]
axes[2].plot(ffd_full, color='#C44E52', linewidth=0.8)
axes[2].set_ylabel('Returns (d=1.0)')
axes[2].set_title(f'Integer Differentiated (d=1.0, corr={corr_full:.4f})')
axes[2].set_xlabel('Observation')
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

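Weight Vectors

The raw weight vectors (without FFD truncation) show how fractional differentiation weights past observations. After the leading 1, every weight is negative for 0 < d < 1, and its magnitude decays hyperbolically (as a power law in the lag) rather than exponentially.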
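
The weights come from the binomial series for the operator (1 - B)^d, where B is the backshift operator. The recursion and its large-lag behaviour can be written out as follows, with a worked check for d = 0.5:

```latex
(1 - B)^d = \sum_{k=0}^{\infty} \binom{d}{k} (-B)^k = \sum_{k=0}^{\infty} w_k B^k,
\qquad w_0 = 1, \quad w_k = -w_{k-1}\,\frac{d - k + 1}{k}.

% Worked check for d = 0.5
w_1 = -1 \cdot \frac{0.5}{1} = -0.5, \quad
w_2 = -(-0.5)\,\frac{-0.5}{2} = -0.125, \quad
w_3 = -(-0.125)\,\frac{-1.5}{3} = -0.0625.

% Large-lag behaviour for non-integer d
|w_k| \sim \frac{k^{-(1+d)}}{|\Gamma(-d)|} \quad \text{as } k \to \infty.
```

The decay exponent 1 + d explains why larger d gives shorter effective memory, and why a small d needs a very long window before the weights fall below a fixed threshold.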

In [11]:

print(f"--- Weight Vectors ---")
for d in [0.3, 0.5, 0.7]:
    w = pymlfinance.sampling.get_weights(d, size=10)
    print(f"  d={d:.1f}: weights = [{', '.join(f'{x:.4f}' for x in w[:5])}...]")

--- Weight Vectors ---
  d=0.3: weights = [1.0000, -0.3000, -0.1050, -0.0595, -0.0402...]
  d=0.5: weights = [1.0000, -0.5000, -0.1250, -0.0625, -0.0391...]
  d=0.7: weights = [1.0000, -0.7000, -0.1050, -0.0455, -0.0262...]

Polars Expression API

The .ml namespace on Polars expressions provides fractional differentiation, find_min_d, and ADF test functions in a DataFrame-native API.

In [12]:

import pymlfinance.polars

df = pl.DataFrame({"log_price": log_prices})
result = df.with_columns(
    pl.col("log_price").ml.frac_diff_ffd(d=0.5, threshold=1e-4).alias("ffd_0.5"),
    pl.col("log_price").ml.frac_diff_expanding(d=0.5, threshold=1e-4).alias("exp_0.5"),
)
print(f"  DataFrame shape: {result.shape}")
print(result.head(5))

  DataFrame shape: (500, 3)
shape: (5, 3)
┌───────────┬─────────┬──────────┐
│ log_price ┆ ffd_0.5 ┆ exp_0.5  │
│ ---       ┆ ---     ┆ ---      │
│ f64       ┆ f64     ┆ f64      │
╞═══════════╪═════════╪══════════╡
│ 4.616104  ┆ NaN     ┆ 4.616104 │
│ 4.614339  ┆ NaN     ┆ 2.306287 │
│ 4.628293  ┆ NaN     ┆ 1.74411  │
│ 4.659754  ┆ NaN     ┆ 1.480308 │
│ 4.65607   ┆ NaN     ┆ 1.278944 │
└───────────┴─────────┴──────────┘

In [13]:

# Find min d via Polars
min_d_pl = df.select(
    pl.col("log_price").ml.find_min_d(max_d=1.0, step_size=0.1, threshold=1e-4)
).item()
print(f"  Polars find_min_d: {min_d_pl:.2f}")

# ADF test via Polars
adf_pl = df.select(
    pl.col("log_price").ml.adf_test(max_lags=1)
).item()
print(f"  Polars ADF on raw log prices: {adf_pl:.4f}")

  Polars find_min_d: 0.40
  Polars ADF on raw log prices: -0.4993

Exercises

Correlation vs d trade-off: Plot correlation vs d and ADF statistic vs d on the same figure to visually identify the sweet spot where stationarity is achieved with maximum memory.
Threshold sensitivity: Try different thresholds (1e-3, 1e-4, 1e-5) and compare FFD output length. Smaller thresholds use more weights (longer memory) at the cost of more computation.
Stationary input: Generate a stationary series (e.g., returns) and verify that find_min_d returns approximately 0, confirming no additional differencing is needed.