Simulation Toolbox¶
Synthetic functional data is essential for benchmarking, validating methods, and building intuition. fdars provides two complementary generators:
- Karhunen-Loeve (KL) simulation -- construct curves as random linear combinations of basis eigenfunctions.
- Gaussian process (GP) generation -- sample from a zero-mean GP with a specified covariance kernel.
Both live in the fdars.simulation module and return a 2D NumPy array of shape
(n, m) where n is the number of curves and m is the number of grid points.
Wrapping the result in an Fdata object bundles the data with its evaluation
grid and unlocks convenience methods for depth, distances, derivatives, and more.
Karhunen-Loeve Simulation¶
The Karhunen-Loeve theorem states that any square-integrable random function \(X(t)\) can be expanded as:
\[X(t) = \mu(t) + \sum_{k=1}^{\infty} \xi_k \phi_k(t)\]
where \(\phi_k\) are orthonormal eigenfunctions and \(\xi_k\) are uncorrelated random scores with \(\operatorname{Var}(\xi_k) = \lambda_k\).
simulate() truncates this expansion to n_basis terms. You control the shape
of the curves through the eigenfunction type and the variance structure
through the eigenvalue decay.
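To build intuition, the truncated expansion can be reproduced with plain NumPy. This is a sketch, not fdars's internal implementation: it assumes a sine basis for the Fourier-type eigenfunctions and \(1/k\) eigenvalue decay:

```python
import numpy as np

rng = np.random.default_rng(42)
t = np.linspace(0, 1, 100)
n, n_basis = 50, 5

# Eigenvalues with linear (1/k) decay
lam = 1.0 / np.arange(1, n_basis + 1)

# Orthonormal sine eigenfunctions on [0, 1], shape (n_basis, m)
phi = np.stack([np.sqrt(2) * np.sin((k + 1) * np.pi * t) for k in range(n_basis)])

# Uncorrelated scores xi_k with Var(xi_k) = lambda_k
xi = rng.normal(size=(n, n_basis)) * np.sqrt(lam)

# Each curve is a random linear combination of the eigenfunctions
curves = xi @ phi
print(curves.shape)  # (50, 100)
```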
Basic Usage¶
import numpy as np
from fdars import Fdata
from fdars.simulation import simulate

argvals = np.linspace(0, 1, 100)
data = simulate(
n=50, # number of curves
argvals=argvals, # evaluation grid
n_basis=5, # number of KL terms
efun_type="fourier", # eigenfunction family
eval_type="linear", # eigenvalue decay
seed=42, # reproducibility
)
fd = Fdata(data, argvals=argvals)
print(fd) # Fdata (1D) – 50 obs × 100 points – range [0.0, 1.0]
Eigenfunction Types¶
The efun_type parameter controls the shape of the basis functions \(\phi_k\).
"fourier" (default)¶
Sines and cosines of increasing frequency. Produces smooth, oscillatory curves.
data_fourier = simulate(
n=30, argvals=argvals, n_basis=7,
efun_type="fourier", seed=1,
)
fd_fourier = Fdata(data_fourier, argvals=argvals)
"poly"¶
Legendre-like polynomial eigenfunctions. Curves tend to show broad trends without rapid oscillation.
data_poly = simulate(
n=30, argvals=argvals, n_basis=5,
efun_type="poly", seed=1,
)
fd_poly = Fdata(data_poly, argvals=argvals)
"poly_high"¶
Higher-degree polynomial eigenfunctions that introduce more local variation
than "poly".
data_poly_high = simulate(
n=30, argvals=argvals, n_basis=5,
efun_type="poly_high", seed=1,
)
fd_poly_high = Fdata(data_poly_high, argvals=argvals)
"wiener"¶
Eigenfunctions of the Wiener process (Brownian motion). Useful for simulating non-stationary, drifting paths.
data_wiener = simulate(
n=30, argvals=argvals, n_basis=5,
efun_type="wiener", seed=1,
)
fd_wiener = Fdata(data_wiener, argvals=argvals)
Choosing an eigenfunction type
Use "fourier" for periodic or oscillatory data, "poly" for smooth
monotonic trends, and "wiener" for random-walk-like behavior.
Eigenvalue Decay Patterns¶
The eval_type parameter governs how fast the eigenvalues \(\lambda_k\) decay,
which controls the effective dimensionality of the data.
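The practical difference is easy to quantify: the fraction of total variance carried by the leading components under each decay pattern, computed here directly from the formulas below with plain NumPy:

```python
import numpy as np

k = np.arange(1, 11)  # n_basis = 10
decays = {
    "linear": 1.0 / k,
    "exponential": np.exp(-k),
    "wiener": 1.0 / ((k - 0.5) ** 2 * np.pi**2),
}
for name, lam in decays.items():
    frac = lam[:3].sum() / lam.sum()
    print(f"{name:12s} first 3 of 10 components carry {frac:.0%} of the variance")
```

Under linear decay the first three components carry roughly 63% of the variance, versus about 95% for the exponential and Wiener patterns, which is why linear decay produces visibly more complex curves.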
"linear" (default)¶
\(\lambda_k = 1/k\). Slow decay means higher-order components still carry substantial variance, producing more complex curves.
data_linear = simulate(
n=30, argvals=argvals, n_basis=10,
eval_type="linear", seed=2,
)
fd_linear = Fdata(data_linear, argvals=argvals)
"exponential"¶
\(\lambda_k = e^{-k}\). Fast decay concentrates variance in the first few components, yielding smoother, lower-dimensional data.
data_exp = simulate(
n=30, argvals=argvals, n_basis=10,
eval_type="exponential", seed=2,
)
fd_exp = Fdata(data_exp, argvals=argvals)
"wiener"¶
\(\lambda_k = \frac{1}{(k - 0.5)^2 \pi^2}\). The eigenvalue sequence of the Brownian-motion covariance.
data_wiener_eval = simulate(
n=30, argvals=argvals, n_basis=10,
eval_type="wiener", seed=2,
)
fd_wiener_eval = Fdata(data_wiener_eval, argvals=argvals)
Combining Options¶
You can mix any eigenfunction type with any eigenvalue decay:
# Fourier shapes with fast exponential decay -> very smooth oscillatory curves
smooth_osc = simulate(
n=40, argvals=argvals, n_basis=7,
efun_type="fourier", eval_type="exponential", seed=10,
)
fd_smooth_osc = Fdata(smooth_osc, argvals=argvals)
# Polynomial shapes with linear decay -> complex trending curves
complex_trend = simulate(
n=40, argvals=argvals, n_basis=7,
efun_type="poly", eval_type="linear", seed=10,
)
fd_complex_trend = Fdata(complex_trend, argvals=argvals)
Effect of n_basis¶
Increasing n_basis adds higher-frequency variation:
# Low complexity
simple = simulate(n=20, argvals=argvals, n_basis=3, seed=0)
fd_simple = Fdata(simple, argvals=argvals)
# High complexity
complex_ = simulate(n=20, argvals=argvals, n_basis=15, seed=0)
fd_complex = Fdata(complex_, argvals=argvals)
Reproducibility
Pass a fixed seed to get identical results across runs. When seed is
None, a different random sample is produced each time.
Gaussian Process Generation¶
For more control over the local correlation structure, generate samples from a zero-mean Gaussian process with a specified covariance kernel.
Basic Usage¶
import numpy as np
from fdars import Fdata
from fdars.simulation import gaussian_process

argvals = np.linspace(0, 1, 100)
gp_data = gaussian_process(
n=40, # number of curves
argvals=argvals,
kernel="gaussian",
length_scale=0.2,
variance=1.0,
seed=42,
)
fd_gp = Fdata(gp_data, argvals=argvals)
print(fd_gp) # Fdata (1D) – 40 obs × 100 points – range [0.0, 1.0]
Covariance Kernels¶
"gaussian" (squared exponential)¶
Produces infinitely differentiable (very smooth) sample paths.
gp_gauss = gaussian_process(
n=30, argvals=argvals,
kernel="gaussian", length_scale=0.15, variance=1.0, seed=1,
)
fd_gp_gauss = Fdata(gp_gauss, argvals=argvals)
"exponential" (Ornstein-Uhlenbeck)¶
Sample paths are continuous but nowhere differentiable -- rougher than those of the Gaussian kernel.
gp_exp = gaussian_process(
n=30, argvals=argvals,
kernel="exponential", length_scale=0.15, variance=1.0, seed=1,
)
fd_gp_exp = Fdata(gp_exp, argvals=argvals)
"matern"¶
The Matern kernel with smoothness parameter \(\nu = 1.5\):
\[C(s, t) = \sigma^2 \left(1 + \frac{\sqrt{3}\,|s - t|}{\ell}\right) \exp\left(-\frac{\sqrt{3}\,|s - t|}{\ell}\right)\]
A middle ground between Gaussian (too smooth) and exponential (too rough); sample paths of the \(\nu = 1.5\) process are once differentiable.
gp_matern = gaussian_process(
n=30, argvals=argvals,
kernel="matern", length_scale=0.15, variance=1.0, seed=1,
)
fd_gp_matern = Fdata(gp_matern, argvals=argvals)
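As a sanity check, the \(\nu = 1.5\) Matern covariance can be evaluated directly. This helper assumes the standard parameterization \(\sigma^2 (1 + \sqrt{3}\,d/\ell)\, e^{-\sqrt{3}\,d/\ell}\) and is not part of the fdars API:

```python
import numpy as np

def matern32(d, length_scale=0.15, variance=1.0):
    """Matern covariance with nu = 1.5 as a function of distance d."""
    a = np.sqrt(3) * np.abs(d) / length_scale
    return variance * (1 + a) * np.exp(-a)

print(matern32(0.0))             # 1.0: full variance at zero lag
print(round(matern32(0.15), 3))  # correlation one length scale apart
```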
"periodic"¶
Generates sample paths with a repeating pattern (period \(p = 1.0\) by default).
gp_periodic = gaussian_process(
n=30, argvals=argvals,
kernel="periodic", length_scale=0.3, variance=1.0, seed=1,
)
fd_gp_periodic = Fdata(gp_periodic, argvals=argvals)
Controlling Smoothness with length_scale¶
The length scale \(\ell\) determines how quickly the correlation decays with distance. Smaller values produce more wiggly paths; larger values yield smoother, slowly varying curves.
# Short length scale -> rough
rough = gaussian_process(
n=20, argvals=argvals,
kernel="gaussian", length_scale=0.05, seed=0,
)
fd_rough = Fdata(rough, argvals=argvals)
# Long length scale -> smooth
smooth = gaussian_process(
n=20, argvals=argvals,
kernel="gaussian", length_scale=0.5, seed=0,
)
fd_smooth = Fdata(smooth, argvals=argvals)
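The visual difference follows directly from the squared-exponential correlation \(\rho(d) = \exp(-d^2 / (2\ell^2))\) (assuming that standard convention). Two points a distance 0.1 apart are nearly independent under \(\ell = 0.05\) but almost perfectly correlated under \(\ell = 0.5\):

```python
import numpy as np

d = 0.1  # distance between two evaluation points
for ell in (0.05, 0.5):
    rho = np.exp(-d**2 / (2 * ell**2))
    print(f"length_scale={ell}: correlation at distance {d} = {rho:.3f}")
```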
Controlling Amplitude with variance¶
The variance parameter \(\sigma^2\) scales the overall amplitude of the curves.
low_var = gaussian_process(
n=20, argvals=argvals,
kernel="gaussian", length_scale=0.2, variance=0.1, seed=0,
)
fd_low_var = Fdata(low_var, argvals=argvals)
high_var = gaussian_process(
n=20, argvals=argvals,
kernel="gaussian", length_scale=0.2, variance=5.0, seed=0,
)
fd_high_var = Fdata(high_var, argvals=argvals)
Computing a Covariance Matrix¶
If you need the raw covariance matrix \(C(s_i, t_j)\) for custom purposes (e.g.,
feeding into your own sampler), use covariance_matrix:
import numpy as np
from fdars.simulation import covariance_matrix

argvals = np.linspace(0, 1, 50)
cov = covariance_matrix(
argvals, kernel="gaussian", length_scale=0.2, variance=1.0,
)
print(cov.shape) # (50, 50)
print(f"Diagonal (should be ~1.0): {cov[0, 0]:.4f}")
Full Example: Simulate, Analyze, Cluster¶
Bringing it all together in a realistic workflow:
import numpy as np
import pandas as pd
from fdars import Fdata
from fdars.simulation import simulate
from fdars.clustering import kmeans_fd
# -- Step 1: Generate two groups with different eigenfunction types --
argvals = np.linspace(0, 1, 150)
group_a = simulate(n=40, argvals=argvals, n_basis=5, efun_type="fourier", seed=10)
group_b = simulate(n=40, argvals=argvals, n_basis=5, efun_type="poly", seed=20)
# Wrap each group in Fdata; shift group_b upward so the groups are distinguishable
fd_a = Fdata(group_a, argvals=argvals)
fd_b = Fdata(group_b, argvals=argvals) + 2.0
# Stack into a single dataset with metadata tracking the true group
data = np.vstack([fd_a.data, fd_b.data]) # (80, 150)
true_labels = np.array([0] * 40 + [1] * 40)
fd = Fdata(
data, argvals=argvals,
metadata=pd.DataFrame({"group": true_labels}),
)
# -- Step 2: Summary statistics (Fdata convenience methods) --
mu = fd.mean()
norms = fd.norm()
print(f"Grand mean range: [{mu.min():.2f}, {mu.max():.2f}]")
print(f"Norm range: [{norms.min():.2f}, {norms.max():.2f}]")
# -- Step 3: Depth ranking --
depths = fd.depth("fraiman_muniz")
median_idx = np.argmax(depths)
print(f"Deepest curve: index {median_idx}, depth {depths[median_idx]:.4f}")
# -- Step 4: Clustering --
result = kmeans_fd(fd.data, fd.argvals, k=2, seed=0)
pred_labels = result["cluster"]
# Compare with truth (up to label permutation)
agreement = max(
(pred_labels == true_labels).mean(),
(pred_labels != true_labels).mean(),
)
print(f"Clustering agreement: {agreement:.1%}")
Next Steps¶
- Introduction to fdars -- if you haven't read it yet.
- Smoothing -- apply smoothing to your simulated data.
- Working with Derivatives -- differentiate your curves.
- Covariance Functions -- deeper look at kernel functions.