Basis Representation¶
Representing functional data in a finite basis -- B-splines, Fourier, or P-splines -- converts a discrete set of evaluations into a compact coefficient vector. This enables smoothing, differentiation, integration, and dimensionality reduction, all while preserving the continuous nature of the underlying functions.
When to use basis representations¶
- Smoothing noisy data -- P-spline penalties remove high-frequency noise while preserving shape.
- Dimension reduction -- a curve with 500 grid points can be faithfully captured by 15-20 basis coefficients.
- Derivative computation -- analytic derivatives come for free from the basis expansion.
- Regularization -- roughness penalties in the basis domain prevent overfitting in regression.
B-spline vs Fourier basis¶
| Property | B-spline | Fourier |
|---|---|---|
| Support | Local (compact) | Global |
| Best for | Non-periodic data, local features | Periodic / seasonal data |
| Boundary behavior | Handles edges naturally | Assumes periodicity |
| Derivative stability | Excellent | Excellent |
| Basis count rule of thumb | ~1 per interior knot + order | Must be odd (\(2k + 1\)) |
Quick start: project and reconstruct¶
import numpy as np
from fdars import Fdata
from fdars.basis import fdata_to_basis_1d, basis_to_fdata_1d
# Simulate some data
argvals = np.linspace(0, 1, 200)
data = np.column_stack([np.sin(2 * np.pi * argvals) + 0.2 * np.random.randn(200)
for _ in range(30)]).T # shape (30, 200)
fd = Fdata(data, argvals=argvals)
# Project onto a B-spline basis with 15 functions
coeffs, actual_nbasis = fdata_to_basis_1d(fd.data, fd.argvals, n_basis=15,
basis_type="bspline")
print(f"Coefficients shape: {coeffs.shape}") # (30, 15)
print(f"Actual n_basis used: {actual_nbasis}")
# Reconstruct back to the evaluation grid
reconstructed = basis_to_fdata_1d(coeffs, fd.argvals, n_basis=actual_nbasis,
basis_type="bspline")
print(f"Reconstructed shape: {reconstructed.shape}") # (30, 200)
Fourier basis for periodic data¶
# Periodic data: use Fourier basis
argvals_p = np.linspace(0, 2 * np.pi, 200)
periodic_data = np.column_stack([
np.sin(argvals_p) + 0.5 * np.cos(3 * argvals_p) + 0.15 * np.random.randn(200)
for _ in range(30)
]).T
fd_p = Fdata(periodic_data, argvals=argvals_p)
coeffs_f, nbasis_f = fdata_to_basis_1d(fd_p.data, fd_p.argvals, n_basis=11,
basis_type="fourier")
reconstructed_f = basis_to_fdata_1d(coeffs_f, fd_p.argvals, n_basis=nbasis_f,
basis_type="fourier")
Evaluating basis matrices directly¶
For advanced use (e.g., building your own penalty matrices), you can evaluate the raw basis matrix.
B-spline basis¶
from fdars.basis import bspline_basis
argvals = np.linspace(0, 1, 100)
B = bspline_basis(argvals, nknots=10, order=4)
print(B.shape) # (100, 14) -- nknots + order = 14 basis functions
| Parameter | Description |
|---|---|
argvals |
Evaluation points |
nknots |
Number of equally spaced interior knots |
order |
Spline order: 4 = cubic (default), 3 = quadratic |
Fourier basis¶
from fdars.basis import fourier_basis
argvals = np.linspace(0, 2 * np.pi, 100)
F = fourier_basis(argvals, n_basis=11)
print(F.shape) # (100, 11)
The Fourier basis consists of \(1, \sin(\omega t), \cos(\omega t), \sin(2\omega t), \cos(2\omega t), \ldots\) where \(\omega = 2\pi / T\) and \(T\) is the period (range of argvals).
Fourier n_basis
n_basis should be odd. If an even value is given, it will be adjusted to the next odd number so the basis contains matched sine-cosine pairs plus the constant function.
P-spline smoothing¶
P-splines combine a rich B-spline basis with a discrete roughness penalty on the coefficients. The penalty parameter \(\lambda\) controls the trade-off between fit and smoothness.
where \(B\) is the B-spline basis matrix, \(D^d\) is the \(d\)-th order difference matrix, and \(\lambda \ge 0\).
Fixed lambda¶
from fdars.basis import pspline_fit_1d
result = pspline_fit_1d(fd.data, fd.argvals, n_basis=25, lambda_=1e-2, order=2)
print(result.keys())
# dict_keys(['fitted', 'coefficients', 'edf', 'rss', 'gcv', 'aic', 'bic'])
| Key | Description |
|---|---|
fitted |
Smoothed curves, shape (n, m) |
coefficients |
B-spline coefficients, shape (n, n_basis) |
edf |
Effective degrees of freedom |
rss |
Residual sum of squares |
gcv |
Generalized cross-validation score |
aic |
Akaike information criterion |
bic |
Bayesian information criterion |
Automatic lambda via GCV¶
When you do not know the right smoothing level, let GCV choose:
from fdars.basis import pspline_fit_gcv
result = pspline_fit_gcv(fd.data, fd.argvals, n_basis=25, order=2)
print(f"GCV score: {result['gcv']:.6f}")
print(f"Effective degrees of freedom: {result['edf']:.1f}")
Choosing n_basis for P-splines
With P-splines the exact number of basis functions matters less because the penalty controls smoothness. A safe rule is to use a generous basis (e.g., 20-40 functions for 100-500 grid points) and rely on \(\lambda\) to prevent overfitting.
Comparing smoothing levels¶
import matplotlib.pyplot as plt
fig, axes = plt.subplots(1, 3, figsize=(14, 4), sharey=True)
idx = 0 # curve to visualize
for ax, lam in zip(axes, [1e-6, 1e-2, 1e2]):
res = pspline_fit_1d(fd.data, fd.argvals, n_basis=25, lambda_=lam)
ax.plot(fd.argvals, fd.data[idx], ".", ms=2, alpha=0.4, label="Raw")
ax.plot(fd.argvals, res["fitted"][idx], "r-", lw=2,
label=f"edf={res['edf']:.1f}")
ax.set_title(f"$\\lambda$ = {lam:.0e}")
ax.legend(fontsize=8)
plt.suptitle("P-spline smoothing with different penalty strengths")
plt.tight_layout()
plt.show()
Automatic basis selection¶
select_basis_auto_1d jointly selects:
- Basis type -- B-spline or Fourier (optionally using an FFT-based seasonality hint).
- Number of basis functions -- optimizing GCV, AIC, or BIC.
- P-spline penalty -- when using B-splines.
from fdars.basis import select_basis_auto_1d
selections = select_basis_auto_1d(fd.data, fd.argvals, criterion="gcv")
# Each element corresponds to one curve
for i, sel in enumerate(selections[:3]):
print(f"Curve {i}: basis={sel['basis_type']}, nbasis={sel['nbasis']}, "
f"score={sel['score']:.4f}, seasonal={sel['seasonal_detected']}")
| Parameter | Default | Description |
|---|---|---|
criterion |
"gcv" |
"gcv", "aic", or "bic" |
nbasis_min |
0 (auto) | Lower bound for basis count search |
nbasis_max |
0 (auto) | Upper bound for basis count search |
lambda_pspline |
-1.0 (auto) | P-spline penalty; negative triggers GCV selection |
use_seasonal_hint |
True |
Use FFT to detect periodicity and prefer Fourier |
Each element of the returned list is a dict with:
| Key | Description |
|---|---|
basis_type |
"bspline" or "fourier" |
nbasis |
Optimal number of basis functions |
score |
Information criterion score |
coefficients |
Basis coefficients for this curve |
fitted |
Fitted values for this curve |
edf |
Effective degrees of freedom |
seasonal_detected |
Whether the FFT hint detected periodicity |
lambda_val |
Selected P-spline penalty (if B-spline) |
Cross-validated basis count¶
When you want to fix the basis type and only search over the number of basis functions:
from fdars.basis import basis_nbasis_cv
cv_result = basis_nbasis_cv(
fd.data, fd.argvals,
nbasis_min=4,
nbasis_max=30,
basis_type="bspline",
criterion="gcv",
n_folds=5,
lambda_=1.0,
)
print(f"Optimal n_basis: {cv_result['optimal_nbasis']}")
print(f"Criterion used: {cv_result['criterion']}")
Plotting the CV curve¶
nbasis_range = cv_result["nbasis_range"]
scores = cv_result["scores"]
plt.figure(figsize=(7, 4))
plt.plot(nbasis_range, scores, "o-", color="steelblue")
plt.axvline(cv_result["optimal_nbasis"], ls="--", color="coral",
label=f"Optimal = {cv_result['optimal_nbasis']}")
plt.xlabel("Number of basis functions")
plt.ylabel(f"{cv_result['criterion'].upper()} score")
plt.title("Basis count selection")
plt.legend()
plt.tight_layout()
plt.show()
Information criteria reference¶
| Criterion | Formula | Tends to select |
|---|---|---|
| GCV | \(\displaystyle\frac{n^{-1}\,\text{RSS}}{(1 - \text{edf}/n)^2}\) | Moderate smoothness |
| AIC | \(n\log(\text{RSS}/n) + 2\,\text{edf}\) | Slightly more complex models |
| BIC | \(n\log(\text{RSS}/n) + \log(n)\,\text{edf}\) | Simpler (sparser) models |
GCV vs CV
GCV is a leave-one-out cross-validation approximation that avoids refitting. For small samples, explicit \(k\)-fold CV (set criterion="cv" in basis_nbasis_cv) may be more reliable.
Full workflow: noisy data to smooth representation¶
import numpy as np
import matplotlib.pyplot as plt
from fdars import Fdata
from fdars.simulation import simulate
from fdars.basis import pspline_fit_gcv, basis_nbasis_cv, fdata_to_basis_1d
# 1. Generate noisy data
argvals = np.linspace(0, 1, 300)
clean = simulate(n=50, argvals=argvals, n_basis=5, seed=7)
noisy = clean + 0.3 * np.random.randn(*clean.shape)
fd_noisy = Fdata(noisy, argvals=argvals)
fd_clean = Fdata(clean, argvals=argvals)
# 2. Find optimal basis count
cv = basis_nbasis_cv(fd_noisy.data, fd_noisy.argvals, nbasis_min=5, nbasis_max=35,
basis_type="bspline", criterion="gcv")
print(f"Optimal basis count: {cv['optimal_nbasis']}")
# 3. Smooth with P-splines using optimal basis count
smooth = pspline_fit_gcv(fd_noisy.data, fd_noisy.argvals, n_basis=cv["optimal_nbasis"])
# 4. Visualize
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, idx in zip(axes, [0, 10, 25]):
ax.plot(fd_noisy.argvals, fd_noisy.data[idx], ".", ms=1, alpha=0.3, color="gray", label="Noisy")
ax.plot(fd_clean.argvals, fd_clean.data[idx], "k-", lw=1, alpha=0.5, label="True")
ax.plot(fd_noisy.argvals, smooth["fitted"][idx], "r-", lw=2, label="P-spline")
ax.set_title(f"Curve {idx}")
if idx == 0:
ax.legend(fontsize=8)
plt.suptitle(f"P-spline smoothing (n_basis={cv['optimal_nbasis']}, "
f"edf={smooth['edf']:.1f})")
plt.tight_layout()
plt.show()
API summary¶
| Function | Description |
|---|---|
fdata_to_basis_1d(data, argvals, n_basis, basis_type) |
Project curves onto a basis |
basis_to_fdata_1d(coeffs, argvals, n_basis, basis_type) |
Reconstruct curves from coefficients |
bspline_basis(argvals, nknots, order) |
Evaluate raw B-spline basis matrix |
fourier_basis(argvals, n_basis) |
Evaluate raw Fourier basis matrix |
pspline_fit_1d(data, argvals, n_basis, lambda_, order) |
P-spline fit with fixed \(\lambda\) |
pspline_fit_gcv(data, argvals, n_basis, order) |
P-spline fit with GCV-selected \(\lambda\) |
select_basis_auto_1d(data, argvals, ...) |
Automatic basis type + count selection |
basis_nbasis_cv(data, argvals, ...) |
Cross-validated basis count selection |
smooth_basis_gcv(data, argvals, n_basis, ...) |
Basis smoothing with GCV penalty selection |
All functions are imported from fdars.basis.