# Statistical Process Monitoring
Monitor streams of functional data for out-of-control behavior using FPCA-based control charts. The workflow has two phases:
- Phase I -- Estimate the in-control distribution from historical data and compute control limits.
- Phase II -- Project each incoming observation onto the learned subspace and check whether its \(T^2\) or SPE statistic exceeds the limits.
## Concepts

### FPCA-based monitoring
Each curve \(x_i(t)\) is centered by subtracting the mean \(\hat\mu(t)\) and projected onto the first \(K\) functional principal components, yielding a score vector \(\boldsymbol\xi_i \in \mathbb{R}^K\). Two complementary statistics capture different kinds of departure:
| Statistic | What it measures | Formula |
|---|---|---|
| Hotelling \(T^2\) | Systematic shift in the FPC subspace | \(T^2 = \sum_{k=1}^{K} \xi_k^2 / \lambda_k\) |
| SPE (Q) | Residual variation outside the subspace | \(\mathrm{SPE} = \int [\tilde x(t)]^2 \, dt\) where \(\tilde x\) is the reconstruction residual |
Control limits for both are estimated from the Phase I data so that the in-control false-alarm rate is approximately \(\alpha\).
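The two statistics can be sketched in plain NumPy. The example below is illustrative only — it builds its own toy eigenfunctions, eigenvalues, and quadrature weights rather than calling the library: a centered curve's score on each eigenfunction is a weighted inner product, \(T^2\) weights the squared scores by the eigenvalues, and SPE integrates the squared reconstruction residual.

```python
import numpy as np

# Toy setup: 100-point grid, 2 "eigenfunctions", made-up eigenvalues
m = 100
t = np.linspace(0, 1, m)
w = np.full(m, t[1] - t[0])                     # simple quadrature weights

phi = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)], axis=1)
phi /= np.sqrt((w[:, None] * phi**2).sum(axis=0))  # normalize in the weighted inner product
lam = np.array([2.0, 0.5])                      # eigenvalues lambda_1, lambda_2

# A centered curve: mostly inside the subspace, plus a small residual component
x = 1.5 * phi[:, 0] - 0.3 * phi[:, 1] + 0.05 * np.sin(9 * np.pi * t)

xi = (w * x) @ phi                              # FPC scores xi_k = <x, phi_k>
t2 = float(np.sum(xi**2 / lam))                 # Hotelling T^2
resid = x - phi @ xi                            # part of x outside the subspace
spe = float(np.sum(w * resid**2))               # SPE (Q)
print(t2, spe)
```

Here \(T^2\) is driven by the score magnitudes relative to their eigenvalues, while SPE picks up only the high-frequency component that the two eigenfunctions cannot represent.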
## Phase I -- estimating the baseline
```python
import numpy as np
from fdars import Fdata
from fdars.simulation import simulate
from fdars.spm import spm_phase1

# Generate 80 in-control curves on a 100-point grid
argvals = np.linspace(0, 1, 100)
fd_ic = Fdata(simulate(80, argvals, n_basis=5, seed=1), argvals=argvals)

# Phase I estimation (3 components, alpha = 0.05)
p1 = spm_phase1(fd_ic.data, fd_ic.argvals, ncomp=3, alpha=0.05)
```
`spm_phase1` returns a dictionary with the following keys:
| Key | Shape | Description |
|---|---|---|
| `t2` | `(n,)` | \(T^2\) statistic for every Phase I observation |
| `spe` | `(n,)` | SPE statistic for every Phase I observation |
| `t2_limit` | scalar | Upper control limit for \(T^2\) |
| `spe_limit` | scalar | Upper control limit for SPE |
| `mean` | `(m,)` | Estimated mean function \(\hat\mu(t)\) |
| `loadings` | `(m, ncomp)` | FPCA rotation matrix (eigenfunctions) |
| `weights` | `(m,)` | Integration weights for the inner product |
| `eigenvalues` | `(ncomp,)` | Eigenvalues \(\lambda_1, \dots, \lambda_K\) |
> **Choosing `ncomp`:** A common rule of thumb is to retain enough components to explain 90--95% of the total variance. Too few components make the SPE chart over-sensitive; too many inflate the \(T^2\) chart.
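Applying that rule of thumb amounts to taking the smallest \(K\) whose cumulative explained variance crosses the target. A minimal NumPy sketch — the `eigenvalues` array below is a made-up spectrum, not library output; in practice you would inspect the eigenvalues from a Phase I fit with a generous `ncomp`:

```python
import numpy as np

# Hypothetical eigenvalue spectrum, sorted in decreasing order
eigenvalues = np.array([4.1, 1.7, 0.6, 0.2, 0.08, 0.03])

# Cumulative fraction of total variance explained by the first k components
cum = np.cumsum(eigenvalues) / eigenvalues.sum()

# Smallest K reaching 90% explained variance
ncomp = int(np.searchsorted(cum, 0.90) + 1)
print(ncomp, cum[ncomp - 1])
```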
## Phase II -- monitoring new observations
```python
from fdars.spm import spm_monitor

# Simulate 20 new in-control observations + 10 faulty ones
data_new_ic = simulate(20, argvals, n_basis=5, seed=2)

# Inject a mean shift into the last 10 curves
data_fault = simulate(10, argvals, n_basis=5, seed=3) + 2.0
fd_new = Fdata(np.vstack([data_new_ic, data_fault]), argvals=argvals)

# Monitor
p2 = spm_monitor(
    mean=p1["mean"],
    loadings=p1["loadings"],
    weights=p1["weights"],
    eigenvalues=p1["eigenvalues"],
    t2_limit=p1["t2_limit"],
    spe_limit=p1["spe_limit"],
    new_data=fd_new.data,
    argvals=fd_new.argvals,
)
```
The returned dictionary contains:
| Key | Shape | Description |
|---|---|---|
| `t2` | `(n_new,)` | \(T^2\) for each new observation |
| `spe` | `(n_new,)` | SPE for each new observation |
| `t2_alarm` | `(n_new,)` bool | `True` where \(T^2\) exceeds the limit |
| `spe_alarm` | `(n_new,)` bool | `True` where SPE exceeds the limit |
```python
# How many faults were caught?
n_t2_alarms = int(p2["t2_alarm"].sum())
n_spe_alarms = int(p2["spe_alarm"].sum())
print(f"T2 alarms: {n_t2_alarms}, SPE alarms: {n_spe_alarms}")
```
## Hotelling \(T^2\) from scores
If you already have FPC scores (e.g., from your own FPCA), you can compute the \(T^2\) statistic directly:
```python
from fdars.spm import hotelling_t2

# scores: (n, p) array of FPC scores
# eigenvalues: (p,) array
t2_values = hotelling_t2(scores, eigenvalues)
```
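The statistic itself is just the formula from the table above, \(T^2_i = \sum_{k} \xi_{ik}^2 / \lambda_k\) applied row by row. A NumPy equivalent — an illustrative stand-in with made-up numbers, not the library routine:

```python
import numpy as np

def hotelling_t2_np(scores, eigenvalues):
    """Row-wise T^2: sum of squared scores, each divided by its eigenvalue."""
    return np.sum(scores**2 / eigenvalues, axis=1)

scores = np.array([[1.0, 0.5],
                   [0.0, 2.0]])      # two observations, two components
eigenvalues = np.array([2.0, 0.5])

print(hotelling_t2_np(scores, eigenvalues))  # → [1. 8.]
```

Note that a large score on a low-variance component (the second row) inflates \(T^2\) far more than the same score on a high-variance one.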
## Full worked example
The script below ties everything together: simulate in-control data, run Phase I, introduce a fault, monitor in Phase II, and visualize the control chart.
```python
import numpy as np
from fdars import Fdata
from fdars.simulation import simulate
from fdars.spm import spm_phase1, spm_monitor

# ── 1. Simulate in-control data ──────────────────────────────
argvals = np.linspace(0, 1, 100)
fd_ic = Fdata(simulate(100, argvals, n_basis=5, seed=10), argvals=argvals)

# ── 2. Phase I ───────────────────────────────────────────────
p1 = spm_phase1(fd_ic.data, fd_ic.argvals, ncomp=3, alpha=0.05)
print(f"T2 limit : {p1['t2_limit']:.3f}")
print(f"SPE limit: {p1['spe_limit']:.3f}")

# ── 3. Simulate Phase II data (in-control + fault) ──────────
data_ok = simulate(30, argvals, n_basis=5, seed=20)
data_bad = simulate(20, argvals, n_basis=5, seed=30) + 3.0  # mean shift
fd_new = Fdata(np.vstack([data_ok, data_bad]), argvals=argvals)

# ── 4. Phase II monitoring ───────────────────────────────────
p2 = spm_monitor(
    mean=p1["mean"],
    loadings=p1["loadings"],
    weights=p1["weights"],
    eigenvalues=p1["eigenvalues"],
    t2_limit=p1["t2_limit"],
    spe_limit=p1["spe_limit"],
    new_data=fd_new.data,
    argvals=fd_new.argvals,
)

# ── 5. Report ────────────────────────────────────────────────
obs_ids = np.arange(1, len(fd_new) + 1)
alarm_idx = obs_ids[p2["t2_alarm"] | p2["spe_alarm"]]
print(f"Alarm observations: {alarm_idx}")

# ── 6. Visualize (optional, requires matplotlib) ─────────────
try:
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(2, 1, figsize=(10, 6), sharex=True)
    axes[0].plot(obs_ids, p2["t2"], "o-", markersize=3)
    axes[0].axhline(p1["t2_limit"], color="red", linestyle="--", label="UCL")
    axes[0].set_ylabel("Hotelling T²")
    axes[0].legend()
    axes[1].plot(obs_ids, p2["spe"], "o-", markersize=3)
    axes[1].axhline(p1["spe_limit"], color="red", linestyle="--", label="UCL")
    axes[1].set_ylabel("SPE (Q)")
    axes[1].set_xlabel("Observation index")
    axes[1].legend()
    fig.suptitle("FPCA-based Control Charts")
    plt.tight_layout()
    plt.savefig("spm_control_chart.png", dpi=150)
    plt.show()
except ImportError:
    pass
```
> **Performance note:** Both `spm_phase1` and `spm_monitor` delegate all linear algebra to Rust. Phase I on 500 curves of length 200 typically completes in under 10 ms.