Outlier Detection¶
Functional outliers come in three flavours:
| Type | Description | Example |
|---|---|---|
| Magnitude | The curve lies far above or below the bulk of the data | A temperature sensor reading 20 degrees higher than all others |
| Shape | The curve has an unusual pattern even if its overall level is normal | A growth curve that dips where all others rise |
| Amplitude | The curve has exaggerated peaks and troughs | A vibration signal with double the usual amplitude |
fdars provides three complementary methods that target different outlier types.
LRT-based detection¶
This approach applies a likelihood-ratio test (LRT): it compares the likelihood of the data with and without each candidate outlier, and a bootstrap procedure estimates the rejection threshold.
import numpy as np
from fdars import Fdata
from fdars.simulation import simulate
from fdars.outliers import detect_outliers_lrt
argvals = np.linspace(0, 1, 100)
fd = Fdata(simulate(50, argvals, n_basis=5, seed=1), argvals=argvals)
# Inject two magnitude outliers
fd.data[0] += 8.0
fd.data[1] -= 8.0
result = detect_outliers_lrt(fd.data, alpha=0.05, n_bootstrap=200, trim=0.1, smo=0.02)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `data` | `ndarray (n, m)` | -- | Functional observations |
| `alpha` | `float` | `0.05` | Significance level |
| `n_bootstrap` | `int` | `200` | Number of bootstrap replicates for threshold estimation |
| `trim` | `float` | `0.1` | Trimming proportion for the robust mean |
| `smo` | `float` | `0.02` | Smoothing parameter for the likelihood ratio |
Returns a dictionary:
| Key | Type | Description |
|---|---|---|
| `outliers` | `ndarray (n,) bool` | `True` for each detected outlier |
| `threshold` | `float` | Computed rejection threshold |
outlier_ids = np.where(result["outliers"])[0]
print(f"Outlier indices: {outlier_ids}")
print(f"Threshold: {result['threshold']:.4f}")
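To make the bootstrap thresholding idea concrete, the following NumPy-only sketch mimics the general recipe: trim the most extreme curves, standardise each curve against the trimmed mean and standard deviation, and take a high quantile of the bootstrapped maximum statistic as the threshold. It is an illustration under simplifying assumptions, not the algorithm `detect_outliers_lrt` runs internally. The helpers `deviation_stat`, `trimmed_sample` and `bootstrap_threshold` are hypothetical, and the way `smo` scales the added noise is an assumption.

```python
import numpy as np


def deviation_stat(data, center, scale):
    """Root-mean-square standardised deviation of each curve from `center`."""
    return np.sqrt(np.mean(((data - center) / scale) ** 2, axis=1))


def trimmed_sample(data, trim):
    """Drop the `trim` fraction of curves farthest from the pointwise median."""
    dist = np.sqrt(np.mean((data - np.median(data, axis=0)) ** 2, axis=1))
    n_keep = int(np.ceil((1.0 - trim) * len(data)))
    return data[np.argsort(dist)[:n_keep]]


def bootstrap_threshold(data, alpha=0.05, n_bootstrap=200, trim=0.1, smo=0.02, seed=0):
    """(1 - alpha) quantile of the bootstrapped maximum deviation statistic."""
    rng = np.random.default_rng(seed)
    core = trimmed_sample(data, trim)
    sd = core.std(axis=0, ddof=1)
    maxima = np.empty(n_bootstrap)
    for b in range(n_bootstrap):
        # Resample curves with replacement, then jitter them (assumed smoothing step).
        idx = rng.integers(0, len(core), size=len(core))
        boot = core[idx] + smo * sd * rng.standard_normal((len(core), data.shape[1]))
        inner = trimmed_sample(boot, trim)
        stat = deviation_stat(boot, inner.mean(axis=0), inner.std(axis=0, ddof=1))
        maxima[b] = stat.max()
    return np.quantile(maxima, 1.0 - alpha)


# Toy data: 50 smooth curves with random level and amplitude, one shifted by +8.
rng = np.random.default_rng(1)
t = np.linspace(0, 1, 100)
curves = (rng.normal(0, 0.5, (50, 1))
          + (1 + rng.normal(0, 0.2, (50, 1))) * np.sin(2 * np.pi * t))
curves[0] += 8.0

core = trimmed_sample(curves, trim=0.1)
stats = deviation_stat(curves, core.mean(axis=0), core.std(axis=0, ddof=1))
threshold = bootstrap_threshold(curves)
flags = stats > threshold
print("Flagged curves:", np.where(flags)[0])
```

Raising `n_bootstrap` makes the quantile estimate more stable at the cost of run time, and lowering `alpha` moves the threshold further into the tail of the bootstrapped maxima.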
Outliergram (MEI vs MBD)¶
The outliergram plots the Modified Epigraph Index (MEI) against the Modified Band Depth (MBD) for every curve. Points that fall far from the parabolic relationship \(\mathrm{MBD} = a_0 + a_1 \cdot \mathrm{MEI} + a_2 \cdot \mathrm{MEI}^2\) are flagged as shape outliers.
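As a rough illustration of the quantities involved, the sketch below computes MEI and MBD from pointwise ranks and measures each curve's vertical distance to the parabola. It is a plain NumPy approximation, not the `outliergram` implementation. The helper `mei_mbd` and the toy curves are hypothetical, and the coefficients are those that arise for a sample of n curves with bands formed by pairs of curves, which is an assumption about the convention in use.

```python
import numpy as np


def mei_mbd(data):
    """Return (MEI, MBD) for curves stored row-wise in an (n, m) array."""
    n = data.shape[0]
    # rank[i, j] in 1..n: position of curve i among all curves at grid point j
    rank = data.argsort(axis=0).argsort(axis=0) + 1
    # MEI: average proportion of curves lying on or above curve i
    mei = np.mean((n - rank + 1) / n, axis=1)
    # MBD: average proportion of curve pairs whose band contains curve i
    pairs = (rank - 1) * (n - rank) + (n - 1)
    mbd = np.mean(pairs, axis=1) / (n * (n - 1) / 2)
    return mei, mbd


# Toy data: one curve (index 0) that weaves through 30 parallel curves.
t = np.linspace(0, 1, 100)
base = np.sin(2 * np.pi * t)
offsets = np.linspace(-1, 1, 30)
data = np.vstack([base + 1.2 * np.sin(6 * np.pi * t), base + offsets[:, None]])

mei, mbd = mei_mbd(data)
n = data.shape[0]
a0 = -2.0 / (n * (n - 1))
a1 = 2.0 * (n + 1) / (n - 1)
a2 = -2.0 * n / (n - 1)
# Vertical distance below the parabola; zero for a curve that never crosses another.
dist = a0 + a1 * mei + a2 * mei**2 - mbd

# Boxplot-style rule on the distances (assumed flagging rule).
q1, q3 = np.percentile(dist, [25, 75])
factor = 1.5
outliers = dist > q3 + factor * (q3 - q1)
print("Shape outliers:", np.where(outliers)[0])
```

With these definitions every curve sits on or below the parabola, so a large distance indicates a curve whose rank among the others changes a lot along the domain.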
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `data` | `ndarray (n, m)` | -- | Functional observations |
| `factor` | `float` | `1.5` | Outlier factor (analogous to the IQR multiplier in a boxplot) |
Returns a dictionary:
| Key | Shape | Description |
|---|---|---|
| `mei` | `(n,)` | Modified Epigraph Index |
| `mbd` | `(n,)` | Modified Band Depth |
| `outliers` | `(n,) bool` | Outlier flags |
Choosing the factor
A factor of 1.5 (the default) mirrors the classic boxplot rule. Increase it to 2.0 or 3.0 if you want to be more conservative and only flag extreme shape departures.
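The effect of the multiplier can be seen on any vector of scores. The sketch below applies a boxplot-style fence, `Q3 + factor * IQR`, to a made-up array of distances. The values and the name `dist` are purely illustrative, and the exact fence `outliergram` uses internally is assumed rather than confirmed.

```python
import numpy as np

# Hypothetical distances to the parabola for ten curves.
dist = np.array([0.01, 0.02, 0.02, 0.03, 0.03, 0.04, 0.05, 0.06, 0.12, 0.40])
q1, q3 = np.percentile(dist, [25, 75])

flagged = {}
for factor in (1.5, 2.0, 3.0):
    fence = q3 + factor * (q3 - q1)
    flagged[factor] = np.where(dist > fence)[0]
    print(f"factor={factor}: fence={fence:.3f}, flagged={flagged[factor]}")
```

With these numbers the default factor flags the two largest values, while 2.0 and 3.0 keep only the most extreme one.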
Magnitude-shape outlyingness¶
This method decomposes each observation's outlyingness into a magnitude component and a shape component using the directional outlyingness framework. It is particularly effective at detecting curves that are unusual in shape even when their overall level is normal.
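One common formulation of directional outlyingness for univariate curves standardises each value by the pointwise median and MAD, then summarises the resulting outlyingness curve by its mean (magnitude) and its variance over the domain (shape). The NumPy sketch below follows that formulation as an assumption. It is not necessarily what `magnitude_shape` computes, and `ms_outlyingness` is a hypothetical helper.

```python
import numpy as np


def ms_outlyingness(data):
    """Mean and variance over the grid of pointwise robust z-scores."""
    med = np.median(data, axis=0)
    mad = np.median(np.abs(data - med), axis=0)
    out = (data - med) / mad          # signed pointwise outlyingness, shape (n, m)
    magnitude = out.mean(axis=1)      # average level of outlyingness
    shape = out.var(axis=1)           # how much it varies along the curve
    return magnitude, shape


# Toy data: one level-shifted curve, one oscillating curve, 30 parallel curves.
t = np.linspace(0, 1, 100)
base = np.sin(2 * np.pi * t)
offsets = np.linspace(-1, 1, 30)
data = np.vstack([
    base + 4.0,                           # index 0: magnitude outlier
    base + 1.2 * np.sin(6 * np.pi * t),   # index 1: shape outlier
    base + offsets[:, None],              # indices 2..31: regular curves
])

magnitude, shape = ms_outlyingness(data)
print("Largest |magnitude|:", np.argmax(np.abs(magnitude)))
print("Largest shape score:", np.argmax(shape))
```

In this sketch the magnitude score is signed, so a curve far below the bulk gets a large negative value; taking the absolute value treats both directions alike.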
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `data` | `ndarray (n, m)` | -- | Functional observations |
Returns a dictionary:
| Key | Shape | Description |
|---|---|---|
| `magnitude` | `(n,)` | Magnitude outlyingness score for each curve |
| `shape` | `(n,)` | Shape outlyingness score for each curve |
You can flag outliers by thresholding either component (e.g., values above the 97.5th percentile):
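from fdars.outliers import magnitude_shape

# fd is the Fdata object built in the LRT example above
result_ms = magnitude_shape(fd.data)

# Flag curves whose score exceeds the 97.5th percentile of each component
mag_threshold = np.percentile(result_ms["magnitude"], 97.5)
shape_threshold = np.percentile(result_ms["shape"], 97.5)
mag_outliers = result_ms["magnitude"] > mag_threshold
shape_outliers = result_ms["shape"] > shape_threshold
print(f"Magnitude outliers: {np.where(mag_outliers)[0]}")
print(f"Shape outliers: {np.where(shape_outliers)[0]}")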
Full example -- detect and visualize outliers¶
import numpy as np
from fdars import Fdata
from fdars.simulation import simulate
from fdars.outliers import detect_outliers_lrt, outliergram, magnitude_shape
# ── 1. Generate clean data + outliers ─────────────────────────
argvals = np.linspace(0, 1, 100)
fd = Fdata(simulate(50, argvals, n_basis=5, seed=42), argvals=argvals)
# Magnitude outlier
fd.data[0] += 7.0
# Shape outlier (reversed curve)
fd.data[1] = -fd.data[1]
# Amplitude outlier (exaggerated)
fd.data[2] *= 3.0
# ── 2. LRT detection ─────────────────────────────────────────
lrt = detect_outliers_lrt(fd.data, alpha=0.05, n_bootstrap=200)
print("LRT outliers:", np.where(lrt["outliers"])[0])
# ── 3. Outliergram ───────────────────────────────────────────
og = outliergram(fd.data, factor=1.5)
print("Outliergram outliers:", np.where(og["outliers"])[0])
# ── 4. Magnitude-shape ──────────────────────────────────────
ms = magnitude_shape(fd.data)
print(f"Top magnitude scores: indices {np.argsort(ms['magnitude'])[-3:][::-1]}")
print(f"Top shape scores: indices {np.argsort(ms['shape'])[-3:][::-1]}")
# ── 5. Visualize (optional) ─────────────────────────────────
try:
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, 3, figsize=(15, 4))

    # Panel 1: data with LRT outliers highlighted
    ax = axes[0]
    for i in range(len(fd)):
        color = "red" if lrt["outliers"][i] else "steelblue"
        alpha = 1.0 if lrt["outliers"][i] else 0.15
        ax.plot(fd.argvals, fd.data[i], color=color, alpha=alpha, linewidth=0.8)
    ax.set_title("LRT outliers")

    # Panel 2: outliergram
    ax = axes[1]
    colors = ["red" if o else "steelblue" for o in og["outliers"]]
    ax.scatter(og["mei"], og["mbd"], c=colors, s=20)
    ax.set_xlabel("MEI")
    ax.set_ylabel("MBD")
    ax.set_title("Outliergram")

    # Panel 3: magnitude vs shape
    ax = axes[2]
    ax.scatter(ms["magnitude"], ms["shape"], s=20, c="steelblue")
    for idx in [0, 1, 2]:
        ax.annotate(str(idx), (ms["magnitude"][idx], ms["shape"][idx]),
                    fontsize=8, color="red")
    ax.set_xlabel("Magnitude outlyingness")
    ax.set_ylabel("Shape outlyingness")
    ax.set_title("Magnitude-Shape plot")

    plt.tight_layout()
    plt.savefig("outlier_detection.png", dpi=150)
    plt.show()
except ImportError:
    pass
Which method to use?
- LRT: best all-round choice for magnitude outliers in moderate samples.
- Outliergram: effective for shape outliers; provides an interpretable 2D plot.
- Magnitude-shape: decomposes outlyingness into two axes, useful when you need to distinguish why a curve is outlying.