
Outlier Detection

Functional outliers come in three flavours:

| Type | Description | Example |
|------|-------------|---------|
| Magnitude | The curve lies far above or below the bulk of the data | A temperature sensor reading 20 degrees higher than all others |
| Shape | The curve has an unusual pattern even if its overall level is normal | A growth curve that dips where all others rise |
| Amplitude | The curve has exaggerated peaks and troughs | A vibration signal with double the usual amplitude |
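
To make the three categories concrete, the following sketch builds one toy curve of each kind from a common base curve using plain NumPy. The base curve and the helper names `level` and `swing` are illustrative assumptions and are not part of fdars.

```python
import numpy as np

t = np.linspace(0, 1, 100)
base = np.sin(2 * np.pi * t)       # a "typical" curve

magnitude_outlier = base + 5.0     # same pattern, shifted far above the bulk
shape_outlier = -base              # same overall level, dips where the base rises
amplitude_outlier = 2.0 * base     # same pattern, peaks and troughs doubled


def level(curve):
    """Average height of a curve (hypothetical helper)."""
    return curve.mean()


def swing(curve):
    """Peak-to-trough range of a curve (hypothetical helper)."""
    return curve.max() - curve.min()


for name, curve in [
    ("base", base),
    ("magnitude", magnitude_outlier),
    ("shape", shape_outlier),
    ("amplitude", amplitude_outlier),
]:
    print(f"{name:10s} level={level(curve):6.2f}  swing={swing(curve):5.2f}")
```

Only the magnitude outlier changes the level, only the amplitude outlier changes the swing, and the shape outlier keeps both while reversing the direction of every rise and fall.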

fdars provides three complementary methods that target different outlier types.


LRT-based detection

A likelihood-ratio test approach that compares the likelihood of the data with and without each candidate outlier. A bootstrap procedure determines the rejection threshold.

import numpy as np
from fdars import Fdata
from fdars.simulation import simulate
from fdars.outliers import detect_outliers_lrt

argvals = np.linspace(0, 1, 100)
fd = Fdata(simulate(50, argvals, n_basis=5, seed=1), argvals=argvals)

# Inject two magnitude outliers
fd.data[0] += 8.0
fd.data[1] -= 8.0

result = detect_outliers_lrt(fd.data, alpha=0.05, n_bootstrap=200, trim=0.1, smo=0.02)

Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| data | ndarray (n, m) | -- | Functional observations |
| alpha | float | 0.05 | Significance level |
| n_bootstrap | int | 200 | Number of bootstrap replicates for threshold estimation |
| trim | float | 0.1 | Trimming proportion for the robust mean |
| smo | float | 0.02 | Smoothing parameter for the likelihood ratio |

Returns a dictionary:

| Key | Type | Description |
|-----|------|-------------|
| outliers | ndarray (n,) bool | True for each detected outlier |
| threshold | float | Computed rejection threshold |

outlier_ids = np.where(result["outliers"])[0]
print(f"Outlier indices: {outlier_ids}")
print(f"Threshold: {result['threshold']:.4f}")
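
The idea of a bootstrap-calibrated threshold can be illustrated with a deliberately simplified NumPy sketch. It is not the fdars implementation: it uses a standardized distance from a trimmed mean as a stand-in for the likelihood-ratio statistic, trims by distance to the median curve, and omits the smoothing step controlled by `smo`. The function names `trimmed_stats`, `deviation_stat` and `lrt_sketch` are hypothetical.

```python
import numpy as np


def trimmed_stats(data, trim):
    """Pointwise mean and sd after dropping the `trim` fraction of curves
    that lie farthest (in L2 distance) from the pointwise median curve."""
    median_curve = np.median(data, axis=0)
    dist = np.sqrt(((data - median_curve) ** 2).mean(axis=1))
    n_keep = int(np.ceil((1.0 - trim) * len(data)))
    keep = np.argsort(dist)[:n_keep]
    core = data[keep]
    return core.mean(axis=0), core.std(axis=0, ddof=1), keep


def deviation_stat(data, mu, sd):
    """Root-mean-square standardized deviation of each curve from mu."""
    return np.sqrt((((data - mu) / sd) ** 2).mean(axis=1))


def lrt_sketch(data, alpha=0.05, n_bootstrap=200, trim=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n = len(data)
    mu, sd, keep = trimmed_stats(data, trim)
    stat = deviation_stat(data, mu, sd)      # one statistic per observed curve
    core = data[keep]                        # resample only from the trimmed sample
    maxima = np.empty(n_bootstrap)
    for b in range(n_bootstrap):
        sample = core[rng.integers(0, len(core), size=n)]
        mu_b, sd_b, _ = trimmed_stats(sample, trim)
        maxima[b] = deviation_stat(sample, mu_b, sd_b).max()
    # Threshold: the (1 - alpha) quantile of the bootstrap maxima
    threshold = float(np.quantile(maxima, 1.0 - alpha))
    return {"outliers": stat > threshold, "threshold": threshold}


# Toy data: vertically shifted sine curves plus two injected magnitude outliers
toy_rng = np.random.default_rng(1)
t = np.linspace(0, 1, 100)
toy = np.sin(2 * np.pi * t) + toy_rng.normal(0.0, 0.3, size=(40, 1))
toy[0] += 8.0
toy[1] -= 8.0

res = lrt_sketch(toy)
print("Flagged:", np.where(res["outliers"])[0])
print(f"Threshold: {res['threshold']:.3f}")
```

Calibrating against the maximum statistic of each bootstrap sample is what ties `alpha` to the whole sample: under the sketch's assumptions, the threshold is a level that the most extreme curve of a clean sample exceeds in only about a fraction `alpha` of the replicates.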

Outliergram (MEI vs MBD)

The outliergram plots the Modified Epigraph Index (MEI) against the Modified Band Depth (MBD) for every curve. Points that fall far from the parabolic relationship \(\mathrm{MBD} = a_0 + a_1 \cdot \mathrm{MEI} + a_2 \cdot \mathrm{MEI}^2\) are flagged as shape outliers.
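
The sketch below computes MEI and MBD from pointwise ranks and measures how far each curve falls below the parabola. It assumes the coefficients given in the outliergram paper by Arribas-Gil and Romo, \(a_0 = a_2 = -2/(n(n-1))\) and \(a_1 = 2(n+1)/(n-1)\), with the \(n^2\) factor on the quadratic term written out explicitly. It also assumes no ties between curves. The helper names are hypothetical, and fdars may compute these quantities differently.

```python
import numpy as np


def mei_mbd(data):
    """MEI and MBD (bands formed by pairs of curves) from pointwise ranks."""
    n = data.shape[0]
    # 1-based rank of every curve at every grid point (1 = lowest)
    ranks = data.argsort(axis=0).argsort(axis=0) + 1
    # MEI: average proportion of curves lying at or above this one
    mei = ((n - ranks + 1) / n).mean(axis=1)
    # Pairs of curves whose band contains this one at a given grid point:
    # one curve below and one above, plus every pair that includes the curve itself
    pairs = (ranks - 1) * (n - ranks) + (n - 1)
    mbd = pairs.mean(axis=1) / (n * (n - 1) / 2)
    return mei, mbd


def parabola_gap(mei, mbd):
    """Vertical distance from each (MEI, MBD) point up to the parabola."""
    n = len(mei)
    a0 = a2 = -2.0 / (n * (n - 1))
    a1 = 2.0 * (n + 1) / (n - 1)
    return a0 + a1 * mei + a2 * n**2 * mei**2 - mbd


grid = np.linspace(0, 1, 100)
shifts = np.sin(2 * np.pi * grid) + np.arange(10)[:, None]   # curves that never cross
crossing = np.sin(2 * np.pi * grid) - 0.5 + 10.0 * grid      # climbs through all of them

gap_clean = parabola_gap(*mei_mbd(shifts))
gap_mixed = parabola_gap(*mei_mbd(np.vstack([shifts, crossing])))

print("max gap, non-crossing curves:", gap_clean.max())
print("curve with the largest gap:  ", gap_mixed.argmax())
```

Curves that keep the same rank across the whole domain sit on the parabola, while a curve whose rank changes along the domain drops below it. That distance is what makes the outliergram sensitive to shape.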

from fdars.outliers import outliergram

result_og = outliergram(fd.data, factor=1.5)

Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| data | ndarray (n, m) | -- | Functional observations |
| factor | float | 1.5 | Outlier factor (analogous to the IQR multiplier in a boxplot) |

Returns a dictionary:

| Key | Shape | Description |
|-----|-------|-------------|
| mei | (n,) | Modified Epigraph Index |
| mbd | (n,) | Modified Band Depth |
| outliers | (n,) bool | Outlier flags |

Choosing the factor

A factor of 1.5 (the default) mirrors the classic boxplot rule. Increase it to 2.0 or 3.0 if you want to be more conservative and only flag extreme shape departures.
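
The effect of the factor can be seen with the ordinary boxplot fence, Q3 + factor × IQR. The sketch below applies it to a made-up array of distances. Both the array and the assumption that `outliergram` applies the fence to distances of this kind are illustrative.

```python
import numpy as np


def boxplot_fence(values, factor=1.5):
    """Upper boxplot fence: third quartile plus `factor` times the IQR."""
    q1, q3 = np.percentile(values, [25, 75])
    return q3 + factor * (q3 - q1)


# Made-up distances from the parabola, one per curve
d = np.array([0.01, 0.02, 0.02, 0.03, 0.03, 0.04, 0.05, 0.10, 0.20])

for factor in (1.5, 2.0, 3.0):
    fence = boxplot_fence(d, factor)
    print(f"factor={factor}: fence={fence:.3f}, flagged={np.where(d > fence)[0]}")
```

With these numbers, the borderline value 0.10 is flagged at the default factor but not at 2.0 or 3.0, while 0.20 is flagged at all three.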


Magnitude-shape outlyingness

This method decomposes each observation's outlyingness into a magnitude component and a shape component using the directional outlyingness framework. It is particularly effective at detecting curves that are unusual in shape even when their overall level is normal.
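
For univariate curves, one common way to realise this decomposition is to standardize each value by the pointwise median and MAD. The mean of that signed outlyingness over the domain then measures magnitude, and its variance measures shape. The sketch below follows that recipe as an assumption. It is not necessarily what `magnitude_shape` computes, and the function names are hypothetical.

```python
import numpy as np


def directional_outlyingness(data):
    """Signed pointwise outlyingness: (value - median) / MAD at each grid point."""
    med = np.median(data, axis=0)
    mad = 1.4826 * np.median(np.abs(data - med), axis=0)
    return (data - med) / mad


def magnitude_shape_sketch(data):
    o = directional_outlyingness(data)
    mean_out = o.mean(axis=1)   # average outlyingness along the curve
    var_out = o.var(axis=1)     # how much the outlyingness varies along the curve
    # Absolute value so that larger always means more outlying (a choice made here)
    return {"magnitude": np.abs(mean_out), "shape": var_out}


t = np.linspace(0, 1, 100)
offsets = np.linspace(-0.5, 0.5, 50)[:, None]
curves = np.sin(2 * np.pi * t) + offsets

curves[0] += 3.0                                   # magnitude outlier
curves[1] = np.cos(2 * np.pi * t) + offsets[1]     # shape outlier at a normal level

scores = magnitude_shape_sketch(curves)
print("largest magnitude score:", scores["magnitude"].argmax())
print("largest shape score:    ", scores["shape"].argmax())
```

A shifted curve is consistently outlying, so it has a large mean and a small variance. A curve with an unusual pattern moves in and out of the bulk, so it has a modest mean and a large variance.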

from fdars.outliers import magnitude_shape

result_ms = magnitude_shape(fd.data)

Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| data | ndarray (n, m) | -- | Functional observations |

Returns a dictionary:

| Key | Shape | Description |
|-----|-------|-------------|
| magnitude | (n,) | Magnitude outlyingness score for each curve |
| shape | (n,) | Shape outlyingness score for each curve |

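You can flag outliers by thresholding either component (e.g., values above the 97.5th percentile). Note that a percentile cutoff always flags roughly the same fraction of curves (about 2.5% here), whether or not the sample contains outliers, so treat the flagged curves as candidates for inspection: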

mag_threshold = np.percentile(result_ms["magnitude"], 97.5)
shape_threshold = np.percentile(result_ms["shape"], 97.5)
mag_outliers = result_ms["magnitude"] > mag_threshold
shape_outliers = result_ms["shape"] > shape_threshold
print(f"Magnitude outliers: {np.where(mag_outliers)[0]}")
print(f"Shape outliers:     {np.where(shape_outliers)[0]}")
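
A percentile cutoff is simple but flags a fixed share of the sample. The sketch below contrasts it with a median-plus-MAD cutoff on made-up scores. The robust rule is an alternative introduced here for illustration, not something fdars provides, and the names `percentile_flags` and `robust_flags` are hypothetical.

```python
import numpy as np


def percentile_flags(scores, q=97.5):
    """Flag scores above the q-th percentile of the sample."""
    return scores > np.percentile(scores, q)


def robust_flags(scores, k=3.0):
    """Flag scores more than k scaled MADs above the median."""
    med = np.median(scores)
    mad = 1.4826 * np.median(np.abs(scores - med))
    return scores > med + k * mad


clean = np.linspace(0.0, 1.0, 50)    # made-up scores with no outliers
contaminated = clean.copy()
contaminated[:3] += 10.0             # three clearly extreme scores

for name, s in [("clean", clean), ("contaminated", contaminated)]:
    print(f"{name:13s} percentile rule: {percentile_flags(s).sum()} flagged, "
          f"robust rule: {robust_flags(s).sum()} flagged")
```

With 50 scores, the percentile rule flags two curves in both cases. The robust rule flags none in the clean sample and all three extreme scores in the contaminated one.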

Full example -- detect and visualize outliers

import numpy as np
from fdars import Fdata
from fdars.simulation import simulate
from fdars.outliers import detect_outliers_lrt, outliergram, magnitude_shape

# ── 1. Generate clean data + outliers ─────────────────────────
argvals = np.linspace(0, 1, 100)
fd = Fdata(simulate(50, argvals, n_basis=5, seed=42), argvals=argvals)

# Magnitude outlier
fd.data[0] += 7.0

# Shape outlier (sign-flipped curve: it falls where the others rise)
fd.data[1] = -fd.data[1]

# Amplitude outlier (exaggerated)
fd.data[2] *= 3.0

# ── 2. LRT detection ─────────────────────────────────────────
lrt = detect_outliers_lrt(fd.data, alpha=0.05, n_bootstrap=200)
print("LRT outliers:", np.where(lrt["outliers"])[0])

# ── 3. Outliergram ───────────────────────────────────────────
og = outliergram(fd.data, factor=1.5)
print("Outliergram outliers:", np.where(og["outliers"])[0])

# ── 4. Magnitude-shape ──────────────────────────────────────
ms = magnitude_shape(fd.data)
print(f"Top magnitude scores: indices {np.argsort(ms['magnitude'])[-3:][::-1]}")
print(f"Top shape scores:     indices {np.argsort(ms['shape'])[-3:][::-1]}")

# ── 5. Visualize (optional) ─────────────────────────────────
try:
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, 3, figsize=(15, 4))

    # Panel 1: data with LRT outliers highlighted
    ax = axes[0]
    for i in range(fd.data.shape[0]):  # one line per curve
        color = "red" if lrt["outliers"][i] else "steelblue"
        alpha = 1.0 if lrt["outliers"][i] else 0.15
        ax.plot(fd.argvals, fd.data[i], color=color, alpha=alpha, linewidth=0.8)
    ax.set_title("LRT outliers")

    # Panel 2: outliergram
    ax = axes[1]
    colors = ["red" if o else "steelblue" for o in og["outliers"]]
    ax.scatter(og["mei"], og["mbd"], c=colors, s=20)
    ax.set_xlabel("MEI")
    ax.set_ylabel("MBD")
    ax.set_title("Outliergram")

    # Panel 3: magnitude vs shape
    ax = axes[2]
    ax.scatter(ms["magnitude"], ms["shape"], s=20, c="steelblue")
    for idx in [0, 1, 2]:
        ax.annotate(str(idx), (ms["magnitude"][idx], ms["shape"][idx]),
                    fontsize=8, color="red")
    ax.set_xlabel("Magnitude outlyingness")
    ax.set_ylabel("Shape outlyingness")
    ax.set_title("Magnitude-Shape plot")

    plt.tight_layout()
    plt.savefig("outlier_detection.png", dpi=150)
    plt.show()
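except ImportError:
    # matplotlib is optional; the detection results above are still printed
    print("matplotlib is not installed; skipping the plots")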

Which method to use?

  • LRT: best all-round choice for magnitude outliers in moderate samples.
  • Outliergram: effective for shape outliers; provides an interpretable 2D plot.
  • Magnitude-shape: decomposes outlyingness into two axes, useful when you need to distinguish why a curve is outlying.
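
Because the methods target different outlier types, it can help to look at their flags side by side. The sketch below tabulates boolean flag arrays and counts how many methods flagged each curve. The flag values are made up for illustration, and `flag_table` is a hypothetical helper, not an fdars function.

```python
import numpy as np


def flag_table(flags):
    """Stack boolean flag arrays column-wise and count votes per curve."""
    names = list(flags)
    matrix = np.column_stack([np.asarray(flags[name], dtype=bool) for name in names])
    votes = matrix.sum(axis=1)
    return names, matrix, votes


# Made-up flags for five curves from three detectors
example_flags = {
    "lrt": [True, False, False, False, False],
    "outliergram": [False, True, False, False, False],
    "magnitude_shape": [True, True, True, False, False],
}

names, matrix, votes = flag_table(example_flags)
for i in np.where(votes > 0)[0]:
    flagged_by = [name for name, hit in zip(names, matrix[i]) if hit]
    print(f"curve {i}: flagged by {', '.join(flagged_by)}")
```

A curve flagged by several methods is a strong candidate. A curve flagged by only one method points to the kind of departure that method is sensitive to.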