Depth Functions¶

Depth functions generalize the notion of quantiles and ranks to functional data. A depth measure assigns each curve a real number indicating how "central" it is relative to a reference sample. The deepest curve is the functional median -- a robust location estimator. Curves with low depth are potential outliers.

Concepts¶

Given a sample of curves \(X_1(t), \ldots, X_n(t)\), a functional depth \(D(X_i \mid X_1, \ldots, X_n) \in [0, 1]\) satisfies:

Maximality at center: the depth is maximized at some notion of center.
Monotonicity from center: moving a curve away from the center decreases its depth.
Vanishing at infinity: extreme curves have depth approaching zero.

The functional median is the observation with the largest depth:

\[ \hat{X}_{\mathrm{med}} = X_{i^*}, \quad i^* = \arg\max_i D(X_i \mid X_1, \ldots, X_n) \]

Available depth measures¶

All depth functions live in fdars.depth and share a common interface:

from fdars.depth import fraiman_muniz_1d  # example

depths = fraiman_muniz_1d(data, ref_data)

Parameter	Type	Description
`data`	`np.ndarray` (n, m)	Curves to evaluate
`ref_data`	`np.ndarray` (n_ref, m)	Reference sample (often the same as `data`)

The return value is always a 1D array of length n with depth values.

Self-depth

To rank observations within their own sample, pass the same array as both data and ref_data:

depths = fraiman_muniz_1d(data, data)

Fraiman-Muniz depth¶

from fdars.depth import fraiman_muniz_1d

depths = fraiman_muniz_1d(data, ref_data, scale=True)

Integrates univariate depth (based on the empirical CDF) across the domain:

\[ D_{\mathrm{FM}}(X) = \int_0^1 D_1\bigl(X(t) \mid X_1(t), \ldots, X_n(t)\bigr)\,dt \]

where \(D_1\) is the univariate simplicial depth \(D_1(x) = 2 \min\bigl(F_n(x),\, 1 - F_n(x)\bigr)\).

Parameter	Default	Description
`scale`	`True`	Normalize depth values to \([0, 1]\)

Modified Band Depth¶

from fdars.depth import modified_band_1d

depths = modified_band_1d(data, ref_data)

Measures the proportion of time a curve lies within the band formed by pairs of reference curves. More robust than band depth because it uses the proportion of time inside the band rather than requiring full containment.

\[ D_{\mathrm{MBD}}(X) = \binom{n}{2}^{-1} \sum_{i < j} \lambda\bigl\{t : X_{(i)}(t) \le X(t) \le X_{(j)}(t)\bigr\} \]

where \(\lambda\) denotes the proportion of the domain.

Band Depth¶

from fdars.depth import band_1d

depths = band_1d(data, ref_data)

The "strict" version of modified band depth: a curve gets credit only if it is entirely contained within the band formed by a pair of reference curves. This makes it more sensitive to outlying segments.

Modified Epigraph Index¶

from fdars.depth import modified_epigraph_index_1d

depths = modified_epigraph_index_1d(data, ref_data)

Measures the proportion of curves in the reference sample that lie above the evaluated curve at each time point, integrated over the domain. Useful for detecting magnitude outliers.

Random Projection Depth¶

from fdars.depth import random_projection_1d

depths = random_projection_1d(data, ref_data, n_proj=50)

Projects functional data onto random directions and averages univariate depth over many projections. Computationally efficient and consistent for detecting outliers in high dimensions.

Parameter	Default	Description
`n_proj`	`50`	Number of random projections

Random Tukey Depth¶

from fdars.depth import random_tukey_1d

depths = random_tukey_1d(data, ref_data, n_proj=50)

Similar to random projection depth but uses Tukey (halfspace) depth for each univariate projection. More robust to skewed distributions.

Parameter	Default	Description
`n_proj`	`50`	Number of random projections

from fdars.depth import modal_1d

depths = modal_1d(data, ref_data, h=1.0)

Measures depth based on the local density of curves. The curve at the mode of the distribution has the highest modal depth.

\[ D_{\mathrm{modal}}(X) = \frac{1}{n} \sum_{j=1}^{n} K_h\bigl(\|X - X_j\|\bigr) \]

where \(K_h\) is a kernel function with bandwidth \(h\).

Parameter	Default	Description
`h`	`1.0`	Kernel bandwidth -- smaller values give sharper depth

Bandwidth selection

The bandwidth h strongly affects results. A value too small yields noisy depth values; too large makes all depths similar. Experiment with different values or use the \(L^2\) norm of your data to calibrate.

Functional Spatial Depth¶

from fdars.depth import functional_spatial_1d

depths = functional_spatial_1d(data, ref_data, argvals=None)

Extension of multivariate spatial depth to functions. Based on the average spatial sign function:

\[ D_{\mathrm{sp}}(X) = 1 - \left\| \frac{1}{n} \sum_{j=1}^{n} S(X - X_j) \right\| \]

where \(S(f) = f / \|f\|\) is the spatial sign.

Parameter	Default	Description
`argvals`	`None`	Evaluation points; if `None`, uses a uniform grid on \([0, 1]\)

Kernel Functional Spatial Depth¶

from fdars.depth import kernel_functional_spatial_1d

depths = kernel_functional_spatial_1d(data, ref_data, argvals, h=1.0)

A kernelized version of functional spatial depth that adds local weighting. Useful when the sample has heterogeneous density.

Parameter	Default	Description
`argvals`	(required)	Evaluation points
`h`	`1.0`	Kernel bandwidth

2D variants for surfaces¶

All depth measures that support surfaces (bivariate functional data) have _2d counterparts:

from fdars.depth import (
    fraiman_muniz_2d,
    modal_2d,
    random_projection_2d,
    random_tukey_2d,
    functional_spatial_2d,
    kernel_functional_spatial_2d,
)

For 2D data, data and ref_data are still 2D NumPy arrays of shape (n, m), where each row is a flattened surface observed on a product grid.

Comparison table¶

Depth measure	Speed	Outlier sensitivity	Shape sensitive	Parameters
Fraiman-Muniz	Fast	Moderate	Low	`scale`
Modified Band	Fast	Good	Low	--
Band	Fast	High	Low	--
Modified Epigraph	Fast	Good (magnitude)	Low	--
Random Projection	Moderate	Good	Moderate	`n_proj`
Random Tukey	Moderate	Very good	Moderate	`n_proj`
Modal	Moderate	Excellent	High	`h`
Functional Spatial	Moderate	Good	Moderate	`argvals`
Kernel Func. Spatial	Moderate	Very good	High	`argvals`, `h`

Which depth to choose?

General purpose: Modified Band Depth (MBD) is the most widely used.
Magnitude outliers: Modified Epigraph Index or Fraiman-Muniz.
Shape outliers: Modal depth or Random Tukey.
Skewed distributions: Random Tukey depth handles asymmetry better.
Speed priority: Fraiman-Muniz and MBD scale well with \(n\).

Complete example: functional median and depth-based ordering¶

import numpy as np
import matplotlib.pyplot as plt
from fdars import Fdata
from fdars.simulation import simulate

# --- 1. Simulate data with an outlier ------------------------------------
argvals = np.linspace(0, 1, 150)
data = simulate(n=50, argvals=argvals, n_basis=5, seed=42)

# Inject 2 magnitude outliers
data[0] += 3.0
data[1] -= 2.5
fd = Fdata(data, argvals=argvals)

# --- 2. Compute three different depths ------------------------------------
depths_mbd = fd.depth("modified_band")
depths_fm  = fd.depth("fraiman_muniz")
depths_rt  = fd.depth("random_tukey")

# --- 3. Functional median (deepest curve) ---------------------------------
median_idx = np.argmax(depths_mbd)
print(f"Functional median is curve {median_idx}")

# --- 4. Visualize depth ranking ------------------------------------------
fig, axes = plt.subplots(1, 3, figsize=(16, 5))

for ax, depths, name in zip(axes,
                             [depths_mbd, depths_fm, depths_rt],
                             ["MBD", "Fraiman-Muniz", "Random Tukey"]):
    order = np.argsort(depths)  # low depth first (outliers)

    # Plot all curves, colored by depth
    for i in order:
        c = plt.cm.viridis(depths[i] / depths.max())
        ax.plot(fd.argvals, fd.data[i], color=c, alpha=0.5, lw=0.8)

    # Highlight median
    med = np.argmax(depths)
    ax.plot(fd.argvals, fd.data[med], "r-", lw=2.5, label=f"Median (#{med})")

    # Highlight two outliers
    for out_idx in order[:2]:
        ax.plot(fd.argvals, fd.data[out_idx], "k--", lw=1.5, alpha=0.7)

    ax.set_title(name)
    ax.legend(fontsize=8)

plt.suptitle("Depth-based ordering (bright = deep, dark = outlying)")
plt.tight_layout()
plt.show()

Depth for outlier detection¶

Low depth values flag potential outliers. A common rule uses the boxplot of depth values:

# Outlier detection via depth
q1 = np.percentile(depths_mbd, 25)
iqr = np.percentile(depths_mbd, 75) - q1
threshold = q1 - 1.5 * iqr

outlier_mask = depths_mbd < threshold
print(f"Detected outliers: {np.where(outlier_mask)[0]}")

Formal outlier detection

For production outlier detection, see the Outlier Detection guide which covers the functional boxplot, outliergram, and magnitude-shape plot -- all built on depth functions.

Using depth as features¶

Depth values can serve as features for classification or as weights for robust estimation:

# Weighted mean (depth-weighted, robust to outliers)
weights = depths_mbd / depths_mbd.sum()
robust_mean = np.average(fd.data, axis=0, weights=weights)

API summary¶

Function	Extra parameters	Description
`fraiman_muniz_1d(data, ref_data, scale)`	`scale=True`	Integrated univariate depth
`modified_band_1d(data, ref_data)`	--	Proportion of time inside bands
`band_1d(data, ref_data)`	--	Full containment in bands
`modified_epigraph_index_1d(data, ref_data)`	--	Epigraph-based depth
`random_projection_1d(data, ref_data, n_proj)`	`n_proj=50`	Averaged projected depth
`random_tukey_1d(data, ref_data, n_proj)`	`n_proj=50`	Projected Tukey halfspace depth
`modal_1d(data, ref_data, h)`	`h=1.0`	Kernel-based modal depth
`functional_spatial_1d(data, ref_data, argvals)`	`argvals=None`	Spatial sign depth
`kernel_functional_spatial_1d(data, ref_data, argvals, h)`	`argvals`, `h=1.0`	Kernelized spatial depth

All _1d variants have _2d counterparts for surface data, imported from the same fdars.depth module.