Equivalence Testing¶
Classical hypothesis tests ask "are these two groups different?" Equivalence testing flips the question: "are these two groups similar enough to be considered practically the same?"
This is critical in manufacturing (batch-to-batch consistency), bioequivalence studies (generic vs. brand-name drugs), and any domain where you need to demonstrate that a change or substitution has no meaningful effect on the functional response.
The TOST framework¶
The functional equivalence test in fdars implements a Two One-Sided Tests (TOST) procedure adapted for functional data:
- Define an equivalence margin \(\delta > 0\).
- Test \(H_0^-: \|\mu_1 - \mu_2\|_\infty \ge \delta\) against \(H_1^-: \|\mu_1 - \mu_2\|_\infty < \delta\).
- If \(H_0^-\) is rejected at level \(\alpha\), the two groups are declared equivalent within margin \(\delta\).
The null distribution of the test statistic is estimated via a Gaussian multiplier bootstrap.
Equivalence is concluded when \(T < \delta - c_\alpha\), where \(T\) is the observed sup-norm of the mean difference and \(c_\alpha\) is the \((1-\alpha)\) quantile of the bootstrap distribution.
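The decision rule above can be sketched directly in NumPy. This is an illustrative re-implementation, not the `fdars` internals; the synthetic data, group sizes, and margin here are chosen purely for exposition:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic groups of discretized curves (rows = observations)
n1, n2, m = 30, 30, 50
x = np.linspace(0, 1, m)
g1 = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal((n1, m))
g2 = np.sin(2 * np.pi * x) + 0.1 + 0.3 * rng.standard_normal((n2, m))

delta, alpha, nb = 0.5, 0.05, 1000

# Observed sup-norm of the mean difference
T = np.abs(g1.mean(axis=0) - g2.mean(axis=0)).max()

# Gaussian multiplier bootstrap: perturb centered curves with N(0,1) weights
c1 = g1 - g1.mean(axis=0)
c2 = g2 - g2.mean(axis=0)
sup = np.empty(nb)
for b in range(nb):
    w1 = rng.standard_normal(n1)[:, None]  # multipliers, group 1
    w2 = rng.standard_normal(n2)[:, None]  # multipliers, group 2
    boot = (w1 * c1).mean(axis=0) - (w2 * c2).mean(axis=0)
    sup[b] = np.abs(boot).max()

c_alpha = np.quantile(sup, 1 - alpha)  # (1 - alpha) bootstrap quantile
equivalent = T < delta - c_alpha       # TOST-style decision rule
print(f"T={T:.3f}  c_alpha={c_alpha:.3f}  equivalent={equivalent}")
```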
Usage¶
```python
import numpy as np
from fdars import Fdata
from fdars.simulation import simulate
from fdars.tolerance import equivalence_test

argvals = np.linspace(0, 1, 100)

# Two groups with very similar means
fd_a = Fdata(simulate(50, argvals, n_basis=5, seed=1), argvals=argvals)
fd_b = Fdata(simulate(50, argvals, n_basis=5, seed=2) + 0.2, argvals=argvals)  # small offset

result = equivalence_test(
    data1=fd_a.data,
    data2=fd_b.data,
    delta=1.0,   # equivalence margin
    alpha=0.05,  # significance level
    nb=1000,     # bootstrap replicates
    seed=42,
)
```
Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `data1` | ndarray `(n1, m)` | -- | First group of functional observations |
| `data2` | ndarray `(n2, m)` | -- | Second group of functional observations |
| `delta` | float | -- | Equivalence margin (\(\delta > 0\)) |
| `alpha` | float | 0.05 | Significance level |
| `nb` | int | 1000 | Number of bootstrap replicates |
| `seed` | int | 42 | Random seed |
Returns a dictionary:

| Key | Type | Description |
|---|---|---|
| `equivalent` | bool | `True` if equivalence is established at level \(\alpha\) |
| `p_value` | float | Bootstrap p-value |
| `test_statistic` | float | Observed sup-norm of the mean difference |
```python
print(f"Equivalent: {result['equivalent']}")
print(f"p-value: {result['p_value']:.4f}")
print(f"Sup-norm: {result['test_statistic']:.4f}")
```
Choosing the margin \(\delta\)¶
The margin \(\delta\) is the maximum allowable pointwise difference between the two mean functions. It should be set before looking at the data, based on domain knowledge:
**Do not choose \(\delta\) from the data.** Setting \(\delta\) just larger than the observed difference inflates the Type I error rate. Always specify \(\delta\) based on what constitutes a practically meaningful difference in your application.
| Domain | Typical \(\delta\) guidance |
|---|---|
| Manufacturing | Specification tolerance / 2 |
| Bioequivalence | 20% of the reference mean (FDA guidance) |
| Environmental monitoring | Regulatory action threshold |
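As an illustration, margins derived from such domain quantities might be computed as follows. All numeric values here are hypothetical placeholders, not defaults of `equivalence_test`:

```python
# Manufacturing: half the specification tolerance
spec_tolerance = 2.0            # hypothetical spec tolerance for the response
delta_mfg = spec_tolerance / 2  # -> 1.0

# Bioequivalence: 20% of the reference product's mean level
reference_mean_level = 4.0               # hypothetical reference mean
delta_bioeq = 0.2 * reference_mean_level  # -> 0.8

print(delta_mfg, delta_bioeq)
```

Both margins are fixed from external knowledge before the test data are examined, in line with the warning above.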
Example -- equivalent vs. non-equivalent groups¶
```python
import numpy as np
from fdars import Fdata
from fdars.simulation import simulate
from fdars.tolerance import equivalence_test

argvals = np.linspace(0, 1, 100)
delta = 1.0

# ── Case 1: Similar groups (should be equivalent) ────────────
fd_a = Fdata(simulate(40, argvals, n_basis=5, seed=10), argvals=argvals)
fd_b = Fdata(simulate(40, argvals, n_basis=5, seed=20) + 0.1, argvals=argvals)
r1 = equivalence_test(fd_a.data, fd_b.data, delta=delta, alpha=0.05, nb=2000, seed=42)
print(f"Case 1 — Equivalent: {r1['equivalent']}  p={r1['p_value']:.4f}")

# ── Case 2: Different groups (should NOT be equivalent) ──────
fd_c = Fdata(simulate(40, argvals, n_basis=5, seed=10), argvals=argvals)
fd_d = Fdata(simulate(40, argvals, n_basis=5, seed=20) + 5.0, argvals=argvals)  # large shift
r2 = equivalence_test(fd_c.data, fd_d.data, delta=delta, alpha=0.05, nb=2000, seed=42)
print(f"Case 2 — Equivalent: {r2['equivalent']}  p={r2['p_value']:.4f}")
```
Sensitivity to \(\delta\)¶
You can sweep over a range of margins to understand how sensitive the conclusion is:
```python
import numpy as np
from fdars import Fdata
from fdars.simulation import simulate
from fdars.tolerance import equivalence_test

argvals = np.linspace(0, 1, 100)
fd_a = Fdata(simulate(50, argvals, n_basis=5, seed=1), argvals=argvals)
fd_b = Fdata(simulate(50, argvals, n_basis=5, seed=2) + 0.3, argvals=argvals)

for delta in [0.2, 0.5, 1.0, 2.0, 5.0]:
    r = equivalence_test(fd_a.data, fd_b.data, delta=delta, nb=1000, seed=42)
    status = "equivalent" if r["equivalent"] else "not equivalent"
    print(f"delta={delta:.1f}  T={r['test_statistic']:.3f}  p={r['p_value']:.3f}  -> {status}")
```