fdars.clustering
Clustering methods for functional data with cluster quality metrics.
Functions
kmeans_fd
fdars.kmeans_fd(data, argvals, k, max_iter=100, tol=1e-6, seed=42)
K-means clustering for functional data using L2 distance.

| Parameter | Type | Default | Description |
|---|---|---|---|
| data | ndarray (n, m) | | Functional data |
| argvals | ndarray (m,) | | Evaluation points |
| k | int | | Number of clusters |
| max_iter | int | 100 | Maximum iterations |
| tol | float | 1e-6 | Convergence tolerance |
| seed | int | 42 | Random seed |

| Returns | Type | Description |
|---|---|---|
| result | dict | Keys: cluster (n,), centers (k, m), tot_withinss, iter, converged |

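import numpy as np
import fdars

# data: ndarray of shape (n, 100), one curve per row, sampled on t
t = np.linspace(0, 1, 100)
result = fdars.kmeans_fd(data, t, k=3)
labels = result["cluster"]  # shape (n,)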
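
To make "L2 distance" concrete, here is a minimal NumPy sketch of the distance between two curves sampled on a common grid, with the integral approximated by the trapezoidal rule. The helper name `l2_distance` is hypothetical and is not part of fdars. The quadrature rule fdars uses internally is not stated here, so treat this as an illustration of the metric rather than the library's implementation.

```python
import numpy as np

def l2_distance(f, g, argvals):
    """Approximate the L2 distance between two curves sampled on argvals."""
    sq_diff = (np.asarray(f, dtype=float) - np.asarray(g, dtype=float)) ** 2
    # Trapezoidal rule for the integral of the squared difference.
    widths = np.diff(argvals)
    integral = np.sum(widths * (sq_diff[:-1] + sq_diff[1:]) / 2.0)
    return float(np.sqrt(integral))

t = np.linspace(0, 1, 100)
# Distance between one full sine period and the zero curve.
d = l2_distance(np.sin(2 * np.pi * t), np.zeros_like(t), t)
```

K-means with this metric assigns each curve to the center with the smallest such distance, then recomputes each center as the pointwise mean of its members.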
fuzzy_cmeans_fd
fdars.fuzzy_cmeans_fd(data, argvals, k, fuzziness=2.0, max_iter=100,
tol=1e-6, seed=42)
Fuzzy C-means clustering for functional data.

| Parameter | Type | Default | Description |
|---|---|---|---|
| data | ndarray (n, m) | | Functional data |
| argvals | ndarray (m,) | | Evaluation points |
| k | int | | Number of clusters |
| fuzziness | float | 2.0 | Fuzziness exponent; must be greater than 1 |
| max_iter | int | 100 | Maximum iterations |
| tol | float | 1e-6 | Convergence tolerance |
| seed | int | 42 | Random seed |

| Returns | Type | Description |
|---|---|---|
| result | dict | Keys: cluster (n,), membership (n, k), centers (k, m) |

result = fdars.fuzzy_cmeans_fd(data, t, k=3, fuzziness=2.0)
soft_labels = result["membership"] # shape (n, 3)
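
The role of the fuzziness exponent is easiest to see in the textbook fuzzy C-means membership update, sketched below in NumPy. The function `fuzzy_memberships` and its `eps` guard are hypothetical names introduced for illustration. Whether fdars uses exactly this update is an assumption.

```python
import numpy as np

def fuzzy_memberships(dist, fuzziness=2.0, eps=1e-12):
    """Membership matrix (n, k) from an (n, k) curve-to-center distance matrix."""
    if fuzziness <= 1.0:
        raise ValueError("fuzziness must be greater than 1")
    # Guard against division by zero when a curve coincides with a center.
    d = np.maximum(np.asarray(dist, dtype=float), eps)
    # Membership is proportional to d^(-2 / (fuzziness - 1)); rows sum to 1.
    inv = d ** (-2.0 / (fuzziness - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

# Two curves, two centers: the first is equidistant, the second is closer to center 0.
dist = np.array([[1.0, 1.0], [1.0, 3.0]])
u = fuzzy_memberships(dist, fuzziness=2.0)
hard_labels = u.argmax(axis=1)  # hard assignment derived from soft memberships
```

Values of the exponent close to 1 push memberships toward hard 0/1 assignments, while larger values spread membership more evenly across clusters.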
gmm_cluster
fdars.gmm_cluster(data, argvals, k_range, nbasis=5, max_iter=200,
tol=1e-6, seed=42)
Gaussian mixture model clustering via basis projection. Tries multiple values of k and selects the best by BIC.

| Parameter | Type | Default | Description |
|---|---|---|---|
| data | ndarray (n, m) | | Functional data |
| argvals | ndarray (m,) | | Evaluation points |
| k_range | list[int] | | List of cluster counts to try |
| nbasis | int | 5 | Number of basis functions for projection |
| max_iter | int | 200 | Maximum EM iterations |
| tol | float | 1e-6 | Convergence tolerance |
| seed | int | 42 | Random seed |

| Returns | Type | Description |
|---|---|---|
| result | dict | Keys: cluster (n,), membership (n, k), bic_values (list of (k, bic)), icl_values (list of (k, icl)) |

result = fdars.gmm_cluster(data, t, k_range=[2, 3, 4, 5])
best_labels = result["cluster"]
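
The `bic_values` list can be inspected to see how the candidate cluster counts compare. The sketch below is pure Python and assumes the common lower-is-better BIC convention. Some libraries report BIC with the opposite sign, so verify which convention fdars uses before relying on it. The helper `best_k_by_bic` is hypothetical and the numbers are invented for illustration.

```python
def best_k_by_bic(bic_values, lower_is_better=True):
    """Return the (k, bic) pair with the best BIC from a list of (k, bic) pairs."""
    if not bic_values:
        raise ValueError("bic_values is empty")
    pick = min if lower_is_better else max
    return pick(bic_values, key=lambda pair: pair[1])

# Made-up values for illustration only, not output from fdars.
example_bic = [(2, 512.4), (3, 431.9), (4, 440.2), (5, 455.7)]
k_best, bic_best = best_k_by_bic(example_bic)
```

With a real result, the same call would be `best_k_by_bic(result["bic_values"])`, and `result["icl_values"]` can be compared in the same way.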
silhouette_score
fdars.silhouette_score(dist_matrix, labels)
Compute per-observation silhouette scores from a precomputed distance matrix.

| Parameter | Type | Description |
|---|---|---|
| dist_matrix | ndarray (n, n) | Pairwise distance matrix |
| labels | ndarray (n,) of int64 | Cluster assignments |

| Returns | Type | Description |
|---|---|---|
| scores | ndarray (n,) | Per-observation silhouette scores in [-1, 1] |

D = fdars.lp_self_1d(data, t)
scores = fdars.silhouette_score(D, result["cluster"].astype(np.int64))
print(f"Mean silhouette: {scores.mean():.3f}")
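
For reference, the standard silhouette definition can be written directly against a distance matrix. The NumPy sketch below uses the hypothetical name `silhouette_from_distances` and assumes a zero diagonal, at least two clusters, and the usual convention that singleton clusters score 0. It illustrates the definition and is not the fdars implementation.

```python
import numpy as np

def silhouette_from_distances(dist_matrix, labels):
    """Per-observation silhouette scores from a precomputed (n, n) distance matrix."""
    D = np.asarray(dist_matrix, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    scores = np.zeros(len(labels))
    for i in range(len(labels)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # Singleton clusters get a score of 0 by convention.
        # a: mean distance to the other members of the same cluster.
        a = D[i, own].sum() / (n_own - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return scores

# Four points on a line forming two well-separated pairs.
points = np.array([0.0, 1.0, 10.0, 11.0])
D_demo = np.abs(points[:, None] - points[None, :])
s_demo = silhouette_from_distances(D_demo, np.array([0, 0, 1, 1]))
```

Scores near 1 indicate an observation sits well inside its cluster, scores near 0 indicate it lies between clusters, and negative scores suggest it may be assigned to the wrong cluster.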
calinski_harabasz
fdars.calinski_harabasz(dist_matrix, labels)
Calinski-Harabasz index (variance ratio criterion) from a precomputed distance matrix. Higher values indicate better-defined clusters.

| Parameter | Type | Description |
|---|---|---|
| dist_matrix | ndarray (n, n) | Pairwise distance matrix |
| labels | ndarray (n,) of int64 | Cluster assignments |

| Returns | Type | Description |
|---|---|---|
| score | float | Calinski-Harabasz score |

ch = fdars.calinski_harabasz(D, result["cluster"].astype(np.int64))
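
The variance ratio can be computed from pairwise distances alone, because for Euclidean-type distances (including L2 distances between curves) the sum of squared deviations about a group mean equals the sum of squared pairwise distances within the group divided by twice the group size. The sketch below relies on that identity. The function name is hypothetical, and whether fdars computes the index this way is an assumption.

```python
import numpy as np

def calinski_harabasz_from_distances(dist_matrix, labels):
    """Calinski-Harabasz index from a precomputed (n, n) distance matrix."""
    D2 = np.asarray(dist_matrix, dtype=float) ** 2
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    n, k = len(labels), len(clusters)
    if k < 2 or k >= n:
        raise ValueError("need 2 <= number of clusters < number of observations")
    # Total sum of squares about the overall mean: sum of d_ij^2 over all i, j, / (2 n).
    total_ss = D2.sum() / (2.0 * n)
    # Within-cluster sum of squares, using the same identity per cluster.
    within_ss = 0.0
    for c in clusters:
        idx = labels == c
        within_ss += D2[idx][:, idx].sum() / (2.0 * idx.sum())
    between_ss = total_ss - within_ss
    # Ratio of between-cluster to within-cluster variance, each scaled by its degrees of freedom.
    return float((between_ss / (k - 1)) / (within_ss / (n - k)))

# Four points on a line forming two well-separated pairs.
points = np.array([0.0, 1.0, 10.0, 11.0])
D_demo = np.abs(points[:, None] - points[None, :])
ch_demo = calinski_harabasz_from_distances(D_demo, np.array([0, 0, 1, 1]))
```

The index has no fixed upper bound, so it is most useful for comparing clusterings of the same data, for example across different values of k.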