Skip to contents

Functions for clustering functional data, including k-means and related algorithms. Functional K-Means Clustering

Usage

cluster.kmeans(
  fdataobj,
  ncl,
  metric = "L2",
  max.iter = 100,
  nstart = 10,
  seed = NULL,
  draw = FALSE,
  ...
)

Arguments

fdataobj

An object of class 'fdata'.

ncl

Number of clusters.

metric

Either a string ("L2", "L1", "Linf") for fast Rust-based distance computation, or a metric/semimetric function (e.g., metric.lp, metric.hausdorff, semimetric.pca). Using a function provides flexibility but may be slower for semimetrics computed in R.

max.iter

Maximum number of iterations (default 100).

nstart

Number of random starts (default 10). The best result (lowest within-cluster sum of squares) is returned.

seed

Optional random seed for reproducibility.

draw

Logical. If TRUE, plot the clustered curves (not yet implemented).

...

Additional arguments passed to the metric function.

Value

A list of class 'cluster.kmeans' with components:

cluster

Integer vector of cluster assignments (1 to ncl).

centers

An fdata object containing the cluster centers.

withinss

Within-cluster sum of squares for each cluster.

tot.withinss

Total within-cluster sum of squares.

size

Number of observations in each cluster.

fdataobj

The input functional data object.

Details

Performs k-means clustering on functional data using the specified metric. Uses k-means++ initialization for better initial centers.

When metric is a string ("L2", "L1", "Linf"), the entire k-means algorithm runs in Rust with parallel processing, providing 50-200x speedup.

When metric is a function, distances are computed using that function. Functions like metric.lp, metric.hausdorff, and metric.DTW have Rust backends and remain fast. Semimetric functions (semimetric.*) are computed in R and will be slower for large datasets.

Examples

# Create functional data with two groups
t <- seq(0, 1, length.out = 50)
n <- 30
X <- matrix(0, n, 50)
true_cluster <- rep(1:2, each = 15)
for (i in 1:n) {
  if (true_cluster[i] == 1) {
    X[i, ] <- sin(2*pi*t) + rnorm(50, sd = 0.1)
  } else {
    X[i, ] <- cos(2*pi*t) + rnorm(50, sd = 0.1)
  }
}
fd <- fdata(X, argvals = t)

# Cluster with string metric (fast Rust path)
result <- cluster.kmeans(fd, ncl = 2, metric = "L2")
table(result$cluster, true_cluster)
#>    true_cluster
#>      1  2
#>   1 15  0
#>   2  0 15

# Cluster with metric function (also fast - Rust backend)
result2 <- cluster.kmeans(fd, ncl = 2, metric = metric.lp)

# Cluster with semimetric (flexible but slower)
result3 <- cluster.kmeans(fd, ncl = 2, metric = semimetric.pca, ncomp = 3)