Performs k-fold cross-validation for any functional data model, producing
out-of-fold (OOF) predictions where each observation is predicted exactly
once (when it is in the test fold). Supports both regression and
classification, with optional stratified fold assignment. Repeated
cross-validation (nrep > 1) runs multiple random fold partitions
to assess prediction variability.
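The out-of-fold idea can be illustrated in base R without the package. The sketch below is not the cv.fdata implementation: make_folds is a hypothetical helper, a plain lm on simulated scalar data stands in for a functional model, and stratification is shown for factor responses only.

```r
# Hypothetical helper: assign each observation to one of k folds.
# For a factor y with stratified = TRUE, folds are dealt out within each
# class so that class proportions stay similar across folds.
make_folds <- function(y, k, stratified = TRUE) {
  n <- length(y)
  folds <- integer(n)
  if (stratified && is.factor(y)) {
    for (lev in levels(y)) {
      idx <- which(y == lev)
      idx <- idx[sample.int(length(idx))]          # shuffle within the class
      folds[idx] <- rep_len(seq_len(k), length(idx))
    }
  } else {
    folds[sample.int(n)] <- rep_len(seq_len(k), n)  # plain random partition
  }
  folds
}

set.seed(1)
n <- 60
dat <- data.frame(x = rnorm(n))
dat$y <- 2 * dat$x + rnorm(n, sd = 0.5)

folds <- make_folds(dat$y, k = 5)
oof <- rep(NA_real_, n)
for (f in 1:5) {
  test <- folds == f
  fit <- lm(y ~ x, data = dat[!test, ])             # train on the other folds
  oof[test] <- predict(fit, newdata = dat[test, ])  # predict the held-out fold
}
rmse <- sqrt(mean((dat$y - oof)^2))
```

Every observation receives exactly one prediction, made by a model that never saw it during fitting, which is what makes metrics computed on oof an honest estimate of prediction error.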
Usage
cv.fdata(
  fdataobj,
  y,
  fit.fn,
  predict.fn = NULL,
  kfold = 10,
  nrep = 1,
  type = c("regression", "classification"),
  stratified = TRUE,
  seed = NULL,
  ...
)
Arguments
- fdataobj
An fdata object containing functional predictors.
- y
Response vector: numeric for regression, factor (or coerced to factor) for classification.
- fit.fn
A function with signature function(fdataobj, y, ...) that returns a fitted model object.
- predict.fn
Optional prediction function with signature function(model, newdata, ...). If NULL (default), uses predict(model, newdata).
- kfold
Number of folds (default 10).
- nrep
Number of repetitions (default 1). When nrep > 1, the entire k-fold procedure is repeated with different random fold assignments, producing an n x nrep matrix of predictions and per-observation variability estimates.
- type
One of "regression" or "classification". Auto-detected from y if it is a factor.
- stratified
Logical; whether to stratify fold assignments so each fold has a similar distribution of y (default TRUE).
- seed
Optional integer seed for reproducibility of fold assignments.
- ...
Additional arguments passed through to fit.fn.
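A custom fit.fn / predict.fn pair only has to follow the two signatures listed above. The sketch below is illustrative: mean_fit and mean_predict are hypothetical names, and it assumes the functional object exposes its discretised curves as a numeric matrix in `$data`, which should be checked against the actual object structure. A plain list stands in for the fdata object so the pair can be tried outside the package.

```r
# Assumption: curves are stored row-wise in `fdataobj$data`.
mean_fit <- function(fdataobj, y, ...) {
  z <- rowMeans(fdataobj$data)                 # scalar summary of each curve
  structure(list(lm = lm(y ~ z, data = data.frame(y = y, z = z))),
            class = "mean_fit")
}

mean_predict <- function(model, newdata, ...) {
  z <- rowMeans(newdata$data)
  as.numeric(predict(model$lm, newdata = data.frame(z = z)))
}

# Stand-in object for a quick check (not a real fdata object).
set.seed(2)
mock <- list(data = matrix(rnorm(20 * 10), nrow = 20))
y_mock <- rowMeans(mock$data) + rnorm(20, sd = 0.05)

m <- mean_fit(mock, y_mock)
p <- mean_predict(m, list(data = mock$data[1:5, , drop = FALSE]))
```

Under that assumption the pair would be passed as cv.fdata(fd, y, fit.fn = mean_fit, predict.fn = mean_predict, kfold = 5).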
Value
An object of class "cv.fdata" with components:
- oof.predictions
Out-of-fold predictions (numeric vector for regression, factor for classification). When nrep > 1, aggregated across repetitions (mean for regression, majority vote for classification).
- oof.probabilities
For classification only: matrix of class probabilities if predict.fn returns them; otherwise NULL.
- folds
Integer vector of fold assignments (from rep 1 for backward compatibility).
- fold.models
List of per-fold fitted model objects. When nrep > 1, a list of nrep lists, each of length kfold.
- metrics
Named list of overall performance metrics (computed on aggregated predictions when nrep > 1).
- fold.metrics
Data frame of per-fold metrics (from rep 1 for backward compatibility).
- call
The matched call.
- type
Character: "regression" or "classification".
- kfold
Number of folds used.
- y
The response vector.
- nrep
Number of repetitions (only when nrep > 1).
- oof.matrix
Matrix (n x nrep) of per-repetition predictions (only when nrep > 1).
- oof.sd
Numeric vector of per-observation prediction SD across repetitions (only when nrep > 1). For classification, the proportion of reps disagreeing with the majority vote.
- folds.matrix
Integer matrix (n x nrep) of fold assignments per repetition (only when nrep > 1).
- rep.metrics
Data frame with one row per repetition containing per-rep metrics (only when nrep > 1).
- metrics.summary
List with mean and sd of each metric across repetitions (only when nrep > 1).
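The relationship between oof.matrix, oof.predictions and oof.sd described above can be reproduced on toy values. This is a sketch of the stated aggregation rules, not the package code; the variable names are illustrative, the SD uses R's default n - 1 denominator, and ties in the majority vote are broken alphabetically here, both of which are assumptions.

```r
# Regression: toy 3 x 3 matrix (observations x repetitions) of OOF predictions.
oof_mat <- cbind(c(1.0, 2.0, 3.0),
                 c(1.2, 1.8, 3.0),
                 c(0.8, 2.2, 3.0))
oof_mean <- rowMeans(oof_mat)        # aggregated prediction per observation
oof_sd   <- apply(oof_mat, 1, sd)    # spread across repetitions

# Classification: majority vote per observation, then the share of
# repetitions whose label differs from that vote.
lab_mat <- cbind(c("a", "b", "a"),
                 c("a", "b", "b"),
                 c("a", "a", "b"))
vote     <- apply(lab_mat, 1, function(r) names(which.max(table(r))))
disagree <- rowMeans(lab_mat != vote)
```

An observation whose prediction is stable across partitions has an SD (or disagreement share) near zero; large values flag observations whose prediction depends strongly on which training set they were excluded from.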
Details
The fit.fn can internally perform nested cross-validation for
hyperparameter tuning. For example, passing a wrapper around
fregre.pc.cv as fit.fn gives proper nested CV: outer
folds produce unbiased OOF predictions while inner CV selects optimal
parameters on training data only.
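The nested structure can be sketched in base R. Because the arguments and return value of fregre.pc.cv are not shown here, the example uses prcomp plus lm on an ordinary matrix instead; fit_pc, predict_pc and fit_pc_tuned are hypothetical names, and the grid and inner fold count are arbitrary choices.

```r
# Principal component regression with a fixed number of components q.
fit_pc <- function(X, y, q) {
  pca <- prcomp(X, center = TRUE, scale. = FALSE)
  scores <- pca$x[, seq_len(q), drop = FALSE]
  list(pca = pca, q = q, coef = coef(lm(y ~ scores)))
}

predict_pc <- function(model, Xnew) {
  scores <- predict(model$pca, newdata = Xnew)[, seq_len(model$q), drop = FALSE]
  as.numeric(cbind(1, scores) %*% model$coef)
}

# Inner CV: choose q using only the rows handed to this function,
# then refit on all of those rows with the selected q.
fit_pc_tuned <- function(X, y, grid = 1:4, inner_k = 3) {
  n <- nrow(X)
  inner <- sample(rep_len(seq_len(inner_k), n))
  cv_err <- sapply(grid, function(q) {
    sq <- numeric(n)
    for (f in seq_len(inner_k)) {
      te <- inner == f
      m <- fit_pc(X[!te, , drop = FALSE], y[!te], q)
      sq[te] <- (y[te] - predict_pc(m, X[te, , drop = FALSE]))^2
    }
    mean(sq)
  })
  fit_pc(X, y, grid[which.min(cv_err)])
}

set.seed(3)
n <- 60; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] - X[, 2] + rnorm(n, sd = 0.3)

# Outer CV: the held-out rows are never seen by the tuning step.
outer <- sample(rep_len(1:5, n))
oof <- rep(NA_real_, n)
for (f in 1:5) {
  te <- outer == f
  m <- fit_pc_tuned(X[!te, , drop = FALSE], y[!te])
  oof[te] <- predict_pc(m, X[te, , drop = FALSE])
}
```

Tuning q on the full data before the outer loop would leak information from the test folds into model selection, which is the bias the nested arrangement avoids.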
Each fold is wrapped in tryCatch: if a fold fails, its predictions
are set to NA and a warning is issued, but the remaining folds
continue.
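The fold-level error handling can be pictured with the following base-R sketch. It is an illustration of the pattern, not the package source: fit_fn is a hypothetical fit function that deliberately fails on fold 2, and the warning is captured with withCallingHandlers only so the example can inspect it.

```r
set.seed(4)
n <- 30
dat <- data.frame(x = rnorm(n))
dat$y <- dat$x + rnorm(n, sd = 0.2)
folds <- sample(rep_len(1:3, n))
oof <- rep(NA_real_, n)

# Hypothetical fit function that fails on fold 2 to simulate a broken fold.
fit_fn <- function(train, fold) {
  if (fold == 2) stop("simulated failure")
  lm(y ~ x, data = train)
}

msgs <- character(0)   # collected warning messages
withCallingHandlers(
  for (f in 1:3) {
    te <- folds == f
    oof[te] <- tryCatch(
      {
        fit <- fit_fn(dat[!te, ], f)
        unname(predict(fit, newdata = dat[te, ]))
      },
      error = function(e) {
        # Report the failure, fill this fold with NA, and carry on.
        warning(sprintf("fold %d failed: %s", f, conditionMessage(e)),
                call. = FALSE)
        rep(NA_real_, sum(te))
      }
    )
  },
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
```

After the loop, the observations in the failed fold hold NA while the other folds hold ordinary predictions, so any downstream metric needs to handle missing values explicitly.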
When nrep > 1, each repetition uses a different random fold partition.
If seed is provided, deterministic per-rep seeds are derived from it
for full reproducibility.
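One simple way to derive deterministic per-repetition seeds is shown below. The rule used here (repetition r gets seed + r - 1) is an assumption chosen for illustration; the text above does not state how the package derives its seeds. rep_folds is a hypothetical helper.

```r
# Build an n x nrep matrix of fold assignments, one column per repetition.
# Assumed rule: repetition r is seeded with seed + r - 1.
rep_folds <- function(n, kfold, nrep, seed) {
  sapply(seq_len(nrep), function(r) {
    set.seed(seed + r - 1L)               # note: resets the global RNG state
    sample(rep_len(seq_len(kfold), n))
  })
}

a <- rep_folds(20, 5, 3, seed = 42)
b <- rep_folds(20, 5, 3, seed = 42)
same_partitions <- identical(a, b)        # same base seed, same fold matrix
```

Seeding each repetition separately means a single repetition can be regenerated on its own without replaying the ones before it.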
Examples
# Simple regression CV with fixed hyperparameters
set.seed(1)
t <- seq(0, 1, length.out = 30)
X <- matrix(0, 40, 30)
for (i in 1:40) X[i, ] <- sin(2*pi*t * i/40) + rnorm(30, sd = 0.1)
y <- rowMeans(X) + rnorm(40, sd = 0.1)
fd <- fdata(X, argvals = t)
result <- cv.fdata(fd, y,
                   fit.fn = function(fd, y, ...) fregre.pc(fd, y, ncomp = 3),
                   kfold = 5, seed = 42)
print(result)
#> K-Fold Cross-Validation (cv.fdata)
#> Type: regression
#> Folds: 5
#> Observations: 40
#>
#> Overall metrics:
#> RMSE: 0.1117
#> MAE: 0.08839
#> R2: 0.8551
#>
#> Per-fold RMSE range: [0.05177, 0.1356]