A climate analyst wants to characterize how geography shapes seasonal temperature and precipitation patterns across Canada. Functional ANOVA tests whether climate curves differ by region, while function-on-scalar regression reveals which geographic variables (latitude, longitude) drive seasonal variation.

| Step | What It Does | Outcome |
|:---|:---|:---|
| Data preparation | Load 35 stations × 365 daily temperature/precipitation curves | fdata objects colored by 4 regions |
| FANOVA (temperature) | Permutation test for regional mean differences | Strong rejection: regions differ significantly |
| Pairwise FANOVA | Test all region pairs | Pacific/Arctic most distinct; Atlantic/Continental overlap in summer |
| FOSR: temperature ~ geography | Regress temperature curves on latitude + longitude | Latitude dominates in winter; longitude effect peaks in spring/fall |
| Precipitation analysis | FANOVA + FOSR on precipitation curves | Regions also differ; Pacific has distinct wet-season pattern |
| Classification | Classify region from temperature curves | Cross-validated accuracy by region |

Key result: Latitude is the strongest predictor of temperature seasonality. Its effect is largest in winter, when Arctic/Continental stations drop far below Atlantic/Pacific ones. The permutation FANOVA confirms highly significant regional differences (p = 0.002 with 500 permutations).

The Canadian Weather dataset is a classic benchmark in functional data analysis (Ramsay and Silverman, 2005).

Data Preparation

The dataset contains daily temperature and precipitation averaged over 1960–1994 at 35 Canadian weather stations, grouped into 4 regions.

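# Assumes the package providing fdata(), fanova(), fosr() and fclassif() is already attached
library(ggplot2)  # needed for the ggplot() calls below

data(CanadianWeather, package = "fda")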

# Temperature: 365 x 35 matrix (days x stations) — transpose for fdata
temp_mat <- t(CanadianWeather$dailyAv[, , "Temperature.C"])
fd_temp <- fdata(temp_mat, argvals = 1:365)

# Precipitation (log10 of mm)
precip_mat <- t(CanadianWeather$dailyAv[, , "log10precip"])
fd_precip <- fdata(precip_mat, argvals = 1:365)

# Region labels
region <- factor(CanadianWeather$region)

# Geographic coordinates
coords <- CanadianWeather$coordinates
latitude <- coords[, "N.latitude"]
longitude <- -abs(coords[, "W.longitude"])  # negative for West

cat("Stations:", nrow(fd_temp$data), "\n")
#> Stations: 35
cat("Grid points:", ncol(fd_temp$data), "(daily)\n")
#> Grid points: 365 (daily)
cat("Regions:", paste(levels(region), table(region), sep = ": ", collapse = ", "), "\n")
#> Regions: Arctic: 3, Atlantic: 15, Continental: 12, Pacific: 5

Temperature Curves by Region

region_colors <- c("Arctic" = "#56B4E9", "Atlantic" = "#E69F00",
                    "Continental" = "#009E73", "Pacific" = "#CC79A7")

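# Create long-format data frame for ggplot.
# fd_temp$data is stations x days, so as.vector() (column-major) yields all
# stations for day 1, then all stations for day 2, and so on. That matches
# the ordering of Day, Station and Region below.
temp_df <- data.frame(
  Day = rep(1:365, each = nrow(fd_temp$data)),
  Temp = as.vector(fd_temp$data),
  Station = rep(rownames(fd_temp$data), 365),
  Region = rep(as.character(region), 365)
)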

ggplot(temp_df, aes(x = Day, y = Temp, group = Station, color = Region)) +
  geom_line(alpha = 0.6) +
  scale_color_manual(values = region_colors) +
  labs(title = "Daily Temperature by Region (35 Canadian Stations)",
       x = "Day of year", y = "Temperature (°C)") +
  scale_x_continuous(breaks = c(1, 91, 182, 274),
                     labels = c("Jan", "Apr", "Jul", "Oct"))

Arctic and Continental stations show the widest annual swing, driven mainly by their very cold winters, while Pacific stations have the mildest climate. The key question: are these regional differences statistically significant across the whole year?

Functional ANOVA: Do Regions Differ?

anova_temp <- fanova(fd_temp, region, n.perm = 500)
print(anova_temp)
#> Functional ANOVA
#> ================
#>   Number of groups: 4 
#>   Number of observations: 35 
#>   Global F-statistic: 22.5107 
#>   P-value: 0.001996 
#>   Permutations: 500

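The small p-value confirms that the four regions have significantly different mean temperature curves: the visual separation is real, not a sampling artifact. The reported value 0.001996 equals 1/501, which is what a (count + 1)/(n.perm + 1) permutation p-value gives when none of the 500 permuted statistics reaches the observed one.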
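
To make the mechanics of a permutation FANOVA concrete, here is a minimal base-R sketch on simulated curves. It is not the implementation behind `fanova()`: the global statistic (the pointwise one-way F-statistic averaged over the grid), the helper names `global_F` and `perm_fanova`, and the simulated data are all assumptions made for illustration.

```r
# Pointwise one-way F-statistic averaged over the grid.
# This choice of global statistic is an assumption, not necessarily fanova()'s.
global_F <- function(X, g) {
  g <- factor(g)
  n <- nrow(X)
  k <- nlevels(g)
  grand <- colMeans(X)
  ss_between <- numeric(ncol(X))
  ss_within <- numeric(ncol(X))
  for (lev in levels(g)) {
    Xg <- X[g == lev, , drop = FALSE]
    mu <- colMeans(Xg)
    ss_between <- ss_between + nrow(Xg) * (mu - grand)^2
    ss_within <- ss_within + colSums(sweep(Xg, 2, mu)^2)
  }
  mean((ss_between / (k - 1)) / (ss_within / (n - k)))
}

# Permutation test: shuffle the group labels and recompute the statistic.
perm_fanova <- function(X, g, n_perm = 200) {
  obs <- global_F(X, g)
  perm <- replicate(n_perm, global_F(X, sample(g)))
  list(statistic = obs,
       p_value = (1 + sum(perm >= obs)) / (n_perm + 1))
}

# Simulated example: 3 groups of 10 curves on a 50-point grid,
# with group means shifted by 0, 2 and 4 units.
set.seed(1)
t_grid <- seq(0, 1, length.out = 50)
g <- rep(c("A", "B", "C"), each = 10)
shift <- c(A = 0, B = 2, C = 4)[g]
X <- outer(shift, rep(1, 50)) +
  outer(rep(1, 30), sin(2 * pi * t_grid)) +
  matrix(rnorm(30 * 50), 30, 50)

res <- perm_fanova(X, g)
print(res)
```

With group shifts this large relative to the noise, the observed statistic should sit far above every permuted one, which mirrors the situation in the temperature analysis.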

plot(anova_temp) +
  scale_x_continuous(breaks = c(1, 91, 182, 274),
                     labels = c("Jan", "Apr", "Jul", "Oct")) +
  labs(title = "FANOVA: Regional Mean Temperature Curves",
       x = "Day of year", y = "Temperature (°C)")

The group means show the expected pattern: Arctic coldest year-round, Continental with the largest amplitude, Pacific mildest, Atlantic intermediate. The separation is greatest in the cold months (days 1–90 and 275–365).

Pairwise FANOVA

Which specific region pairs differ?

# Test each pair of regions
pairs <- combn(levels(region), 2)
pairwise_results <- data.frame(
  Region1 = character(), Region2 = character(),
  F_stat = numeric(), p_value = numeric(),
  stringsAsFactors = FALSE
)

for (j in seq_len(ncol(pairs))) {
  idx <- region %in% pairs[, j]
  result <- fanova(fd_temp[idx, ], region[idx, drop = TRUE], n.perm = 500)
  pairwise_results <- rbind(pairwise_results, data.frame(
    Region1 = pairs[1, j], Region2 = pairs[2, j],
    F_stat = round(result$global.statistic, 2),
    p_value = result$p.value
  ))
}

knitr::kable(pairwise_results, caption = "Pairwise FANOVA p-values")
Pairwise FANOVA p-values

| Region1 | Region2 | F_stat | p_value |
|:---|:---|---:|---:|
| Arctic | Atlantic | 51.83 | 0.0019960 |
| Arctic | Continental | 18.99 | 0.0059880 |
| Arctic | Pacific | 55.39 | 0.0239521 |
| Atlantic | Continental | 13.19 | 0.0019960 |
| Atlantic | Pacific | 5.61 | 0.0079840 |
| Continental | Pacific | 15.54 | 0.0019960 |

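Arctic vs Pacific shows the largest F-statistic (55.39), i.e. the greatest separation, yet also the largest p-value (0.024). With only 3 Arctic and 5 Pacific stations there are just 56 distinct ways to relabel the 8 curves, so a label-permutation p-value cannot fall much below 1/56 ≈ 0.018. Atlantic vs Pacific has the smallest F-statistic (5.61), and Atlantic vs Continental (13.19) is also comparatively modest, consistent with overlap in summer when both regions reach similar peak temperatures. All six pairs are significant at the 5% level before any multiple-comparison adjustment.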
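
Two quick checks help when reading the pairwise table, sketched here in base R. The p-values are copied from the table above; the reading of them as (count + 1)/(n.perm + 1) is an assumption that happens to fit every reported value, and the Holm adjustment is one common choice for six simultaneous tests rather than something the original analysis applies.

```r
# Pairwise p-values copied from the table above.
p_raw <- c(
  "Arctic-Atlantic"      = 0.0019960,
  "Arctic-Continental"   = 0.0059880,
  "Arctic-Pacific"       = 0.0239521,
  "Atlantic-Continental" = 0.0019960,
  "Atlantic-Pacific"     = 0.0079840,
  "Continental-Pacific"  = 0.0019960
)

# Check 1: how many distinct relabelings exist for Arctic (3) vs Pacific (5)?
n_relabel <- choose(3 + 5, 3)
p_floor <- 1 / n_relabel  # approximate lower bound for that pair's p-value

# Check 2: are the p-values consistent with (count + 1) / (500 + 1)?
# (Assumed formula; each value times 501 should then be a whole number.)
implied_counts <- round(p_raw * 501)

# Holm adjustment for the six simultaneous comparisons.
p_holm <- p.adjust(p_raw, method = "holm")

print(n_relabel)
print(round(p_floor, 4))
print(implied_counts)
print(round(p_holm, 4))
```

The adjusted p-values all stay below 0.05, so the conclusion that every pair of regions differs does not hinge on ignoring multiplicity.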

FOSR: Temperature ~ Latitude + Longitude

Function-on-scalar regression reveals how geographic variables shape the temperature curve at each day of the year.
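
Written out, the model being fitted has one coefficient function per predictor. The first line below is the standard function-on-scalar form for these two predictors; the second line shows a common roughness-penalized least-squares criterion in which `lambda` would play the role of the smoothing parameter. The exact penalty used by `fosr()` is not stated here, so the second line should be read as an assumed, typical form.

```latex
y_i(t) = \beta_0(t) + \beta_1(t)\,\mathrm{lat}_i + \beta_2(t)\,\mathrm{lon}_i + \varepsilon_i(t),
\qquad i = 1, \dots, 35, \quad t \in [1, 365]

\hat{\beta} = \arg\min_{\beta}\;
  \sum_{i=1}^{35} \int \bigl( y_i(t) - \beta_0(t) - \beta_1(t)\,\mathrm{lat}_i - \beta_2(t)\,\mathrm{lon}_i \bigr)^2 \, dt
  \;+\; \lambda \sum_{j} \int \bigl( \beta_j''(t) \bigr)^2 \, dt
```

Here $y_i(t)$ is the temperature curve of station $i$, $\beta_1(t)$ is the latitude effect on day $t$, and $\beta_2(t)$ is the longitude effect on day $t$.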

predictors <- cbind(latitude, longitude)
fosr_temp <- fosr(fd_temp, predictors, lambda = 1)
print(fosr_temp)
#> Function-on-Scalar Regression
#> =============================
#>   Number of observations: 35 
#>   Number of predictors: 2 
#>   Evaluation points: 365 
#>   R-squared: 0.466 
#>   Lambda: 1
plot(fosr_temp) +
  scale_x_continuous(breaks = c(1, 91, 182, 274),
                     labels = c("Jan", "Apr", "Jul", "Oct"))

Interpreting the Coefficient Functions

  • Latitude $\hat{\beta}_1(t)$: Negative throughout the year (higher latitude → colder). The effect is strongest in winter — a 1° increase in latitude has a bigger cooling effect in January than in July.
  • Longitude $\hat{\beta}_2(t)$: More subtle. Western stations (more negative longitude) tend to be warmer in winter (Pacific influence) but the effect reverses or diminishes in summer.
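
One way to build intuition for these coefficient functions is to fit an ordinary regression separately at each day, which is an unpenalized stand-in for FOSR. The base-R sketch below does this on simulated data; the station coordinates, the "true" coefficient functions, and the noise level are all hypothetical, chosen so that the latitude effect is stronger in winter than in summer.

```r
set.seed(42)
n <- 35
days <- 1:365

# Hypothetical station coordinates
lat <- runif(n, 43, 75)
lon <- runif(n, -140, -52)

# Hypothetical true coefficient functions
beta0_true <- 45 - 12 * cos(2 * pi * (days - 15) / 365)    # baseline seasonal curve
beta1_true <- -0.8 - 0.5 * cos(2 * pi * (days - 15) / 365) # latitude: strongest in winter
beta2_true <- rep(0.02, length(days))                      # small constant longitude effect

# Simulated curves: stations in rows, days in columns
Y <- outer(rep(1, n), beta0_true) +
  outer(lat, beta1_true) +
  outer(lon, beta2_true) +
  matrix(rnorm(n * length(days), sd = 2), n, length(days))

# lm() with a matrix response fits one regression per column (per day)
fit <- lm(Y ~ lat + lon)
beta1_hat <- coef(fit)["lat", ]
beta2_hat <- coef(fit)["lon", ]

# Average latitude effect in January vs July
jan_effect <- mean(beta1_hat[1:31])
jul_effect <- mean(beta1_hat[182:212])
print(c(January = jan_effect, July = jul_effect))

# Predicted curve for a hypothetical location at 55N, 120W
pred_curve <- coef(fit)["(Intercept)", ] + 55 * beta1_hat + (-120) * beta2_hat
```

The estimated latitude coefficient is more negative in January than in July, which is the pattern described in the first bullet. A penalized fit would additionally smooth each coefficient function across days.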

FPC-Based FOSR

fosr_fpc <- fosr.fpc(fd_temp, predictors, ncomp = 5)
cat("FPC-based R-squared:", round(fosr_fpc$r.squared, 4), "\n")
#> FPC-based R-squared: 0.8114
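
The idea behind an FPC-based fit can be sketched in a few lines of base R: reduce the curves to a handful of principal component scores, regress the scores on the scalar predictors, and map the fitted scores back to curves. This is an illustration on simulated data, not the code behind `fosr.fpc()`; the simulated curves and the pooled R-squared definition used at the end are assumptions.

```r
set.seed(7)
n <- 35
days <- 1:365

# Hypothetical predictors and simulated temperature-like curves
lat <- runif(n, 43, 75)
lon <- runif(n, -140, -52)
X <- cbind(lat = lat, lon = lon)
Y <- outer(rep(1, n), -10 * cos(2 * pi * days / 365)) +
  outer(lat - 55, -0.8 - 0.5 * cos(2 * pi * days / 365)) +
  matrix(rnorm(n * length(days), sd = 2), n, length(days))

# Step 1: functional principal components via ordinary PCA on the grid values
ncomp <- 5
pca <- prcomp(Y, center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:ncomp]

# Step 2: regress each score on the scalar predictors
score_fit <- lm(scores ~ X)
fitted_scores <- fitted(score_fit)  # n x ncomp matrix

# Step 3: reconstruct fitted curves from the fitted scores
Y_hat <- sweep(fitted_scores %*% t(pca$rotation[, 1:ncomp]), 2, pca$center, "+")

# Pooled R-squared over all stations and days (assumed definition)
r2 <- 1 - sum((Y - Y_hat)^2) / sum(sweep(Y, 2, colMeans(Y))^2)
print(round(r2, 3))
```

Because the regression acts on a few smooth components rather than on 365 separate days, no explicit smoothing parameter is needed, which is one reason the two approaches can report different R-squared values on the same data.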

Prediction: What Would a Station at Given Coordinates Look Like?

# Predict temperature curve for hypothetical stations
new_locs <- matrix(c(
  45, -75,   # Montreal-like (Continental)
  55, -120,  # Northern BC (Continental/Pacific)
  65, -135   # Yukon (Arctic)
), nrow = 3, byrow = TRUE)

pred_curves <- predict(fosr_temp, new_locs)

pred_df <- data.frame(
  Day = rep(1:365, 3),
  Temp = as.vector(t(pred_curves$data)),
  Location = rep(c("45°N, 75°W (Montreal-like)",
                    "55°N, 120°W (Northern BC)",
                    "65°N, 135°W (Yukon)"), each = 365)
)

ggplot(pred_df, aes(x = Day, y = Temp, color = Location)) +
  geom_line(linewidth = 1) +
  scale_x_continuous(breaks = c(1, 91, 182, 274),
                     labels = c("Jan", "Apr", "Jul", "Oct")) +
  labs(title = "Predicted Temperature Curves by Location",
       x = "Day of year", y = "Temperature (°C)")

Precipitation Analysis

FANOVA on Precipitation

anova_precip <- fanova(fd_precip, region, n.perm = 500)
print(anova_precip)
#> Functional ANOVA
#> ================
#>   Number of groups: 4 
#>   Number of observations: 35 
#>   Global F-statistic: 14.0489 
#>   P-value: 0.001996 
#>   Permutations: 500
plot(anova_precip) +
  scale_x_continuous(breaks = c(1, 91, 182, 274),
                     labels = c("Jan", "Apr", "Jul", "Oct")) +
  labs(title = "FANOVA: Regional Mean Precipitation Curves",
       x = "Day of year", y = "log10(precipitation, mm)")

Pacific stations show a distinct wet-winter/dry-summer Mediterranean-like pattern, while Continental stations have summer-dominated precipitation.

FOSR on Precipitation

fosr_precip <- fosr(fd_precip, predictors, lambda = 1)
cat("Precipitation FOSR R-squared:", round(fosr_precip$r.squared, 4), "\n")
#> Precipitation FOSR R-squared: 0.2542

With the same two predictors and lambda = 1, the R-squared is 0.25 for precipitation versus 0.47 for temperature, so latitude and longitude account for much less of the variation in precipitation curves.

Classification: Region from Temperature

Can we classify a station’s region from its temperature curve alone?

# Functional classification with cross-validation
classif_result <- fclassif.cv(fd_temp, region, nfold = 5)
print(classif_result)
#> Cross-Validated Functional Classification
#> ==========================================
#>   Method: LDA 
#>   Folds: 5 
#>   Error rate: 0.1429 
#>   Best ncomp: 3
# Also fit on full data to see confusion matrix
classif_full <- fclassif(fd_temp, region)

cat("Overall accuracy:", round(classif_full$accuracy * 100, 1), "%\n")
#> Overall accuracy: 88.6 %
print(classif_full$confusion)
#>      [,1] [,2] [,3] [,4]
#> [1,]    3    0    0    0
#> [2,]    0   14    1    0
#> [3,]    1    1    9    1
#> [4,]    0    0    0    5

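Arctic and Pacific stations are all classified correctly in this full-data fit, reflecting their distinctive temperature profiles. Reading rows as true regions (the row totals 3, 15, 12, 5 match the region counts), all four misclassifications involve a Continental station: one Atlantic station is assigned to Continental, and three Continental stations are assigned to Arctic, Atlantic, and Pacific respectively. The Atlantic/Continental confusions are consistent with their overlapping summer temperatures. Note that 88.6% is an in-sample accuracy; the cross-validated error rate of 0.1429 (about 85.7% accuracy) is the better guide to out-of-sample performance.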
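
The printed confusion matrix has no row or column names, so it helps to label it and compute per-region rates. The base-R sketch below re-enters the matrix from the output above. It assumes rows are true regions and columns are predicted regions, both in the order of `levels(region)`; the row totals matching the region counts support the row interpretation, but the orientation is not stated in the output.

```r
# Confusion matrix copied from the printed output above
conf <- matrix(c(3,  0, 0, 0,
                 0, 14, 1, 0,
                 1,  1, 9, 1,
                 0,  0, 0, 5),
               nrow = 4, byrow = TRUE)

# Assumed labelling: rows = true region, columns = predicted region
regions <- c("Arctic", "Atlantic", "Continental", "Pacific")
dimnames(conf) <- list(true = regions, predicted = regions)

# Share of each region's stations that are classified correctly
recall <- diag(conf) / rowSums(conf)

# Overall in-sample accuracy
accuracy <- sum(diag(conf)) / sum(conf)

print(conf)
print(round(recall, 3))
print(round(accuracy * 100, 1))
```

Under this labelling, Continental has the lowest per-region rate (9 of 12 stations), while Arctic and Pacific are at 100%.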

Conclusions

  • FANOVA confirms highly significant regional differences in both temperature and precipitation curves (p = 0.002 in both cases, the smallest value reported with 500 permutations).
  • Latitude is the dominant geographic predictor of temperature seasonality. Its effect is strongest in winter, when the Arctic/Continental vs Atlantic/Pacific divide is most pronounced.
  • Longitude has a more subtle effect, primarily reflecting the moderating influence of the Pacific Ocean on western stations.
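  • FOSR with 2 scalar predictors explains a substantial fraction of the curve-to-curve temperature variation (R-squared of 0.47 for the penalized fit with lambda = 1 and 0.81 for the FPC-based fit with 5 components), so geography alone goes a long way.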
  • Classification from temperature curves correctly identifies most stations (cross-validated error rate 0.14), with Pacific and Arctic being the most distinctive regions.

References

  • Ramsay, J.O. and Silverman, B.W. (2005). Functional Data Analysis, 2nd ed. Springer. Chapter 13 (Canadian Weather example).
  • Cuevas, A., Febrero, M., and Fraiman, R. (2004). An anova test for functional data. Computational Statistics & Data Analysis, 47(1), 111-122.