Canonical Correlation Analysis and Partial Least Squares for Identifying Brain-Behavior Associations: A Tutorial and a Comparative Study

Biol Psychiatry Cogn Neurosci Neuroimaging. 2022 Nov;7(11):1055-1067. doi: 10.1016/j.bpsc.2022.07.012. Epub 2022 Aug 8.

Abstract

Canonical correlation analysis (CCA) and partial least squares (PLS) are powerful multivariate methods for capturing associations across 2 modalities of data (e.g., brain and behavior). However, when the sample size is similar to or smaller than the number of variables in the data, standard CCA and PLS models may overfit, i.e., find spurious associations that generalize poorly to new data. Dimensionality reduction and regularized extensions of CCA and PLS have been proposed to address this problem, yet most studies using these approaches have some limitations. This work gives a theoretical and practical introduction into the most common CCA/PLS models and their regularized variants. We examine the limitations of standard CCA and PLS when the sample size is similar to or smaller than the number of variables. We discuss how dimensionality reduction and regularization techniques address this problem and explain their main advantages and disadvantages. We highlight crucial aspects of the CCA/PLS analysis framework, including optimizing the hyperparameters of the model and testing the identified associations for statistical significance. We apply the described CCA/PLS models to simulated data and real data from the Human Connectome Project and Alzheimer's Disease Neuroimaging Initiative (both of n > 500). We use both low- and high-dimensionality versions of these data (i.e., ratios between sample size and variables in the range of ∼1-10 and ∼0.1-0.01, respectively) to demonstrate the impact of data dimensionality on the models. Finally, we summarize the key lessons of the tutorial.

Keywords: Brain–behavior association; CCA; High-dimensional data; Overfitting; PLS; Regularization.

Publication types

  • Review
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Brain
  • Canonical Correlation Analysis*
  • Connectome*
  • Humans
  • Least-Squares Analysis