On the stability of canonical correlation analysis and partial least squares with application to brain-behavior associations

Markus Helmer; Shaun Warrington; Ali-Reza Mohammadi-Nejad; Jie Lisa Ji; Amber Howell; Benjamin Rosand; Alan Anticevic; Stamatios N Sotiropoulos; John D Murray

doi:10.1038/s42003-024-05869-4

Commun Biol. 2024 Feb 21;7(1):217. doi: 10.1038/s42003-024-05869-4.

Authors

Markus Helmer¹ ², Shaun Warrington³, Ali-Reza Mohammadi-Nejad³ ⁴, Jie Lisa Ji¹ ² ⁵, Amber Howell¹ ⁵, Benjamin Rosand⁶, Alan Anticevic¹ ² ⁵ ⁷, Stamatios N Sotiropoulos⁸ ⁹, John D Murray¹⁰ ¹¹ ¹² ¹³

Affiliations

¹ Department of Psychiatry, Yale School of Medicine, New Haven, CT, 06511, USA.
² Manifest Technologies, New Haven, CT, 06510, USA.
³ Sir Peter Mansfield Imaging Centre, Mental Health and Clinical Neurosciences, School of Medicine, University of Nottingham, Nottingham, NG7 2UH, United Kingdom.
⁴ National Institute for Health Research (NIHR) Nottingham Biomedical Research Ctr, Queens Medical Ctr, Nottingham, United Kingdom.
⁵ Interdepartmental Neuroscience Program, Yale University School of Medicine, New Haven, CT, 06511, USA.
⁶ Department of Physics, Yale University, New Haven, CT, 06511, USA.
⁷ Department of Psychology, Yale University, New Haven, CT, 06511, USA.
⁸ Sir Peter Mansfield Imaging Centre, Mental Health and Clinical Neurosciences, School of Medicine, University of Nottingham, Nottingham, NG7 2UH, United Kingdom. stamatios.sotiropoulos@nottingham.ac.uk.
⁹ National Institute for Health Research (NIHR) Nottingham Biomedical Research Ctr, Queens Medical Ctr, Nottingham, United Kingdom. stamatios.sotiropoulos@nottingham.ac.uk.
¹⁰ Department of Psychiatry, Yale School of Medicine, New Haven, CT, 06511, USA. john.d.murray@dartmouth.edu.
¹¹ Manifest Technologies, New Haven, CT, 06510, USA. john.d.murray@dartmouth.edu.
¹² Department of Physics, Yale University, New Haven, CT, 06511, USA. john.d.murray@dartmouth.edu.
¹³ Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, 03755, USA. john.d.murray@dartmouth.edu.

PMID: 38383808
DOI: 10.1038/s42003-024-05869-4

Abstract

Associations between datasets can be discovered through multivariate methods like Canonical Correlation Analysis (CCA) or Partial Least Squares (PLS). A requisite property for interpretability and generalizability of CCA/PLS associations is stability of their feature patterns. However, stability of CCA/PLS in high-dimensional datasets is questionable, as found in empirical characterizations. To study these issues systematically, we developed a generative modeling framework to simulate synthetic datasets. We found that when sample size is relatively small, but comparable to typical studies, CCA/PLS associations are highly unstable and inaccurate, both in their magnitude and, importantly, in the feature pattern underlying the association. We confirmed these trends across two neuroimaging modalities and in independent datasets with n ≈ 1000 and n = 20,000, and found that only the latter comprised sufficient observations for stable mappings between imaging-derived and behavioral features. We further developed a power calculator to provide sample sizes required for stability and reliability of multivariate analyses. Collectively, we characterize how to limit detrimental effects of overfitting on CCA/PLS stability, and provide recommendations for future studies.
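
The overfitting effect described above can be illustrated with a minimal sketch. This is not the authors' generative modeling framework or power calculator. It assumes purely null data (two independent Gaussian datasets with 10 features each, an arbitrary choice) and computes the largest in-sample canonical correlation from the singular values of the product of orthonormal bases of the two centered data matrices. The function names and parameters are hypothetical and chosen for illustration only.

```python
import numpy as np


def first_canonical_correlation(X, Y):
    """Largest in-sample canonical correlation between data matrices X and Y.

    Assumes more samples (rows) than features (columns) and full column rank.
    """
    # Center each feature so correlations are computed about the mean.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # Orthonormal bases for the two column spaces. The singular values of
    # Qx^T Qy are the cosines of the principal angles between the spaces,
    # which equal the canonical correlations.
    Qx, _ = np.linalg.qr(X)
    Qy, _ = np.linalg.qr(Y)
    singular_values = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    # Guard against tiny floating-point overshoot above 1.
    return float(min(singular_values[0], 1.0))


def null_canonical_correlation(n, px=10, py=10, seed=0):
    """Illustrative assumption: X and Y are independent, so the true
    canonical correlation is zero and any in-sample value is overfitting."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, px))
    Y = rng.standard_normal((n, py))
    return first_canonical_correlation(X, Y)


small = null_canonical_correlation(n=50)
large = null_canonical_correlation(n=5000)
print(f"in-sample canonical correlation, n=50:   {small:.2f}")
print(f"in-sample canonical correlation, n=5000: {large:.2f}")
```

Because the two simulated datasets are independent, any nonzero value printed here reflects overfitting alone. The estimate at the smaller sample size is expected to be much larger than at the larger one, which is the qualitative pattern the abstract reports for association magnitude.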

MeSH terms

Algorithms*
Brain / diagnostic imaging
Canonical Correlation Analysis*
Least-Squares Analysis
Reproducibility of Results