Penalized decomposition using residuals (PeDecURe) for feature extraction in the presence of nuisance variables

Biostatistics. 2023 Jul 14;24(3):653-668. doi: 10.1093/biostatistics/kxac031.

Abstract

Neuroimaging data are an increasingly important part of etiological studies of neurological and psychiatric disorders. However, mitigating the influence of nuisance variables, including confounders, remains a challenge in image analysis. In studies of Alzheimer's disease, for example, an imbalance in disease rates by age and sex may make it difficult to distinguish between structural patterns in the brain (as measured by neuroimaging scans) attributable to disease progression and those characteristic of typical human aging or sex differences. Concerningly, when not properly accounted for, nuisance variables pose threats to the generalizability and interpretability of findings from these studies. Motivated by this critical issue, in this work, we examine the impact of nuisance variables on feature extraction methods and propose Penalized Decomposition Using Residuals (PeDecURe), a new method for obtaining nuisance variable-adjusted features. PeDecURe estimates primary directions of variation which maximize covariance between partially residualized imaging features and a variable of interest (e.g., Alzheimer's diagnosis) while simultaneously mitigating the influence of nuisance variation through a penalty on the covariance between partially residualized imaging features and those variables. Using features derived using PeDecURe's first direction of variation, we train a highly accurate and generalizable predictive model, as evidenced by its robustness in testing samples with different underlying nuisance variable distributions. We compare PeDecURe to commonly used decomposition methods (principal component analysis (PCA) and partial least squares) as well as a confounder-adjusted variation of PCA. We find that features derived from PeDecURe offer greater accuracy and generalizability and lower correlations with nuisance variables compared with the other methods. While PeDecURe is primarily motivated by challenges that arise in the analysis of neuroimaging data, it is broadly applicable to data sets with highly correlated features, where novel methods to handle nuisance variables are warranted.

Keywords: Feature extraction; Image analysis; Multivariate data.

MeSH terms

  • Alzheimer Disease* / diagnostic imaging
  • Brain* / diagnostic imaging
  • Disease Progression
  • Female
  • Humans
  • Image Processing, Computer-Assisted
  • Least-Squares Analysis
  • Magnetic Resonance Imaging
  • Male
  • Neuroimaging