Classification of MALDI-MS imaging data of tissue microarrays using canonical correlation analysis-based variable selection

Proteomics. 2016 Jun;16(11-12):1731-5. doi: 10.1002/pmic.201500451. Epub 2016 May 9.

Abstract

Applying MALDI-MS imaging to tissue microarrays (TMAs) provides access to proteomics data from large cohorts of patients in a cost- and time-efficient way, and opens the potential for applying this technology in clinical diagnosis. The complexity of these TMA data (high dimension, low sample size) poses challenges for statistical analysis, because classical methods typically require a nonsingular covariance matrix, a requirement that cannot be satisfied when the dimension exceeds the sample size. We use TMAs to collect data from endometrial primary carcinomas from 43 patients. Each patient has a lymph node metastasis (LNM) status of positive or negative, which we predict on the basis of the MALDI-MS imaging TMA data. We propose a variable selection approach based on canonical correlation analysis that explicitly uses the LNM information. We then apply LDA to the selected variables only. Our method misclassifies 2.3-20.9% of patients under leave-one-out cross-validation and strongly outperforms LDA applied after reducing the original data with principal component analysis.
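
The pipeline described in the abstract (rank variables against the LNM label, keep the top few, fit LDA, score by leave-one-out cross-validation) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the simplest reading of a CCA-based ranking: for a single variable and a single binary label, the canonical correlation equals the absolute Pearson correlation, so variables are ranked by that value. The function names, the number of retained variables `k`, the choice to re-rank inside each fold, and the synthetic data are all hypothetical and chosen for illustration only.

```python
import numpy as np

def rank_by_correlation(X, y):
    """Return column indices of X sorted by decreasing |Pearson correlation| with y.

    Assumption: per-variable correlation stands in for the CCA-based ranking.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # constant columns get correlation 0
    corr = np.abs(Xc.T @ yc) / denom
    return np.argsort(-corr)

def lda_fit(X, y):
    """Two-class LDA with a pooled covariance; returns direction w and offset b."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    R = np.vstack([X0 - mu0, X1 - mu1])
    S = R.T @ R / (len(y) - 2)            # pooled within-class covariance
    w = np.linalg.pinv(S) @ (mu1 - mu0)   # pinv guards against a singular S
    b = -0.5 * w @ (mu0 + mu1) + np.log(len(X1) / len(X0))
    return w, b

def loocv_error(X, y, k):
    """Leave-one-out misclassification rate; ranking is redone in every fold."""
    n = len(y)
    errors = 0
    for i in range(n):
        train = np.arange(n) != i
        top = rank_by_correlation(X[train], y[train])[:k]
        w, b = lda_fit(X[train][:, top], y[train])
        pred = int(X[i, top] @ w + b > 0)
        errors += int(pred != y[i])
    return errors / n

# Synthetic stand-in for the TMA data: 43 samples, 200 variables,
# of which only the first 5 differ between the two classes.
rng = np.random.default_rng(0)
y = np.array([0] * 22 + [1] * 21)
X = rng.normal(size=(43, 200))
X[y == 1, :5] += 4.0

err = loocv_error(X, y, k=5)
print(f"LOOCV misclassification rate: {err:.3f}")
```

Selecting variables first keeps the covariance matrix passed to LDA small (here 5 by 5), which avoids the singularity that arises when the dimension exceeds the sample size. Redoing the ranking inside each fold keeps the held-out patient from influencing which variables are selected; whether the original study ranked inside or outside the cross-validation loop is not stated in the abstract.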

Keywords: Bioinformatics; Canonical correlation analysis; Classification; Endometrial cancer; MALDI-MS imaging; Variable ranking.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Endometrial Neoplasms / diagnosis
  • Endometrial Neoplasms / diagnostic imaging*
  • Endometrial Neoplasms / pathology
  • Female
  • Humans
  • Lymphatic Metastasis
  • Neoplasm Staging
  • Principal Component Analysis
  • Proteomics / methods*
  • Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization / methods*
  • Tissue Array Analysis / methods*