Bi-level multi-source learning for heterogeneous block-wise missing data

Neuroimage. 2014 Nov 15;102 Pt 1:192-206. doi: 10.1016/j.neuroimage.2013.08.015. Epub 2013 Aug 27.

Abstract

Bio-imaging technologies allow scientists to collect large amounts of high-dimensional data from multiple heterogeneous sources for many biomedical applications. In the study of Alzheimer's Disease (AD), neuroimaging data, gene/protein expression data, etc., are often analyzed together to improve predictive power. Joint learning from multiple complementary data sources is advantageous, but feature pruning and data source selection are critical for learning interpretable models from high-dimensional data. The collected data also often have block-wise missing entries. In the Alzheimer's Disease Neuroimaging Initiative (ADNI), most subjects have MRI and genetic information, but only half have cerebrospinal fluid (CSF) measures, a different half have FDG-PET, and only some have proteomic data. Here we propose how to effectively integrate information from multiple heterogeneous data sources when data are missing block-wise. We present a unified "bi-level" learning model for complete multi-source data and extend it to incomplete data. Our major contributions are: (1) our proposed models unify feature-level and source-level analysis, including several existing feature learning approaches as special cases; (2) the model for incomplete data avoids imputing missing data and offers superior performance, and it generalizes to other applications with block-wise missing data sources; (3) we present efficient optimization algorithms for modeling complete and incomplete data. We comprehensively evaluate the proposed models on all ADNI subjects with at least one of four data types at baseline: MRI, FDG-PET, CSF and proteomics. Our proposed models compare favorably with existing approaches.
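The abstract does not give the model's objective, so the following is only a rough illustration of the two ideas it names, under stated assumptions. The first function groups subjects by which sources they actually have (their block-wise missing "profile"), which is the structure that lets a model avoid imputation. The second evaluates a standard sparse-group-lasso-style penalty, used here purely as a stand-in for "feature-level plus source-level" sparsity; it is not the model proposed in the paper. The source names, function names, and penalty weights are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical source list, following the four ADNI baseline data types named above.
SOURCES = ("MRI", "FDG-PET", "CSF", "proteomics")


def group_by_profile(availability):
    """Group subject indices by which sources they have (their missing-data profile).

    availability: list of boolean tuples, one per subject, aligned with SOURCES.
    Returns a dict mapping a tuple of available source names to subject indices.
    Subjects with no available source are skipped.
    """
    profiles = defaultdict(list)
    for i, row in enumerate(availability):
        present = tuple(name for name, has in zip(SOURCES, row) if has)
        if present:
            profiles[present].append(i)
    return dict(profiles)


def bilevel_penalty(weights, groups, lam_feature, lam_source):
    """Stand-in bi-level penalty (sparse-group-lasso form, NOT the paper's model).

    Feature level: l1 norm of all weights, encouraging individual features to drop out.
    Source level: sum of unsquared l2 norms of each source's weight block,
    encouraging whole sources to drop out.

    weights: flat list of feature weights.
    groups: dict mapping source name -> list of indices into weights.
    """
    l1 = sum(abs(w) for w in weights)
    group_l2 = sum(
        math.sqrt(sum(weights[j] ** 2 for j in idx)) for idx in groups.values()
    )
    return lam_feature * l1 + lam_source * group_l2


if __name__ == "__main__":
    availability = [
        (True, True, False, False),    # MRI + FDG-PET
        (True, False, True, False),    # MRI + CSF
        (True, True, False, False),    # MRI + FDG-PET
        (False, False, False, False),  # no data: skipped
    ]
    print(group_by_profile(availability))
    print(bilevel_penalty([3.0, 4.0, 0.0, 1.0], {"MRI": [0, 1], "CSF": [2, 3]}, 1.0, 2.0))
```

Setting `lam_source` to zero leaves a plain lasso penalty and setting `lam_feature` to zero leaves a group lasso penalty, which is one simple way to see how a single bi-level penalty can contain feature-only and source-only selection as special cases.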

Keywords: Alzheimer's disease; Block-wise missing data; Multi-modal fusion; Multi-source; Optimization.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Algorithms
  • Alzheimer Disease / cerebrospinal fluid
  • Alzheimer Disease / diagnosis*
  • Data Mining*
  • Humans
  • Magnetic Resonance Imaging
  • Neuroimaging / statistics & numerical data*
  • Positron-Emission Tomography
  • Proteomics