Symbolic data analysis to defy low signal-to-noise ratio in microarray data for breast cancer prognosis

J Comput Biol. 2013 Aug;20(8):610-20. doi: 10.1089/cmb.2012.0249.

Abstract

Microarray profiling has raised hopes of gaining new insights into breast cancer biology and thereby improving the performance of current prognostic tools. However, the resulting data pose serious challenges to classical data analysis techniques, chiefly high dimensionality and a low signal-to-noise ratio. Although a great deal of research has addressed the first challenge within the feature selection framework, very little attention has been paid to the second. In this article, we propose to address both issues simultaneously using symbolic data analysis in order to derive more accurate genetic marker-based prognostic models. In particular, an interval data representation is employed to model various uncertainties in microarray measurements. A recent feature selection algorithm that handles symbolic interval data is then used to derive a genetic signature. The predictive value of the derived signature is assessed using a rigorous experimental setup and compared with existing prognostic approaches in terms of predictive performance and estimated survival probability. The derived signature (GenSym) is shown to perform significantly better than other prognostic models, including the 70-gene signature, St. Gallen, and National Institutes of Health criteria.
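
As an illustration of the interval representation mentioned in the abstract, the following minimal sketch turns replicate measurements of a gene into a symbolic interval. The abstract does not specify how the intervals are constructed, so the rule used here (mean plus or minus one sample standard deviation) is an assumption, and the function name `to_interval`, the `width` parameter, and the gene labels and values are hypothetical.

```python
from statistics import mean, stdev


def to_interval(replicates, width=1.0):
    """Summarise replicate measurements of one gene as a (lower, upper) interval.

    Assumption: the interval is mean +/- width * sample standard deviation.
    This is an illustrative choice, not the construction used in the article.
    """
    centre = mean(replicates)
    # A single measurement carries no spread estimate, so fall back to a point.
    spread = stdev(replicates) if len(replicates) > 1 else 0.0
    return (centre - width * spread, centre + width * spread)


# Hypothetical log-ratio replicates for two genes in one sample.
sample = {
    "gene_a": [0.9, 1.1, 1.0],
    "gene_b": [-0.2, 0.4, 0.1],
}

# Each gene is now described by an interval instead of a single number.
intervals = {gene: to_interval(values) for gene, values in sample.items()}
```

In this sketch a noisier gene receives a wider interval, which is the kind of measurement uncertainty that a feature selection method for interval data could then take into account.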

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Biomarkers, Tumor / analysis*
  • Breast Neoplasms / genetics*
  • Breast Neoplasms / mortality
  • Breast Neoplasms / secondary
  • Computational Biology
  • Data Mining
  • Databases, Genetic
  • Female
  • Gene Expression Profiling*
  • Humans
  • Middle Aged
  • Models, Statistical
  • Oligonucleotide Array Sequence Analysis*
  • Pattern Recognition, Automated
  • Prognosis
  • Signal-To-Noise Ratio*
  • Survival Rate

Substances

  • Biomarkers, Tumor