Feature selection in the reconstruction of complex network representations of spectral data

PLoS One. 2013 Aug 26;8(8):e72045. doi: 10.1371/journal.pone.0072045. eCollection 2013.

Abstract

Complex networks have been used extensively over the last decade to characterize and analyze complex systems, and they have recently been proposed as a novel instrument for the analysis of spectra extracted from biological samples. Yet the high number of measurements composing each spectrum, and the consequent high computational cost, make a direct network analysis infeasible. Here we present a comparative analysis of three customary feature selection algorithms, including the binning of spectral data and the use of information theory metrics. The algorithms are compared by the score obtained in a classification task in which healthy subjects and people suffering from different types of cancer must be discriminated. Results indicate that a feature selection strategy based on Mutual Information outperforms the more classical data binning, while allowing the dimensionality of the data set to be reduced by two orders of magnitude.
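The two strategies named in the abstract, data binning and ranking by Mutual Information, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`bin_spectrum`, `discretize`, `mutual_information`, `select_by_mi`), the equal-width discretization, and the toy data are all assumptions made for illustration, and the abstract does not specify how the study estimated Mutual Information.

```python
import math
from collections import Counter

def bin_spectrum(spectrum, n_bins):
    """Reduce a spectrum by averaging consecutive intensities into n_bins bins."""
    if not 0 < n_bins <= len(spectrum):
        raise ValueError("n_bins must be between 1 and len(spectrum)")
    size = len(spectrum) / n_bins
    binned = []
    for b in range(n_bins):
        chunk = spectrum[int(b * size):int((b + 1) * size)]
        binned.append(sum(chunk) / len(chunk))
    return binned

def discretize(values, n_levels=2):
    """Map values to equal-width levels (an assumed, simple discretization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_levels
    return [min(int((v - lo) / width), n_levels - 1) for v in values]

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def select_by_mi(spectra, labels, k, n_levels=2):
    """Return the indices of the k features sharing most information with labels."""
    scores = []
    for j in range(len(spectra[0])):
        column = discretize([s[j] for s in spectra], n_levels)
        scores.append((mutual_information(column, labels), j))
    scores.sort(key=lambda t: (-t[0], t[1]))  # highest score first, ties by index
    return sorted(j for _, j in scores[:k])

# Toy data (hypothetical): four spectra with four measurements each,
# labelled 0 (healthy) or 1 (patient). Only the first measurement
# separates the two classes.
spectra = [
    [0.1, 5.0, 0.3, 1.0],
    [0.2, 5.0, 0.2, 3.0],
    [0.9, 5.0, 0.3, 1.1],
    [1.0, 5.0, 0.2, 2.9],
]
labels = [0, 0, 1, 1]
selected = select_by_mi(spectra, labels, k=1)
```

In this sketch binning keeps every region of the spectrum at a coarser resolution, whereas the Mutual Information ranking discards measurements that carry no information about the class label, which is one way a much larger reduction in dimensionality could be reached.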

MeSH terms

  • Algorithms
  • Humans
  • Mass Spectrometry / methods*
  • Neoplasms / diagnosis
  • Neoplasms / metabolism*
  • Neural Networks, Computer*
  • Pattern Recognition, Automated / methods
  • Proteome / analysis*
  • Proteome / classification
  • ROC Curve
  • Reproducibility of Results

Substances

  • Proteome

Grants and funding

No current external funding sources for this study.