A support vector machine-recursive feature elimination feature selection method based on artificial contrast variables and mutual information

J Chromatogr B Analyt Technol Biomed Life Sci. 2012 Dec 1:910:149-55. doi: 10.1016/j.jchromb.2012.05.020. Epub 2012 May 24.

Abstract

Filtering the discriminative metabolites from high-dimensional metabolome data is very important in metabolomics studies. Support vector machine-recursive feature elimination (SVM-RFE) is an efficient feature selection technique and has shown promising applications in the analysis of metabolome data. Because SVM-RFE measures the weights of the features according to the support vectors, noise and non-informative variables in the high-dimensional data may affect the hyperplane of the SVM learning model. Hence we proposed a mutual information (MI)-SVM-RFE method, which filters out noise and non-informative variables by means of artificial variables and MI, and then conducts SVM-RFE to select the most discriminative features. A serum metabolomics data set from patients with chronic hepatitis B, cirrhosis and hepatocellular carcinoma, analyzed by liquid chromatography-mass spectrometry (LC-MS), was used to validate our method. An accuracy of 74.33±2.98% in distinguishing among the three liver diseases was obtained, better than the 72.00±4.15% from the original SVM-RFE. Thirty-four ion features were defined to distinguish among the control and the 3 liver diseases; 17 of them were identified.
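
The filtering step described above can be illustrated with a minimal sketch. The abstract does not state how the artificial contrast variables are built or how the MI cutoff is chosen, so the following are assumptions made only for illustration: contrasts are random permutations of real features, features are discretized into three equal-frequency bins, and a feature is kept only if its MI with the class label exceeds the largest MI seen among the contrasts. The function names and the synthetic data are hypothetical, and the subsequent SVM-RFE step is not shown.

```python
import math
import random
from collections import Counter

def discretize(values, bins=3):
    """Equal-frequency binning: map each value to a bin index by its rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * bins // len(values)
    return labels

def mutual_information(x, y):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def mi_contrast_filter(columns, y, n_contrasts=100, seed=0):
    """Return indices of features whose MI with y beats every contrast.

    Assumption: each artificial contrast variable is a random permutation
    of a randomly chosen real feature, so it keeps that feature's
    distribution but has no real relation to the class label.
    """
    rng = random.Random(seed)
    binned = [discretize(col) for col in columns]
    scores = [mutual_information(b, y) for b in binned]
    contrast_scores = []
    for _ in range(n_contrasts):
        shuffled = list(rng.choice(binned))
        rng.shuffle(shuffled)
        contrast_scores.append(mutual_information(shuffled, y))
    threshold = max(contrast_scores)  # assumed cutoff rule
    kept = [j for j, s in enumerate(scores) if s > threshold]
    return kept, scores, threshold

# Synthetic data (not from the paper): 3 classes, 30 samples each.
data_rng = random.Random(42)
y = [c for c in range(3) for _ in range(30)]
informative = [10 * c + data_rng.random() for c in y]   # tracks the class
noise = [[data_rng.gauss(0, 1) for _ in y] for _ in range(5)]
columns = [informative] + noise

kept, scores, threshold = mi_contrast_filter(columns, y)
print("kept feature indices:", kept)
print("contrast threshold:", threshold)
```

In this sketch the surviving feature indices would be handed to an SVM-RFE routine; the permutation-based threshold is only one plausible way to separate informative variables from noise.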

Publication types

  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Chromatography, High Pressure Liquid
  • Data Mining
  • Humans
  • Ions / blood*
  • Liver Diseases / blood*
  • Mass Spectrometry
  • Metabolomics
  • Support Vector Machine*

Substances

  • Ions