A support vector machine-recursive feature elimination feature selection method based on artificial contrast variables and mutual information

Xiaohui Lin; Fufang Yang; Lina Zhou; Peiyuan Yin; Hongwei Kong; Wenbin Xing; Xin Lu; Lewen Jia; Quancai Wang; Guowang Xu

doi:10.1016/j.jchromb.2012.05.020

J Chromatogr B Analyt Technol Biomed Life Sci. 2012 Dec 1:910:149-55. doi: 10.1016/j.jchromb.2012.05.020. Epub 2012 May 24.

Authors

Xiaohui Lin¹, Fufang Yang, Lina Zhou, Peiyuan Yin, Hongwei Kong, Wenbin Xing, Xin Lu, Lewen Jia, Quancai Wang, Guowang Xu

Affiliation

¹ School of Computer Science and Technology, Dalian University of Technology, 116024 Dalian, China.

PMID: 22682888
DOI: 10.1016/j.jchromb.2012.05.020

Abstract

Selecting discriminative metabolites from high-dimensional metabolome data is an important task in metabolomics studies. Support vector machine-recursive feature elimination (SVM-RFE) is an efficient feature selection technique and has shown promise in the analysis of metabolome data. Because SVM-RFE weights features according to the support vectors, noise and non-informative variables in high-dimensional data may distort the hyperplane of the SVM learning model. We therefore propose a mutual information (MI)-SVM-RFE method, which first filters out noise and non-informative variables by means of artificial contrast variables and MI, and then applies SVM-RFE to select the most discriminative features. A serum metabolomics data set from patients with chronic hepatitis B, cirrhosis and hepatocellular carcinoma, analyzed by liquid chromatography-mass spectrometry (LC-MS), was used to validate the method. It distinguished among the three liver diseases with an accuracy of 74.33±2.98%, compared with 72.00±4.15% for the original SVM-RFE. Thirty-four ion features were selected to distinguish among the control group and the three liver diseases; 17 of them were identified.
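The two-stage idea in the abstract (an MI filter calibrated against artificial variables, followed by SVM-RFE) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the artificial variables are generated or how the MI cutoff is chosen, so the sketch assumes column-wise permutation of the real features and a percentile of the contrast MI scores as the threshold. The function names, the `percentile` and `n_select` parameters, and the synthetic data are all hypothetical; the calls to scikit-learn (`mutual_info_classif`, `RFE`, `SVC`) are standard.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, mutual_info_classif
from sklearn.svm import SVC


def mi_filter_with_contrasts(X, y, percentile=95, seed=0):
    """Keep features whose MI with y exceeds a contrast-derived cutoff.

    Assumption: contrast variables are the real columns, each shuffled
    independently, so they keep their marginal distribution but carry
    no information about y.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    contrasts = np.column_stack(
        [rng.permutation(X[:, j]) for j in range(n_features)]
    )
    # Estimate MI for real and contrast variables in a single call.
    mi = mutual_info_classif(np.hstack([X, contrasts]), y, random_state=seed)
    # Assumption: the cutoff is a percentile of the contrast MI scores.
    threshold = np.percentile(mi[n_features:], percentile)
    return np.flatnonzero(mi[:n_features] > threshold)


def mi_svm_rfe(X, y, n_select=10, percentile=95, seed=0):
    """Run the MI filter first, then linear SVM-RFE on the survivors."""
    kept = mi_filter_with_contrasts(X, y, percentile=percentile, seed=seed)
    if kept.size == 0:
        return kept  # nothing passed the filter
    rfe = RFE(
        SVC(kernel="linear", C=1.0),
        n_features_to_select=min(n_select, kept.size),
        step=1,  # eliminate one feature per iteration
    )
    rfe.fit(X[:, kept], y)
    # Map the RFE mask back to column indices of the original matrix.
    return kept[rfe.support_]


# Synthetic three-class data standing in for a metabolomics matrix.
X, y = make_classification(
    n_samples=200, n_features=40, n_informative=5, n_redundant=0,
    n_classes=3, shuffle=False, random_state=0,
)
selected = mi_svm_rfe(X, y, n_select=5)
print("selected feature indices:", selected)
```

On real LC-MS data the features would normally be scaled before fitting the SVM, and the whole selection procedure would be run inside cross-validation so that the reported accuracy is not biased by the selection step.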

Publication types

Research Support, Non-U.S. Gov't
Validation Study

MeSH terms

Chromatography, High Pressure Liquid
Data Mining
Humans
Ions / blood*
Liver Diseases / blood*
Mass Spectrometry
Metabolomics
Support Vector Machine*

Substances

Ions