Efficient feature selection and classification for microarray data

PLoS One. 2018 Aug 20;13(8):e0202167. doi: 10.1371/journal.pone.0202167. eCollection 2018.

Abstract

Feature selection and classification are the main topics in microarray data analysis. Although many feature selection methods have been proposed and developed in this field, SVM-RFE (Support Vector Machine based on Recursive Feature Elimination) has proved to be one of the best. It ranks the features (genes) by training a support vector machine classification model and selects key genes in combination with a recursive feature elimination strategy. The principal drawback of SVM-RFE is its high time consumption. To overcome this limitation, we introduce a more efficient implementation of linear support vector machines, improve the recursive feature elimination strategy, and combine the two to select informative genes. In addition, we propose a simple resampling method to preprocess the datasets, which balances the information distribution across the different classes of samples and makes the classification results more credible. Moreover, we study the applicability of four common classifiers. Extensive experiments on six of the most frequently used microarray datasets in this field show that the proposed methods not only greatly reduce the time consumption but also achieve comparable classification performance.
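The abstract describes SVM-RFE only at a high level. Below is a minimal Python/NumPy sketch of the classic SVM-RFE loop, assuming the standard ranking criterion w_j^2 (the squared weight of gene j in a trained linear SVM). It is not the authors' implementation. The abstract does not name the efficient linear SVM solver used in the paper, so a Pegasos-style stochastic subgradient solver is swapped in here. The `step` parameter (removing several features per round) is a generic speed-up, not the paper's improved elimination strategy. The function names `train_linear_svm` and `svm_rfe` and all default parameter values are hypothetical.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Train a primal linear SVM with Pegasos-style stochastic subgradient descent.

    Swapped-in solver (assumption): the paper's efficient solver is not named.
    X: (n_samples, n_features) array; y: labels in {-1, +1}.
    Minimises lam/2 * ||w||^2 + mean(hinge loss); returns the weight vector w.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)              # Pegasos step size
            violates = y[i] * (X[i] @ w) < 1   # is the hinge loss active for this sample?
            w *= 1.0 - eta * lam               # shrink: gradient of the regulariser
            if violates:
                w += eta * y[i] * X[i]         # hinge-loss subgradient step
    return w


def svm_rfe(X, y, n_select, step=1):
    """Recursive feature elimination driven by linear-SVM weights (hypothetical helper).

    Repeatedly trains a linear SVM on the surviving features and drops the
    `step` features with the smallest ranking score w_j**2 until `n_select`
    remain. Returns the surviving column indices of X in ascending order.
    """
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_select:
        # Append a constant column so the SVM can learn an (approximate) bias term.
        Xa = np.hstack([X[:, remaining], np.ones((X.shape[0], 1))])
        w = train_linear_svm(Xa, y)[:-1]       # drop the bias weight before ranking
        scores = w ** 2                        # SVM-RFE ranking criterion
        k = min(step, len(remaining) - n_select)
        drop = set(np.argsort(scores)[:k].tolist())
        remaining = [f for pos, f in enumerate(remaining) if pos not in drop]
    return sorted(remaining)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 8))           # 8 synthetic "genes"
    y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)  # only genes 0 and 1 are informative
    print(svm_rfe(X, y, n_select=2))
```

Because every round retrains the SVM from scratch, the total cost grows with the number of rounds. Raising `step` reduces the number of retrainings, which targets the time consumption the abstract identifies as SVM-RFE's main drawback, at the price of a coarser ranking.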

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Data Analysis*
  • Databases, Genetic
  • Gene Expression Profiling / methods
  • Humans
  • Microarray Analysis* / methods
  • Support Vector Machine

Grants and funding

Funded by the National Natural Science Foundation of China (No. 61271383; URL: http://www.nsfc.gov.cn/) and the Postgraduate Scientific Research Innovation Ability Training Plan Funding Projects of Huaqiao University (No. 1511314017; URL: http://grs.hqu.edu.cn/).