ESVM: evolutionary support vector machine for automatic feature selection and classification of microarray data

Biosystems. 2007 Sep-Oct;90(2):516-28. doi: 10.1016/j.biosystems.2006.12.003. Epub 2006 Dec 16.

Abstract

The optimal design of support vector machine (SVM)-based classifiers for prediction aims to optimize the combination of feature selection, SVM parameter setting, and cross-validation method. However, SVMs provide no built-in mechanism for automatically detecting relevant features, and the setting of their control parameters is often treated as a separate, independent problem. This paper proposes an evolutionary approach to designing an SVM-based classifier (named ESVM) that simultaneously optimizes automatic feature selection and parameter tuning using an intelligent genetic algorithm, with k-fold cross-validation serving as the estimator of generalization ability. To illustrate and evaluate the efficiency of ESVM, it is applied to a typical problem, microarray classification, using 11 multi-class datasets. To account for model uncertainty, a frequency-based technique that votes over multiple sets of potentially informative features is used to identify the most effective subset of genes. Averaged over the 11 datasets, ESVM achieves a high accuracy of 96.88% with a small number of selected genes (10.0) using 10-fold cross-validation. The merits of ESVM are three-fold: (1) embedding automatic feature selection and parameter setting into ESVM improves prediction ability compared with traditional SVMs; (2) ESVM can serve not only as an accurate classifier but also as an adaptive feature extractor; (3) ESVM is developed as an efficient tool so that various SVMs can conveniently be used as its core for bioinformatics problems.
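The two ideas in the abstract that are easiest to picture in code are the joint encoding of features and SVM parameters in one chromosome, and the frequency-based vote over several candidate gene sets. The following is a minimal sketch under stated assumptions, not the authors' implementation: the chromosome layout, the 4-bit parameter fields, the power-of-two mapping to C and gamma, and the function names `decode_chromosome` and `vote_features` are all hypothetical choices made for illustration. The fitness evaluation (k-fold cross-validated SVM accuracy) and the intelligent genetic algorithm itself are omitted.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def decode_chromosome(
    bits: Sequence[int], n_features: int, param_bits: int = 4
) -> Tuple[List[int], float, float]:
    """Split one binary chromosome into selected features and two SVM parameters.

    Assumed layout (hypothetical): [feature mask | C bits | gamma bits].
    """
    if len(bits) != n_features + 2 * param_bits:
        raise ValueError("chromosome length does not match the assumed layout")

    # A 1 in position i of the mask means feature (gene) i is selected.
    selected = [i for i in range(n_features) if bits[i] == 1]

    def to_int(chunk: Sequence[int]) -> int:
        # Interpret a run of bits as an unsigned integer, most significant first.
        value = 0
        for b in chunk:
            value = value * 2 + b
        return value

    c_code = to_int(bits[n_features:n_features + param_bits])
    g_code = to_int(bits[n_features + param_bits:])

    # Assumed mapping onto a power-of-two grid; the offsets are illustrative.
    c = 2.0 ** (c_code - 5)
    gamma = 2.0 ** (g_code - 15)
    return selected, c, gamma


def vote_features(runs: Sequence[Sequence[int]], top_k: int) -> List[int]:
    """Rank features by how many candidate sets contain them; keep the top_k."""
    # set(run) ensures each candidate set casts at most one vote per feature.
    counts = Counter(f for run in runs for f in set(run))
    # Sort by descending vote count, breaking ties by feature index.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [feature for feature, _ in ranked[:top_k]]


if __name__ == "__main__":
    chromosome = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1]
    print(decode_chromosome(chromosome, n_features=4))
    print(vote_features([[3, 7, 9], [7, 9, 12], [9, 3, 20]], top_k=2))
```

In a full pipeline, each decoded chromosome would be scored by training an SVM on the selected genes with the decoded parameters and measuring k-fold cross-validation accuracy. The vote would then be taken over the feature sets returned by several independent runs.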

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Automation
  • Biological Evolution*
  • Chromosomes / ultrastructure
  • Computational Biology / methods*
  • Computers
  • Evolution, Molecular*
  • Gene Expression Regulation*
  • Models, Genetic
  • Models, Statistical
  • Models, Theoretical
  • Oligonucleotide Array Sequence Analysis
  • Software
  • Systems Biology