A fast gene selection method for multi-cancer classification using multiple support vector data description

J Biomed Inform. 2015 Feb;53:381-9. doi: 10.1016/j.jbi.2014.12.009. Epub 2014 Dec 27.

Abstract

For cancer classification problems based on gene expression, the data usually contain only a few dozen samples but thousands to tens of thousands of genes, a large number of which may be irrelevant. A robust feature selection algorithm is therefore required to remove irrelevant genes and choose the informative ones. Support vector data description (SVDD) has been applied to gene selection for many years. However, SVDD cannot directly handle multi-class problems because it models only the target class. In addition, applying SVDD to gene selection is time-consuming. This paper proposes a novel fast feature selection method based on multiple SVDD and applies it to multi-class microarray data. A recursive feature elimination (RFE) scheme is introduced to iteratively remove irrelevant features, so the proposed method is called multiple SVDD-RFE (MSVDD-RFE). To make full use of all classes for a given task, MSVDD-RFE independently selects a relevant gene subset for each class, and the final selected gene subset is the union of these per-class subsets. The effectiveness and accuracy of MSVDD-RFE are validated by experiments on five publicly available microarray datasets. On these datasets, the proposed method is faster and more effective than the other methods compared.
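The abstract specifies the structure of MSVDD-RFE (an RFE loop run independently for each class, then a union of the per-class gene subsets) but not the SVDD kernel, solver, or ranking criterion. The Python sketch below illustrates that structure under loudly stated assumptions, and it is not the authors' implementation. Each class is described by a hard-margin, linear-kernel SVDD (equivalently, a minimum enclosing ball), approximated with Badoiu-Clarkson iterations instead of a quadratic-programming solver. Features are ranked by an SVM-RFE-style criterion that is assumed here: with the dual weights alpha held fixed and center c = sum_i alpha_i x_i, the dual objective sum_i alpha_i ||x_i - c||^2 splits into one term per feature, and each round removes the feature with the smallest term. All function names and the toy data are hypothetical.

```python
from typing import List, Sequence, Set

Matrix = List[List[float]]


def linear_svdd_weights(X: Matrix, iters: int = 200) -> List[float]:
    """Approximate a hard-margin, linear-kernel SVDD (minimum enclosing ball)
    with Badoiu-Clarkson iterations. Returns dual weights alpha summing to 1.
    Assumption: stands in for the paper's SVDD solver, which is not specified."""
    n = len(X)
    alpha = [0.0] * n
    alpha[0] = 1.0
    center = list(X[0])
    for t in range(1, iters + 1):
        # Find the sample farthest from the current center.
        far = max(range(n),
                  key=lambda i: sum((x - c) ** 2 for x, c in zip(X[i], center)))
        step = 1.0 / (t + 1)
        # Move the center toward that sample; alpha tracks the convex combination.
        alpha = [a * (1.0 - step) for a in alpha]
        alpha[far] += step
        center = [c + step * (x - c) for x, c in zip(X[far], center)]
    return alpha


def feature_contributions(X: Matrix, alpha: Sequence[float]) -> List[float]:
    """Split the dual objective sum_i alpha_i * ||x_i - c||^2 into per-feature
    terms, with c = sum_i alpha_i * x_i and alpha held fixed."""
    d = len(X[0])
    center = [sum(a * row[k] for a, row in zip(alpha, X)) for k in range(d)]
    return [sum(a * (row[k] - center[k]) ** 2 for a, row in zip(alpha, X))
            for k in range(d)]


def svdd_rfe(X: Matrix, keep: int) -> List[int]:
    """RFE on one class: refit the SVDD on the remaining features, then drop the
    feature with the smallest contribution (assumed SVM-RFE-style criterion)."""
    remaining = list(range(len(X[0])))
    while len(remaining) > keep:
        sub = [[row[k] for k in remaining] for row in X]
        scores = feature_contributions(sub, linear_svdd_weights(sub))
        worst = min(range(len(remaining)), key=lambda j: scores[j])
        del remaining[worst]
    return remaining


def msvdd_rfe(X: Matrix, y: Sequence[str], keep_per_class: int) -> Set[int]:
    """Select genes independently for each class and return their union."""
    selected: Set[int] = set()
    for label in sorted(set(y)):
        target = [row for row, lab in zip(X, y) if lab == label]
        selected.update(svdd_rfe(target, keep_per_class))
    return selected


if __name__ == "__main__":
    # Hypothetical toy data: 5 samples x 3 genes, two classes.
    X = [[0.0, 1.0, 5.0], [10.0, 1.0, 5.0], [4.0, 1.0, 5.0],   # class A
         [3.0, 2.0, 0.0], [3.0, 2.0, 8.0]]                      # class B
    y = ["A", "A", "A", "B", "B"]
    print(sorted(msvdd_rfe(X, y, keep_per_class=1)))  # [0, 2]
```

In the toy data, gene 0 varies only within class A and gene 2 only within class B, so each per-class run keeps a different gene and the union contains both. Removing one feature per round keeps the loop simple; with thousands of genes, RFE implementations commonly remove a batch of the lowest-ranked features per round to reduce the number of refits.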

Keywords: Gene expression data; Gene selection; Multi-class classification; Support vector data description; Support vector machine.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Artificial Intelligence
  • Bayes Theorem
  • Colonic Neoplasms / diagnosis
  • Colonic Neoplasms / genetics
  • Diagnosis, Computer-Assisted / methods
  • Gene Expression
  • Gene Expression Profiling
  • Gene Expression Regulation, Leukemic
  • Gene Expression Regulation, Neoplastic
  • Humans
  • Leukemia / diagnosis
  • Leukemia / genetics
  • Models, Statistical
  • Neoplasms / diagnosis*
  • Neoplasms / genetics
  • Oligonucleotide Array Sequence Analysis
  • Pattern Recognition, Automated / methods*
  • Software
  • Support Vector Machine*