Analyzing kernel matrices for the identification of differentially expressed genes

PLoS One. 2013 Dec 9;8(12):e81683. doi: 10.1371/journal.pone.0081683. eCollection 2013.

Abstract

One of the most important applications of microarray data is the class prediction of biological samples. For this purpose, statistical tests have often been applied to identify differentially expressed genes (DEGs), followed by state-of-the-art learning machines, the Support Vector Machine (SVM) in particular. The SVM is a typical sample-based classifier whose performance depends on how discriminant the samples are. However, DEGs identified by statistical tests are not guaranteed to yield a training dataset composed of discriminant samples. To tackle this problem, a novel gene ranking method, Kernel Matrix Gene Selection (KMGS), is proposed. The rationale of the method, which is rooted in the fundamental ideas of the SVM algorithm, is described. The notion of "the separability of a sample", which is estimated by performing [Formula: see text]-like statistics on each column of the kernel matrix, is first introduced. The separability of a classification problem is then measured, from which the significance of a specific gene is deduced. Also described is Kernel Matrix Sequential Forward Selection (KMSFS), a method that shares the essential ideas of KMGS but proceeds in a greedy manner. On three public microarray datasets, the proposed algorithms achieved competitive performance in terms of the B.632+ error rate.
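The ranking and greedy selection ideas described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the kernel, the exact column statistic, or how per-sample values are aggregated, so the RBF kernel, the Welch-style two-sample statistic, the mean aggregation, and all function names below are assumptions made for illustration only.

```python
import numpy as np

def rbf_kernel(Z, gamma=1.0):
    """RBF kernel matrix for samples in the rows of Z (assumed kernel choice)."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def column_separability(K, y, eps=1e-12):
    """Assumed two-sample statistic per kernel column.

    For column j, the kernel values against class-1 samples are compared
    with those against class-0 samples; a large value means sample j is
    much more similar to one class than to the other.
    """
    pos, neg = K[y == 1], K[y == 0]
    gap = np.abs(pos.mean(axis=0) - neg.mean(axis=0))
    spread = np.sqrt(pos.var(axis=0, ddof=1) / len(pos)
                     + neg.var(axis=0, ddof=1) / len(neg))
    return gap / (spread + eps)

def problem_separability(Z, y, gamma=1.0):
    """Aggregate sample separabilities into one score (mean is an assumption)."""
    return column_separability(rbf_kernel(Z, gamma), y).mean()

def rank_genes(X, y, gamma=1.0):
    """KMGS-style ranking sketch: score each gene from its own kernel matrix."""
    scores = np.array([problem_separability(X[:, [g]], y, gamma)
                       for g in range(X.shape[1])])
    return np.argsort(-scores), scores

def forward_select(X, y, n_genes, gamma=1.0):
    """KMSFS-style greedy sketch: add the gene that most improves the score."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < n_genes:
        best = max(remaining,
                   key=lambda g: problem_separability(X[:, selected + [g]], y, gamma))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Here `X` is a samples-by-genes expression matrix and `y` a 0/1 label vector with at least two samples per class. `rank_genes(X, y)` returns gene indices ordered from most to least separable under these assumptions, and `forward_select(X, y, n_genes)` builds a subset one gene at a time.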

MeSH terms

  • Algorithms*
  • Colonic Neoplasms / genetics*
  • Databases, Genetic
  • Gene Expression Profiling
  • Gene Expression Regulation, Neoplastic*
  • Humans
  • Leukemia / genetics*
  • Male
  • Normal Distribution
  • Oligonucleotide Array Sequence Analysis
  • Prostatic Neoplasms / genetics*
  • Support Vector Machine*

Grants and funding

This work was supported by grants from the Natural Science Foundation, Zhejiang Province, P.R. China (Project No. LQ13F030011) and the National Science Foundation of P.R. China (Project No. 61133010). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.