Genetic algorithm-based efficient feature selection for classification of pre-miRNAs

Genet Mol Res. 2011 Apr 12;10(2):588-603. doi: 10.4238/vol10-2gmr969.

Abstract

To classify real and pseudo human precursor microRNA (pre-miRNA) hairpins with ab initio methods, numerous features are extracted from the primary sequence and secondary structure of pre-miRNAs. However, these include some redundant and uninformative features, so it is essential to select the most representative feature subset, which helps improve classification accuracy. We propose a novel feature selection method based on a genetic algorithm, designed according to the characteristics of human pre-miRNAs. It considers the information gain of each feature, the conservation of each feature relative to the stem parts of pre-miRNAs, and the redundancy among features. Feature conservation was introduced here for the first time. Experimental results were validated by cross-validation on datasets of human real and pseudo pre-miRNAs. Compared with microPred, our classifier, miPredGA, achieved more reliable sensitivity and specificity, and accuracy improved by nearly 12%. The feature selection algorithm is useful for constructing more efficient classifiers that distinguish real human pre-miRNAs from pseudo hairpins.
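The abstract names three selection criteria (information gain, feature conservation, and inter-feature redundancy) but does not give the fitness function, weights, or genetic operators. The following is a minimal Python sketch of GA-based feature selection under stated assumptions, not the miPredGA implementation. It assumes features are already discretized. It uses an mRMR-style fitness, mean relevance minus mean pairwise redundancy, both measured as mutual information. It also adds a per-feature conservation score in [0, 1] supplied by the caller, a hypothetical stand-in for the paper's stem-based conservation measure. All function names, the weights `w_ig`, `w_cons`, `w_red`, and the GA parameters are illustrative assumptions.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def entropy(values: Sequence[int]) -> float:
    """Shannon entropy (bits) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """I(X;Y) = H(Y) - H(Y|X); with y = class labels this is the information gain of x."""
    n = len(y)
    cond = 0.0
    for v, count in Counter(x).items():
        subset = [yi for xi, yi in zip(x, y) if xi == v]
        cond += (count / n) * entropy(subset)
    return entropy(y) - cond


def fitness(mask: Sequence[int], columns: List[List[int]], labels: List[int],
            conservation: List[float], w_ig: float = 1.0, w_cons: float = 0.5,
            w_red: float = 1.0) -> float:
    """Hypothetical fitness: mean info gain + mean conservation - mean pairwise redundancy."""
    chosen = [i for i, bit in enumerate(mask) if bit]
    if not chosen:
        return float("-inf")  # an empty subset is never acceptable
    ig = sum(mutual_information(columns[i], labels) for i in chosen) / len(chosen)
    cons = sum(conservation[i] for i in chosen) / len(chosen)
    pairs = [(a, b) for k, a in enumerate(chosen) for b in chosen[k + 1:]]
    red = (sum(mutual_information(columns[a], columns[b]) for a, b in pairs) / len(pairs)
           if pairs else 0.0)
    return w_ig * ig + w_cons * cons - w_red * red


def ga_select(columns: List[List[int]], labels: List[int], conservation: List[float],
              pop_size: int = 20, generations: int = 30, p_mut: float = 0.05,
              seed: int = 0) -> Tuple[List[int], float]:
    """Binary-mask GA: tournament selection, one-point crossover, bit-flip mutation, elitism."""
    rng = random.Random(seed)
    n_feat = len(columns)
    cache: Dict[Tuple[int, ...], float] = {}

    def score(mask: List[int]) -> float:
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(mask, columns, labels, conservation)
        return cache[key]

    def tournament(pop: List[List[int]]) -> List[int]:
        a, b = rng.sample(pop, 2)
        return a if score(a) >= score(b) else b

    pop = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the current best forward unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randint(1, n_feat - 1) if n_feat > 1 else 0
            child = p1[:cut] + p2[cut:]
            child = [1 - bit if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
        best = max(pop, key=score)
    return best, score(best)
```

Averaging relevance and redundancy, rather than summing them, keeps the fitness from rewarding large subsets that merely add weakly informative or duplicated features. For example, with two identical label-predicting features and one uninformative feature, this fitness prefers selecting just one of the duplicates.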

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Base Sequence
  • Computational Biology / methods
  • Humans
  • Inverted Repeat Sequences / genetics*
  • MicroRNAs* / chemistry
  • MicroRNAs* / genetics
  • MicroRNAs* / ultrastructure
  • Molecular Sequence Data
  • Nucleic Acid Conformation*
  • RNA Precursors / chemistry
  • RNA Precursors / genetics
  • Sequence Analysis, DNA

Substances

  • MicroRNAs
  • RNA Precursors