Ensemble Clustering Classification compete SVM and One-Class classifiers applied on plant microRNAs Data

J Integr Bioinform. 2016 Dec 22;13(5):304. doi: 10.2390/biecoll-jib-2016-304.

Abstract

The performance of many learning and data mining algorithms depends critically on a suitable metric for measuring distances over the input space. Learning such a metric from examples may therefore be the key to applying these algorithms successfully. We have demonstrated that k-nearest neighbor (kNN) classification can be significantly improved by learning a distance metric from labeled examples. A clustering ensemble is used to define the distance between points with respect to how they co-cluster. This distance is then used within the framework of the kNN algorithm to define a classifier named the ensemble clustering kNN classifier (EC-kNN). In many of our experiments, EC-kNN achieved the highest accuracy, while SVM did not perform as well. In this study, we compare the performance of a two-class classifier using EC-kNN with different one-class and two-class classifiers. The comparison was applied to seven different plant microRNA species considering eight feature selection methods. The averaged results show that EC-kNN outperforms all other methods employed here, as well as previously published results for the same data. In conclusion, this study shows that the chosen classifier performs well when the distance metric is carefully chosen.
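
The idea of a co-clustering distance feeding a kNN vote can be illustrated with a minimal sketch. The abstract does not state the base clustering algorithm, the number of ensemble runs, or the exact distance formula, so everything below is an assumption made for illustration: plain k-means as the base clusterer, a distance of one minus the fraction of runs in which two points share a cluster, and joint clustering of training and test points. The function names (kmeans_labels, co_cluster_distance, ec_knn_predict) and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def kmeans_labels(X, k, rng, n_iter=20):
    """Plain Lloyd's k-means (assumed base clusterer); returns one label per row."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return labels

def co_cluster_distance(X, n_runs=30, ks=(2, 3, 4, 5), seed=0):
    """Assumed distance: 1 - fraction of runs in which two points co-cluster."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(X)
    together = np.zeros((n, n))
    for _ in range(n_runs):
        k = int(rng.choice(ks))  # vary k to diversify the ensemble
        labels = kmeans_labels(X, k, rng)
        together += labels[:, None] == labels[None, :]
    return 1.0 - together / n_runs

def ec_knn_predict(X_train, y_train, X_test, k_neighbors=3, **kwargs):
    """kNN majority vote using the co-clustering distance instead of Euclidean."""
    y_train = np.asarray(y_train)
    # Assumption: training and test points are clustered together, without labels.
    X_all = np.vstack([X_train, X_test])
    D = co_cluster_distance(X_all, **kwargs)
    n_train = len(X_train)
    preds = []
    for row in D[n_train:, :n_train]:  # test-to-train distances
        nearest = np.argsort(row, kind="stable")[:k_neighbors]
        values, counts = np.unique(y_train[nearest], return_counts=True)
        preds.append(values[counts.argmax()])
    return np.array(preds)

# Toy usage on two synthetic, well-separated groups (not microRNA data).
rng = np.random.default_rng(1)
X_train = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])
y_train = np.array([0] * 20 + [1] * 20)
X_test = np.array([[0.1, -0.1], [5.1, 4.9]])
pred = ec_knn_predict(X_train, y_train, X_test)
print(pred)
```

In this sketch the labels are used only in the final vote, while the distance itself comes from unsupervised clusterings; whether the published method builds its ensemble the same way is not stated in the abstract.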

MeSH terms

  • Base Sequence
  • Cluster Analysis
  • Databases, Genetic
  • MicroRNAs / genetics*
  • MicroRNAs / metabolism
  • Nucleotide Motifs / genetics
  • Plants / genetics*
  • Support Vector Machine*

Substances

  • MicroRNAs