microRPM: a microRNA prediction model based only on plant small RNA sequencing data

Bioinformatics. 2018 Apr 1;34(7):1108-1115. doi: 10.1093/bioinformatics/btx725.

Abstract

Motivation: MicroRNAs (miRNAs) are endogenous non-coding small RNAs, about 22 nucleotides long, that play an important role in the post-transcriptional regulation of gene expression through either mRNA cleavage or translation inhibition. Several machine learning-based approaches have been developed to identify novel miRNAs from next generation sequencing (NGS) data. Most of these methods require precursor or genomic sequences as references. However, the lack of genomic sequences often limits miRNA discovery in non-model plants. A systematic approach to identifying novel miRNAs without reference sequences is thus necessary.

Results: In this study, an effective method was developed to identify miRNAs from non-model plants based only on NGS datasets. The miRNA prediction model was trained with several duplex structure-related features of mature miRNAs and their passenger strands using a support vector machine algorithm. In independent tests, accuracy reached 96.61% for dicots (Arabidopsis) and 93.04% for monocots (rice). Furthermore, real small RNA sequencing data from orchids were tested in this study. Twenty-one predicted orchid miRNAs were selected for experimental validation, and 18 of them were confirmed by qRT-PCR. This novel approach was also packaged as a user-friendly program called microRPM (miRNA Prediction Model).
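To illustrate what "duplex structure-related features" of a mature miRNA and its passenger strand could look like, the following is a minimal sketch, not the microRPM implementation. The abstract does not list the actual features, so the function name `duplex_features`, the assumed 2-nt 3' overhang geometry, and the chosen features (paired positions, wobble pairs, mismatches, longest paired run, GC fraction) are all assumptions made for illustration. Feature vectors of this kind could then be passed to a support vector machine classifier.

```python
# Hypothetical sketch: simple duplex features for a candidate mature/passenger
# read pair. The feature set and overhang geometry are assumptions, not the
# features used by microRPM.

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def duplex_features(mature, passenger, overhang=2):
    """Return simple pairing features for two reads given 5'->3'.

    Assumes each strand leaves `overhang` unpaired nucleotides at its 3' end,
    so mature[i] faces passenger[len(passenger) - 1 - overhang - i].
    """
    # Accept DNA-alphabet reads by converting T to U.
    mature = mature.upper().replace("T", "U")
    passenger = passenger.upper().replace("T", "U")

    span = min(len(mature), len(passenger)) - overhang
    if span <= 0:
        raise ValueError("reads are too short for the requested overhang")

    paired = wobble = longest_run = run = 0
    for i in range(span):
        pair = (mature[i], passenger[len(passenger) - 1 - overhang - i])
        if pair in WATSON_CRICK:
            paired += 1
            run += 1
        elif pair in WOBBLE:
            wobble += 1
            run += 1
        else:
            run = 0  # a mismatch breaks the current paired stretch
        longest_run = max(longest_run, run)

    gc = sum(base in "GC" for base in mature) / len(mature)
    return {
        "paired": paired,                      # Watson-Crick pairs
        "wobble": wobble,                      # G-U wobble pairs
        "mismatches": span - paired - wobble,  # unpaired facing positions
        "longest_run": longest_run,            # longest uninterrupted paired stretch
        "paired_fraction": (paired + wobble) / span,
        "gc_fraction": gc,                     # GC content of the mature strand
    }


if __name__ == "__main__":
    # Toy sequences for demonstration only; not real miRNAs.
    print(duplex_features("GAUCGAUCGAUCGAUCGAUCG", "AUCGAUCGAUCGAUCGAUCUU"))
```

Because the sketch compares only the two reads against each other, it needs no genome or precursor sequence, which is the property the approach described above relies on.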

Availability and implementation: This resource is freely available at http://microRPM.itps.ncku.edu.tw.

Contact: nslin@sinica.edu.tw or sarah321@mail.ncku.edu.tw.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods
  • Genome, Plant*
  • High-Throughput Nucleotide Sequencing / methods*
  • MicroRNAs*
  • Plants / genetics
  • Plants / metabolism
  • RNA, Plant
  • Sequence Analysis, RNA / methods*
  • Support Vector Machine*

Substances

  • MicroRNAs
  • RNA, Plant