Accuracy of microRNA discovery pipelines in non-model organisms using closely related species genomes

PLoS One. 2014 Jan 3;9(1):e84747. doi: 10.1371/journal.pone.0084747. eCollection 2014.

Abstract

Mapping small RNA reads to a reference genome is an essential and widely used approach to identifying microRNAs (miRNAs) in an organism. Using the genomes of closely related species as proxy references can facilitate miRNA expression studies in non-model species whose genomes are not available. However, the level of error this introduces, which results from the evolutionary distance between the proxy reference and the species of interest, is mostly unknown. To evaluate the accuracy of miRNA discovery pipelines in non-model organisms, small RNA library data from a mosquito, Aedes aegypti, were mapped to three well-annotated insect genomes as proxy references using miRanalyzer with two mapping criteria, strict and loose. In addition, another web-based miRNA discovery pipeline (DSAP) was used as a control for program performance. With miRanalyzer, a reduction of more than 80% was observed in the number of mapped reads under the strict criterion when proxy genome references were used; however, only a 20% reduction was recorded for reads mapped to the known mature miRNA datasets of other species. Except for a few changes in ranking, the mapping criteria made no significant difference to the profile of the most abundant miRNAs in A. aegypti, whether its original genome or a proxy genome was used as the reference. However, more variation was observed in the miRNA ranking profile when DSAP was used as the analysis tool. Overall, the results also suggested that using a proxy reference did not change the differential expression profiles of the most abundant miRNAs when infected and non-infected libraries were compared. However, a proxy reference could provide only about 67% of the original outcome for the more extremely up- or down-regulated miRNA profiles. Although using the genome of a closely related species incurred some loss in the number of miRNAs detected, the most abundant miRNAs and their differential expression profiles would be acceptable, depending on the sensitivity required by each project.
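The two comparisons described above (the drop in mapped reads and the stability of the most abundant miRNAs) can be illustrated with a minimal Python sketch. It is not the authors' pipeline and does not call miRanalyzer or DSAP. The function names, the miRNA labels, and the read counts are hypothetical and chosen only to show the arithmetic.

```python
def percent_reduction(original: int, proxy: int) -> float:
    """Percentage drop in mapped reads when a proxy reference replaces the original."""
    if original <= 0:
        raise ValueError("original read count must be positive")
    return 100.0 * (original - proxy) / original


def rank_by_abundance(counts: dict[str, int]) -> list[str]:
    """miRNA names sorted from most to least abundant (ties broken by name)."""
    return sorted(counts, key=lambda name: (-counts[name], name))


def top_n_overlap(original: dict[str, int], proxy: dict[str, int], n: int) -> float:
    """Fraction of the original top-n miRNAs that also appear in the proxy top-n."""
    if n <= 0 or not original:
        raise ValueError("n must be positive and original must be non-empty")
    top_original = set(rank_by_abundance(original)[:n])
    top_proxy = set(rank_by_abundance(proxy)[:n])
    return len(top_original & top_proxy) / len(top_original)


# Made-up read counts for illustration only; these are NOT data from the study.
original_counts = {"miR-A": 900, "miR-B": 500, "miR-C": 300, "miR-D": 50}
proxy_counts = {"miR-A": 700, "miR-C": 350, "miR-B": 200, "miR-E": 40}

reduction = percent_reduction(sum(original_counts.values()), sum(proxy_counts.values()))
print(f"Mapped-read reduction with proxy reference: {reduction:.1f}%")
print("Original ranking:", rank_by_abundance(original_counts))
print("Proxy ranking:   ", rank_by_abundance(proxy_counts))
print("Top-3 overlap:   ", top_n_overlap(original_counts, proxy_counts, n=3))
```

In this toy example the top three miRNAs are the same under both references even though their order changes, which mirrors the kind of ranking shift the abstract describes; real analyses would use the read counts reported by the mapping tool.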

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aedes / genetics
  • Animals
  • Chromosome Mapping
  • Databases, Genetic
  • Gene Expression Regulation
  • Genome*
  • High-Throughput Nucleotide Sequencing
  • MicroRNAs / genetics*

Substances

  • MicroRNAs

Grants and funding

This work was supported by an Australian Research Council (www.arc.gov.au) Discovery grant to S.A. (DP110102112) and a University of Queensland (www.uq.edu.au) PhD scholarship to K.E. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.