Manatee: detection and quantification of small non-coding RNAs from next-generation sequencing data

Sci Rep. 2020 Jan 20;10(1):705. doi: 10.1038/s41598-020-57495-9.

Abstract

Small non-coding RNAs (sncRNAs) play important roles in health and disease. Next Generation Sequencing (NGS) technologies are considered the most powerful and versatile methodologies for exploring small RNA (sRNA) transcriptomes in diverse experimental and clinical studies. Small RNA-Seq (sRNA-Seq) data analysis has proved challenging because of the non-unique genomic origin, short length, and abundant post-transcriptional modifications of sRNA species. Here, we present Manatee, an algorithm for the quantification of sRNA classes and the detection of novel expressed non-coding loci. Manatee combines prior annotation of sRNAs with reliable alignment density information and extensive rescue of usually neglected multimapped reads to provide accurate transcriptome-wide sRNA expression quantification. Comparison of Manatee against state-of-the-art implementations using real and simulated data demonstrates its high accuracy across diverse sRNA classes. Manatee also goes beyond common pipelines by identifying and quantifying expression from unannotated loci and microRNA isoforms (isomiRs). It is user-friendly, can be easily incorporated into pipelines, and provides a simplified output suitable for direct use in downstream analyses and functional studies.
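
To illustrate what rescuing multimapped reads with alignment density can mean in practice, the sketch below splits each multimapped read across its candidate loci in proportion to the uniquely aligned reads already observed there. This is a generic proportional-assignment scheme written for illustration only; it is not Manatee's actual implementation, and the function name `rescue_multimapped`, its inputs, and the even-split fallback are all assumptions that do not come from the paper.

```python
from collections import defaultdict


def rescue_multimapped(unique_counts, multimapped_reads):
    """Hypothetical helper: fractionally assign multimapped reads.

    unique_counts: dict mapping locus name -> count of uniquely aligned reads
    multimapped_reads: list of candidate-locus lists, one list per read
    Returns a dict mapping locus name -> estimated (possibly fractional) count.
    """
    counts = defaultdict(float)
    for locus, n in unique_counts.items():
        counts[locus] += n

    for candidates in multimapped_reads:
        if not candidates:
            continue  # a read with no candidate loci cannot be assigned
        # Weight each candidate by its uniquely aligned read density.
        weights = [unique_counts.get(locus, 0) for locus in candidates]
        total = sum(weights)
        for locus, weight in zip(candidates, weights):
            # Assumption: with no unique evidence at any candidate, split evenly.
            share = weight / total if total > 0 else 1 / len(candidates)
            counts[locus] += share

    return dict(counts)


if __name__ == "__main__":
    unique = {"miR-A": 30, "miR-B": 10}
    reads = [["miR-A", "miR-B"]] * 4 + [["tRNA-X", "tRNA-Y"]]
    print(rescue_multimapped(unique, reads))
```

In this toy input, the four reads shared by `miR-A` and `miR-B` are divided 3:1 because of the 30:10 unique-read ratio, while the read shared by the two tRNA loci, which have no unique support, is split evenly.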

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Gene Expression Profiling
  • Hep G2 Cells
  • High-Throughput Nucleotide Sequencing
  • Humans
  • MCF-7 Cells
  • Molecular Sequence Annotation
  • Neoplasms / genetics*
  • RNA, Small Untranslated / classification
  • RNA, Small Untranslated / genetics*
  • Sequence Analysis, RNA / methods*

Substances

  • RNA, Small Untranslated