ToNER: A tool for identifying nucleotide enrichment signals in feature-enriched RNA-seq data

PLoS One. 2017 May 25;12(5):e0178483. doi: 10.1371/journal.pone.0178483. eCollection 2017.

Abstract

Background: Biochemical methods are available for enriching 5' ends of RNAs in prokaryotes, and these are employed in the differential RNA-seq (dRNA-seq) and the more recent Cappable-seq protocols. Computational methods are needed to locate RNA 5' ends from these data by statistical analysis of the enrichment. Although statistics-based analysis methods have been developed for dRNA-seq, they may not be suitable for Cappable-seq data. The more efficient enrichment method employed in Cappable-seq compared with dRNA-seq could affect the data distribution and thus algorithm performance.

Results: We present Transformation of Nucleotide Enrichment Ratios (ToNER), a tool for statistical modeling of enrichment from RNA-seq data obtained from enriched and unenriched libraries. The tool calculates nucleotide enrichment scores and determines the global transformation for fitting to the normal distribution using the Box-Cox procedure. From the transformed distribution, sites of significant enrichment are identified. To increase power of detection, meta-analysis across experimental replicates is offered. We tested the tool on Cappable-seq and dRNA-seq data for identifying Escherichia coli transcript 5' ends and compared the results with those from the TSSAR tool, which is designed for analyzing dRNA-seq data. When combining results across Cappable-seq replicates, ToNER detects more known transcript 5' ends than TSSAR. In general, the transcript 5' ends detected by ToNER but not TSSAR occur in regions which cannot be locally modeled by TSSAR.
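
The workflow described in the Results can be illustrated with a minimal Python sketch. It is not the ToNER implementation: the abstract does not give the score formula, the lambda estimation procedure, or the meta-analysis method, so the pseudocount, the grid search over lambda, and the use of Fisher's method to combine replicate p-values are all assumptions made for illustration, as are the function names. Only the standard library is used.

```python
import math
from statistics import mean


def enrichment_scores(enriched, control, pseudocount=1.0):
    """Per-position ratio of enriched to unenriched read counts.

    The pseudocount is an assumption that keeps ratios positive and finite.
    """
    return [(e + pseudocount) / (c + pseudocount)
            for e, c in zip(enriched, control)]


def boxcox(x, lam):
    """Box-Cox transform of one positive value."""
    return math.log(x) if abs(lam) < 1e-12 else (x ** lam - 1.0) / lam


def boxcox_loglik(xs, lam):
    """Profile log-likelihood of lambda under a normal model."""
    ys = [boxcox(x, lam) for x in xs]
    m = mean(ys)
    var = sum((y - m) ** 2 for y in ys) / len(ys)
    return (-0.5 * len(xs) * math.log(var)
            + (lam - 1.0) * sum(math.log(x) for x in xs))


def fit_lambda(xs, grid=None):
    """Pick one global lambda by grid search (assumed estimation strategy)."""
    grid = grid or [i / 100 for i in range(-300, 301)]
    return max(grid, key=lambda lam: boxcox_loglik(xs, lam))


def upper_tail_pvalues(xs, lam):
    """P(score >= observed) under a normal fit to the transformed scores."""
    ys = [boxcox(x, lam) for x in xs]
    m = mean(ys)
    sd = math.sqrt(sum((y - m) ** 2 for y in ys) / (len(ys) - 1))
    return [0.5 * math.erfc((y - m) / (sd * math.sqrt(2.0))) for y in ys]


def fisher_combine(pvals):
    """Combine replicate p-values for one site with Fisher's method.

    Assumed meta-analysis method. Uses the closed-form chi-square
    survival function, which exists for even degrees of freedom (2k).
    """
    half = -sum(math.log(max(p, 1e-300)) for p in pvals)  # statistic / 2
    term, total = 1.0, 1.0
    for i in range(1, len(pvals)):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

In use, each replicate would be scored and transformed separately, the per-site p-values combined with `fisher_combine`, and sites whose combined p-value falls below a chosen cutoff reported as candidate 5' ends. The scores must not all be identical, since the variance term in `boxcox_loglik` would then be zero.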

Conclusion: ToNER uses a simple yet robust statistical modeling approach, which can be used for detecting RNA 5' ends from Cappable-seq data, in particular when combining information from experimental replicates. The ToNER tool could potentially be applied to other RNA-seq datasets in which enrichment for other structural features of RNA is employed. The program is freely available for download at the ToNER webpage (http://www4a.biotec.or.th/GI/tools/toner) and the GitHub repository (https://github.com/PavitaKae/ToNER).

MeSH terms

  • Algorithms
  • Escherichia coli / genetics
  • High-Throughput Nucleotide Sequencing / methods
  • Nucleotides / genetics*
  • RNA / genetics*
  • Sequence Analysis, RNA / methods

Substances

  • Nucleotides
  • RNA

Grants and funding

This project is supported by the platform technology, National Center for Genetic Engineering and Biotechnology, Thailand (http://www.biotec.or.th), with grant numbers P-15-51103 and P-12-01270, funded to JP and PJS. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.