reactIDR: evaluation of the statistical reproducibility of high-throughput structural analyses towards a robust RNA structure prediction

BMC Bioinformatics. 2019 Mar 29;20(Suppl 3):130. doi: 10.1186/s12859-019-2645-4.

Abstract

Background: Next-generation sequencing techniques have recently been applied to the detection of RNA secondary structures, an approach referred to as high-throughput RNA structural (HTS) analysis, and many different protocols have been used to detect comprehensive RNA structures at single-nucleotide resolution. However, existing computational analyses depend heavily on the experimental methodology used to generate the data, which makes it difficult to compare or combine results obtained with different HTS methods in a statistically sound way.

Results: Here, we introduce a statistical framework, reactIDR, which can be applied to experimental data obtained using multiple HTS methodologies. Using this approach, nucleotides are classified into three structural categories: loop, stem/background, and unmapped. reactIDR uses the irreproducible discovery rate (IDR) with a hidden Markov model to accurately discriminate between true and spurious signals in replicated HTS experiments, and it can incorporate an expectation-maximization algorithm and supervised learning for efficient parameter optimization. In our analyses of real HTS data, reactIDR had the highest accuracy in classifying ribosomal RNA stem/loop structures with both individual and integrated HTS datasets, and its results corresponded best to the three-dimensional structures.
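
To make the classification step more concrete, the following is a minimal sketch of posterior decoding in a three-state hidden Markov model whose states match the categories named above (loop, stem/background, unmapped). It is not the reactIDR implementation: the discrete observation symbols, all probability values, and the function name `forward_backward` are assumptions chosen for illustration, and the IDR-based emission model and the parameter optimization described in the abstract are not reproduced here.

```python
# Illustrative three-state HMM posterior decoding (NOT the reactIDR code).
# All parameters and the observation encoding below are hypothetical.

STATES = ("loop", "stem/background", "unmapped")


def forward_backward(obs, start, trans, emit):
    """Return per-position posterior state probabilities for a discrete HMM."""
    if not obs:
        return []
    n, k = len(obs), len(start)

    # Forward pass, normalized at every step to avoid numerical underflow.
    prev = [start[s] * emit[s][obs[0]] for s in range(k)]
    total = sum(prev)
    prev = [p / total for p in prev]
    fwd = [prev]
    for t in range(1, n):
        cur = [
            emit[s][obs[t]] * sum(prev[r] * trans[r][s] for r in range(k))
            for s in range(k)
        ]
        total = sum(cur)
        prev = [c / total for c in cur]
        fwd.append(prev)

    # Backward pass, also normalized per step.
    bwd = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        raw = [
            sum(trans[s][r] * emit[r][obs[t + 1]] * bwd[t + 1][r] for r in range(k))
            for s in range(k)
        ]
        total = sum(raw)
        bwd[t] = [x / total for x in raw]

    # Combine and renormalize to obtain posteriors at each position.
    post = []
    for t in range(n):
        raw = [fwd[t][s] * bwd[t][s] for s in range(k)]
        total = sum(raw)
        post.append([x / total for x in raw])
    return post


# Hypothetical observation symbols per nucleotide:
#   0 = reproducibly high signal in both replicates
#   1 = low or inconsistent signal
#   2 = no read coverage
obs = [0, 0, 1, 1, 2, 2]

start = [1 / 3, 1 / 3, 1 / 3]
trans = [
    [0.80, 0.15, 0.05],  # from loop
    [0.15, 0.80, 0.05],  # from stem/background
    [0.10, 0.10, 0.80],  # from unmapped
]
emit = [
    [0.70, 0.25, 0.05],  # loop
    [0.20, 0.75, 0.05],  # stem/background
    [0.05, 0.05, 0.90],  # unmapped
]

posteriors = forward_backward(obs, start, trans, emit)
labels = [STATES[max(range(len(STATES)), key=lambda s: p[s])] for p in posteriors]
print(labels)
```

In this sketch the transition matrix favors staying in the same state, so isolated noisy observations are smoothed by their neighbors. That is the general reason for placing a hidden Markov model on top of per-nucleotide scores.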

Conclusions: We have developed novel software, reactIDR, for predicting stem/loop regions from HTS analysis datasets. In the rRNA structure analyses, reactIDR showed robust accuracy across different datasets by using the reproducibility criterion, suggesting its potential for increasing the value of existing HTS datasets. reactIDR is publicly available at https://github.com/carushi/reactIDR.

Keywords: High-throughput structural analysis; RNA secondary structure; Reproducibility.

MeSH terms

  • Algorithms*
  • Area Under Curve
  • High-Throughput Nucleotide Sequencing / methods*
  • Machine Learning
  • Markov Chains
  • Nucleic Acid Conformation*
  • Nucleotides
  • RNA / chemistry*
  • RNA, Ribosomal / chemistry
  • RNA, Ribosomal / genetics
  • ROC Curve
  • Reproducibility of Results
  • Statistics as Topic*

Substances

  • Nucleotides
  • RNA, Ribosomal
  • RNA