Using structural and evolutionary information to detect and correct pyrosequencing errors in noncoding RNAs

J Comput Biol. 2013 Nov;20(11):905-19. doi: 10.1089/cmb.2013.0085. Epub 2013 Oct 17.

Abstract

The analysis of the sequence-structure relationship in RNA molecules is essential not only for evolutionary studies but also for concrete applications such as error correction in next-generation sequencing (NGS) technologies. The prohibitive sizes of the mutational and conformational landscapes, combined with the volume of data to process, require efficient algorithms to compute sequence-structure properties. In this article, we address the correction of NGS errors by calculating which mutations most increase the likelihood of a sequence given a structure and an RNA family. We introduce RNApyro, an efficient inside-outside algorithm, linear in time and space, that computes exact mutational probabilities under secondary structure and evolutionary constraints given as a multiple sequence alignment with a consensus structure. We develop a scoring scheme combining classical stacking base-pair energies with novel isostericity scores and apply our techniques to correct pointwise errors in 5S and 16S rRNA sequences. Our results suggest that RNApyro is a promising algorithm to complement existing tools in the NGS error-correction pipeline.
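The central quantity described above, the probability that each position carries each nucleotide once a read may mutate toward a structure and an RNA family, can be illustrated with a deliberately simplified Python sketch. It is not RNApyro's algorithm: it drops stacking energies and isostericity scores, so the weight of a candidate sequence factors over the unpaired positions and base pairs of a fixed consensus structure. Forward and backward products of per-unit polynomials in the mutation count then stand in for the inside and outside passes, giving exact marginals over all sequences within K mutations of the read in O(nK²) time. The pair weights `PAIR_W` and `NONPAIR_W`, the mutation budget `K`, the column-frequency profile, and all function names are hypothetical, and the read is assumed to be already aligned to the alignment's columns.

```python
from itertools import product

NUC = "ACGU"

# Hypothetical pair weights (NOT RNApyro's stacking energies or isostericity scores):
# Watson-Crick pairs favored over the GU wobble; any other pair is heavily penalized.
PAIR_W = {("G", "C"): 3.0, ("C", "G"): 3.0, ("A", "U"): 2.0,
          ("U", "A"): 2.0, ("G", "U"): 1.0, ("U", "G"): 1.0}
NONPAIR_W = 0.01


def parse_pairs(db):
    """Map each paired index to its partner in a non-crossing dot-bracket string."""
    stack, pairs = [], {}
    for i, c in enumerate(db):
        if c == "(":
            stack.append(i)
        elif c == ")":
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def column_profile(msa, pseudo=1.0):
    """Per-column nucleotide frequencies from aligned rows; gaps and other symbols are skipped."""
    profile = []
    for j in range(len(msa[0])):
        counts = {nt: pseudo for nt in NUC}
        for row in msa:
            if row[j] in counts:
                counts[row[j]] += 1
        total = sum(counts.values())
        profile.append({nt: c / total for nt, c in counts.items()})
    return profile


def _mul(a, b, K):
    """Product of two polynomials in the mutation count, truncated at degree K."""
    c = [0.0] * (K + 1)
    for i, x in enumerate(a):
        for j in range(K + 1 - i):
            c[i + j] += x * b[j]
    return c


def mutation_marginals(read, db, profile, K):
    """P(position i holds nucleotide x) over all sequences within K mutations of `read`,
    each weighted by profile weights times pair weights (toy factorized model)."""
    pairs = parse_pairs(db)
    # Independent units: unpaired positions (i,) and base pairs (i, j) with i < j.
    units = [(i,) if i not in pairs else (i, pairs[i])
             for i in range(len(read)) if i not in pairs or pairs[i] > i]
    opts, polys = [], []
    for u in units:
        o, poly = [], [0.0] * (K + 1)
        for assign in product(NUC, repeat=len(u)):
            w = 1.0
            for pos, nt in zip(u, assign):
                w *= profile[pos][nt]
            if len(u) == 2:
                w *= PAIR_W.get(assign, NONPAIR_W)
            m = sum(read[p] != nt for p, nt in zip(u, assign))  # mutations in this unit
            if m <= K:
                o.append((assign, w, m))
                poly[m] += w
        opts.append(o)
        polys.append(poly)
    one = [1.0] + [0.0] * K
    prefix = [one]                      # prefix[t]: product over units[:t]
    for p in polys:
        prefix.append(_mul(prefix[-1], p, K))
    suffix = [one]                      # built backwards, then reversed
    for p in reversed(polys):
        suffix.append(_mul(p, suffix[-1], K))
    suffix.reverse()                    # suffix[t]: product over units[t:]
    Z = sum(prefix[-1])                 # total weight of sequences with <= K mutations
    probs = [{nt: 0.0 for nt in NUC} for _ in read]
    for t, u in enumerate(units):
        rest = _mul(prefix[t], suffix[t + 1], K)  # every unit except t
        for assign, w, m in opts[t]:
            share = w * sum(rest[:K - m + 1]) / Z  # other units may use K - m mutations
            for pos, nt in zip(u, assign):
                probs[pos][nt] += share
    return probs


if __name__ == "__main__":
    # Toy read whose G(1)-A(5) pair is non-canonical; uniform profile, at most one mutation.
    uniform = [{nt: 0.25 for nt in NUC} for _ in range(7)]
    probs = mutation_marginals("GGAAAAC", "((...))", uniform, K=1)
    print(max(probs[5], key=probs[5].get))  # C
```

Positions whose most probable nucleotide differs from the read's base are the candidate corrections; in the toy run, the A at index 5 facing a G is pushed toward C, completing a G-C pair. A faithful implementation would replace the flat unit list with a recursion over the nested structure so that stacking terms between adjacent pairs can be scored.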

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computer Simulation
  • Evolution, Molecular
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Models, Genetic
  • Nucleic Acid Conformation
  • RNA, Bacterial / genetics
  • RNA, Ribosomal, 16S / genetics*
  • RNA, Ribosomal, 5S / genetics*
  • RNA, Untranslated / genetics*
  • ROC Curve
  • Sequence Analysis, RNA*

Substances

  • RNA, Bacterial
  • RNA, Ribosomal, 16S
  • RNA, Ribosomal, 5S
  • RNA, Untranslated