Accurate detection of RNA stem-loops in structurome data reveals widespread association with protein binding sites

RNA Biol. 2021 Oct 15;18(sup1):521-536. doi: 10.1080/15476286.2021.1971382. Epub 2021 Oct 4.

Abstract

RNA molecules are known to fold into specific structures that often play a central role in their functions and regulation. In silico folding of RNA transcripts, especially when assisted by structure profiling (SP) data, can accurately elucidate relevant structural conformations. However, such methods scale poorly to the large volumes of SP data generated by transcriptome-wide experiments, which are becoming more commonplace and are advancing our understanding of RNA structure and its regulation at global and local levels. This has created a need for tools that can rapidly and scalably derive structural assessments from SP data. We previously introduced one such tool, patteRNA, a statistical learning algorithm that rapidly mines large SP datasets for structural elements. Here, we present a reformulation of patteRNA's pattern recognition scheme that achieves significantly improved precision without a major increase in computational overhead. Specifically, we developed a data-driven logistic classifier that interprets patteRNA's statistical characterizations of SP data together with local sequence properties as measured with a nearest neighbour thermodynamic model. Application of the classifier to human structurome data reveals a marked association between detected stem-loops and RNA binding protein (RBP) footprints. The results of our application demonstrate that upwards of 30% of RBP footprints occur within loops of stable stem-loop elements. Overall, our work provides a rapid and accurate method for automatically detecting families of RNA structure motifs and demonstrates the functional relevance of identifying them transcriptome-wide.
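
To illustrate the general idea of a logistic classifier that combines a data-derived score with a thermodynamic feature, the following is a minimal sketch in Python. It is not patteRNA's implementation. The feature names (`profile_score`, `free_energy`), the toy training values, and the helper functions `fit_logistic` and `predict_proba` are all hypothetical and chosen only for illustration; the actual features, training data, and fitting procedure are described in the full paper.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        # Average the gradient over the samples and take one step.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict_proba(w, x):
    """Probability that a candidate site is a true stem-loop under the toy model."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# Made-up training data (assumption, not from the paper):
# each row is (profile_score, free_energy in kcal/mol).
X = [(2.1, -8.5), (1.8, -7.0), (1.5, -9.2), (2.5, -6.1),
     (-0.5, -1.2), (0.2, -2.0), (-1.0, -0.5), (0.4, -3.1)]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = stem-loop, 0 = not a stem-loop

weights = fit_logistic(X, y)
p_stable = predict_proba(weights, (2.0, -8.0))   # high score, stable fold
p_weak = predict_proba(weights, (-0.3, -1.0))    # low score, weak fold
print(p_stable > 0.5, p_weak < 0.5)
```

In this toy setup, a candidate with a high data-derived score and a strongly negative folding free energy receives a higher probability than one with a low score and a weak fold, which is the qualitative behaviour the abstract attributes to combining both sources of evidence.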

Keywords: RNA binding proteins; RNA structure; machine learning; statistical models; transcriptome.

MeSH terms

  • Algorithms*
  • Binding Sites
  • Computational Biology / methods*
  • Hep G2 Cells
  • Humans
  • K562 Cells
  • Nucleic Acid Conformation*
  • Nucleotide Motifs*
  • Protein Binding
  • RNA / chemistry*
  • RNA / genetics
  • RNA / metabolism*
  • RNA-Binding Proteins / metabolism*
  • Sequence Analysis, RNA
  • Transcriptome

Substances

  • RNA-Binding Proteins
  • RNA