A universal model of RNA.DNA:DNA triplex formation accurately predicts genome-wide RNA-DNA interactions

Brief Bioinform. 2022 Nov 19;23(6):bbac445. doi: 10.1093/bib/bbac445.

Abstract

RNA.DNA:DNA triple helix (triplex) formation is a form of RNA-DNA interaction that regulates gene expression but is difficult to study experimentally in vivo. This makes accurate computational prediction of such interactions highly important in the field of RNA research. Current predictive methods use canonical Hoogsteen base pairing rules, which, whilst biophysically valid, may not reflect the plastic nature of cell biology. Here, we present the first optimization approach to learn a probabilistic model describing RNA-DNA interactions directly from motifs derived from triplex sequencing data. We find that there are several stable interaction codes, including Hoogsteen base pairing and novel RNA-DNA base pairings, which agree with in vitro measurements. We implemented these findings in TriplexAligner, a program that uses the determined interaction codes to predict triplex binding. TriplexAligner predicts RNA-DNA interactions identified in all-to-all sequencing data more accurately than all previously published tools in human and mouse, and also predicts previously studied triplex interactions with known regulatory functions. We further validated a novel triplex interaction using biophysical experiments. Our work is an important step towards a better understanding of triplex formation and allows genome-wide analyses of RNA-DNA interactions.
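
For readers unfamiliar with what a rule-based "interaction code" looks like, the following is a minimal sketch of the canonical baseline the abstract contrasts itself with. It is not TriplexAligner and not the learned probabilistic model. It assumes only the textbook pyrimidine-motif Hoogsteen pairings (RNA U recognising the A of an A:T pair, RNA C recognising the G of a G:C pair, parallel orientation) and scores ungapped windows by counting matches. The names `HOOGSTEEN_PY`, `hoogsteen_matches` and `best_window` are hypothetical and chosen for illustration.

```python
# Assumption: canonical pyrimidine-motif Hoogsteen code, parallel orientation.
# Maps an RNA base to the purine-strand DNA base it recognises in the duplex.
# This is an illustrative rule table, not the codes learned in the paper.
HOOGSTEEN_PY = {"U": "A", "C": "G"}


def hoogsteen_matches(rna, dna_purine_strand):
    """Count positions where the RNA base pairs with the DNA base under the code."""
    if len(rna) != len(dna_purine_strand):
        raise ValueError("sequences must have equal length")
    pairs = zip(rna.upper(), dna_purine_strand.upper())
    # Bases absent from the code (e.g. RNA 'G' or 'A') never count as a match.
    return sum(HOOGSTEEN_PY.get(r) == d for r, d in pairs)


def best_window(rna, dna):
    """Slide the RNA along the DNA; return (start, matches) of the best ungapped window.

    Ties are resolved in favour of the leftmost window.
    """
    n = len(rna)
    if n == 0 or n > len(dna):
        raise ValueError("RNA must be non-empty and no longer than the DNA")
    starts = range(len(dna) - n + 1)
    start = max(starts, key=lambda i: hoogsteen_matches(rna, dna[i:i + n]))
    return start, hoogsteen_matches(rna, dna[start:start + n])


if __name__ == "__main__":
    print(best_window("UCUC", "TTAGAGTT"))
```

A fixed lookup table like this treats every allowed pairing as equally favourable and every other pairing as impossible. A probabilistic code, as described in the abstract, would instead assign each RNA-DNA base combination a learned weight.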

Keywords: DNA; RNA; RNA–DNA interaction; Triplex; machine learning.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • DNA / genetics
  • DNA / metabolism
  • DNA Replication
  • Genome-Wide Association Study*
  • Humans
  • Mice
  • Nucleic Acid Conformation
  • RNA* / genetics

Substances

  • RNA
  • DNA