TRIPBASE: a database for identifying the human genomic DNA and lncRNA triplexes

NAR Genom Bioinform. 2023 May 22;5(2):lqad043. doi: 10.1093/nargab/lqad043. eCollection 2023 Jun.

Abstract

Long non-coding RNAs (lncRNAs) are defined as RNA sequences longer than 200 nt with no protein-coding capacity. They participate in various biological mechanisms and are abundant across diverse species. There is well-documented evidence that lncRNAs can interact with genomic DNA by forming triple helices (triplexes). Several computational methods based on the Hoogsteen base-pair rule have previously been designed to find theoretical RNA-DNA:DNA triplexes. While powerful, these methods suffer from a high false-positive rate when their predicted triplexes are compared with biological experiments. To address this issue, we first collected experimental data on genomic RNA-DNA triplexes from antisense oligonucleotide (ASO)-mediated capture assays and used Triplexator, the most widely used tool for predicting lncRNA-DNA interactions, to reveal the intrinsic information on true triplex binding potential. Based on this analysis, we proposed six computational attributes as filters that improve in silico triplex prediction by removing most false positives. Further, we have built a new database, TRIPBASE, as the first comprehensive collection of genome-wide triplex predictions for human lncRNAs. The TRIPBASE user interface allows scientists to apply customized filtering criteria to access the potential triplexes of human lncRNAs in the cis-regulatory regions of the human genome. TRIPBASE can be accessed at https://tripbase.iis.sinica.edu.tw/.
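
To illustrate what a Hoogsteen-rule-based search looks for, the following minimal Python sketch scans one DNA strand for uninterrupted purine tracts and derives the RNA third strand that would pair with each tract under the canonical parallel pyrimidine motif (U·A-T and C·G-C triads). It is not the Triplexator algorithm and not the TRIPBASE pipeline: the function names, the `min_len` threshold of 10 nt, and the restriction to perfect purine runs and a single motif are assumptions made for illustration only.

```python
import re

# Canonical pyrimidine-motif triads: a purine in the duplex is recognised by
# a pyrimidine in the RNA third strand (A -> U, G -> C), in parallel orientation.
PYRIMIDINE_MOTIF = {"A": "U", "G": "C"}


def find_purine_tracts(dna, min_len=10):
    """Return (start, end, tract) for every run of A/G of at least min_len nt.

    Hypothetical helper: min_len is an illustrative threshold, and no
    interruptions (mismatches) are tolerated in the tract.
    """
    pattern = re.compile(r"[AG]{%d,}" % min_len)
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(dna.upper())]


def parallel_third_strand(tract):
    """RNA sequence (5'->3') pairing with a purine tract in the pyrimidine motif."""
    return "".join(PYRIMIDINE_MOTIF[base] for base in tract)


if __name__ == "__main__":
    example = "ccAGGAGAAGGAGAtcAGct"  # made-up sequence, not real genomic data
    for start, end, tract in find_purine_tracts(example):
        print(start, end, tract, parallel_third_strand(tract))
```

A scan of this kind reports every site that satisfies the pairing rule, which is why rule-based predictions need additional filtering before they can be compared with experimental data.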