PATO: genome-wide prediction of lncRNA-DNA triple helices

Bioinformatics. 2023 Mar 1;39(3):btad134. doi: 10.1093/bioinformatics/btad134.

Abstract

Motivation: Long non-coding RNA (lncRNA) plays a key role in many biological processes. For instance, lncRNA regulates chromatin through several molecular mechanisms, including direct RNA-DNA hybridization via triplexes, cotranscriptional RNA-RNA interactions, and RNA-DNA binding mediated by protein complexes. While the functional annotation of lncRNA transcripts has been widely studied over the last 20 years, only a handful of tools have been developed specifically to detect and evaluate lncRNA-DNA triple helices. Worse, some of these tools are nearly a decade old, so new triplex-centric pipelines depend on legacy software that cannot thoroughly process all the data made available by next-generation sequencing (NGS) technologies.

Results: We present PATO, a modern, fast, and efficient tool for the detection of lncRNA-DNA triplexes that matches NGS processing capabilities. PATO enables the prediction of triple helices at the genome scale and can process more than 60 GB of sequence data in as little as 1 h on a two-socket server. Moreover, PATO's efficiency allows a more exhaustive search of the triplex-forming solution space, so PATO achieves higher prediction accuracy in far less time than other state-of-the-art tools.

Availability and implementation: Source code, user manual, and tests are freely available to download under the MIT License at https://github.com/UDC-GAC/pato.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • DNA / metabolism
  • RNA, Long Noncoding* / genetics
  • RNA, Long Noncoding* / metabolism
  • Software

Substances

  • RNA, Long Noncoding
  • DNA