PhyloPGM: boosting regulatory function prediction accuracy using evolutionary information

Bioinformatics. 2022 Jun 24;38(Suppl 1):i299-i306. doi: 10.1093/bioinformatics/btac259.

Abstract

Motivation: Computationally predicting the regulatory function of a genomic sequence is of utmost importance in -omics studies, because it helps us understand the mechanisms underpinning the vast gene regulatory network. Prominent examples include predicting transcription factor binding in DNA regulatory regions and predicting RNA-protein interactions in the context of post-transcriptional regulation of gene expression. However, existing computational methods suffer from high false-positive rates and seldom use evolutionary information, despite the vast amount of orthologous data available across many extant and ancestral genomes. These data present a ready opportunity to improve the accuracy of existing computational methods.

Results: In this study, we present PhyloPGM, a novel probabilistic approach that leverages previously trained predictors of transcription factor binding sites (TFBSs) or RNA-RNA-binding protein (RNA-RBP) binding by aggregating their predictions across orthologous regions, in order to boost prediction accuracy on human sequences. In our experiments, PhyloPGM significantly improved over baselines such as the sequence-based RNA-RBP binding predictor RNATracker and the sequence-based TFBS predictor FactorNet. PhyloPGM is simple in principle and easy to implement, yet it substantially improves prediction accuracy.
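The idea of aggregating a pre-trained predictor's outputs over orthologous regions can be illustrated with a small sketch. This is not the PhyloPGM implementation (the authors' code is in the GitHub repository listed below). It rests on several labeled assumptions: each node of the phylogeny (extant or ancestral) carries a hidden binary binding state that evolves along branches as a two-state Markov chain with hypothetical gain rate `a` and loss rate `b`; the predictor score at each node is used directly as a likelihood for the bound state; and the posterior for the human sequence is computed by rooting the tree at the human node and running Felsenstein-style pruning. The function names `transition` and `posterior_bound`, the species names and the scores are all hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, float]  # (node_u, node_v, branch_length); hypothetical format


def transition(t: float, a: float, b: float) -> List[List[float]]:
    """Transition matrix of a two-state Markov chain (0 = unbound, 1 = bound)
    with gain rate a (0 -> 1) and loss rate b (1 -> 0) over branch length t."""
    total = a + b
    decay = math.exp(-total * t)
    p0, p1 = b / total, a / total  # stationary distribution
    return [[p0 + p1 * decay, p1 * (1.0 - decay)],
            [p0 * (1.0 - decay), p1 + p0 * decay]]


def posterior_bound(edges: List[Edge], scores: Dict[str, float],
                    query: str, a: float = 0.1, b: float = 0.9) -> float:
    """Posterior probability that `query` is bound, combining predictor
    scores from every node of the tree via Felsenstein-style pruning."""
    adj: Dict[str, List[Tuple[str, float]]] = {query: []}
    for u, v, t in edges:
        adj.setdefault(u, []).append((v, t))
        adj.setdefault(v, []).append((u, t))

    def evidence(node: str) -> List[float]:
        # Assumption: a predictor score s is used directly as the likelihood
        # pair (1 - s, s); nodes without a score contribute no evidence.
        s = scores.get(node)
        return [1.0, 1.0] if s is None else [1.0 - s, s]

    def prune(node: str, parent: Optional[str]) -> List[float]:
        # lik[x] = P(evidence in the subtree below `node` | state of `node` = x)
        lik = evidence(node)
        for child, t in adj[node]:
            if child == parent:
                continue
            child_lik = prune(child, node)
            trans = transition(t, a, b)
            for x in (0, 1):
                lik[x] *= sum(trans[x][y] * child_lik[y] for y in (0, 1))
        return lik

    # A two-state chain is reversible, so the tree can be rooted at the
    # query node with the stationary distribution as its prior.
    lik = prune(query, None)
    prior = (b / (a + b), a / (a + b))
    joint_unbound, joint_bound = prior[0] * lik[0], prior[1] * lik[1]
    return joint_bound / (joint_unbound + joint_bound)


if __name__ == "__main__":
    # Hypothetical tree: human and mouse joined through an ancestral node.
    tree = [("human", "anc", 0.1), ("anc", "mouse", 0.1), ("anc", "dog", 0.2)]
    print(posterior_bound(tree, {"human": 0.6, "mouse": 0.9, "dog": 0.8},
                          "human", a=1.0, b=1.0))
```

In this sketch, orthologs with no score contribute nothing, a high-scoring ortholog on a short branch raises the human posterior, and as branches grow very long an ortholog becomes uninformative, so the posterior falls back to the prior-weighted human score.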

Availability and implementation: The PhyloPGM package is available at https://github.com/BlanchetteLab/PhyloPGM.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • DNA
  • Genomics* / methods
  • Humans
  • RNA
  • Regulatory Sequences, Nucleic Acid*
  • Sequence Analysis, DNA / methods

Substances

  • RNA
  • DNA