An overview on nucleic-acid G-quadruplex prediction: from rule-based methods to deep neural networks

Brief Bioinform. 2023 Jul 20;24(4):bbad252. doi: 10.1093/bib/bbad252.

Abstract

Nucleic-acid G-quadruplexes (G4s) play vital roles in many cellular processes. Due to their importance, researchers have developed experimental assays to measure nucleic-acid G4s in high throughput. The generated high-throughput datasets gave rise to unique opportunities to develop machine-learning-based methods, and in particular deep neural networks, to predict G4s in any given nucleic-acid sequence and any species. In this paper, we review the success stories of deep-neural-network applications for G4 prediction. We first cover the experimental technologies that generated the most comprehensive nucleic-acid G4 high-throughput datasets in recent years. We then review classic rule-based methods for G4 prediction. We proceed by reviewing the major machine-learning and deep-neural-network applications to nucleic-acid G4 datasets and report a novel comparison between them. Next, we present the interpretability techniques used on the trained neural networks to learn key molecular principles underlying nucleic-acid G4 folding. As a new result, we calculate the overlap between measured DNA and RNA G4s and compare the performance of DNA- and RNA-G4 predictors on RNA- and DNA-G4 datasets, respectively, to demonstrate the potential of transfer learning from DNA G4s to RNA G4s. Last, we conclude with open questions in the field of nucleic-acid G4 prediction and computational modeling.
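As a minimal illustration of what a rule-based G4 predictor can look like, the sketch below scans a sequence with a regular expression. The abstract does not name a specific rule. The pattern used here (four runs of at least three guanines separated by loops of 1 to 7 nucleotides) is an assumption based on the widely used canonical G4 motif, and the function name find_putative_g4s is hypothetical.

```python
import re

# Assumed canonical motif (not specified in the abstract):
# four G-runs of length >= 3, separated by loops of 1-7 nucleotides.
# The loop alphabet includes both T and U so DNA and RNA are accepted.
G4_MOTIF = re.compile(
    r"G{3,}[ACGTU]{1,7}G{3,}[ACGTU]{1,7}G{3,}[ACGTU]{1,7}G{3,}"
)


def find_putative_g4s(sequence):
    """Return (start, end, matched_text) for each non-overlapping motif match.

    Coordinates are 0-based, with an exclusive end. Only the given strand
    is scanned; the reverse complement (C-rich pattern) is not searched.
    """
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_MOTIF.finditer(seq)]


if __name__ == "__main__":
    for hit in find_putative_g4s("TTGGGAGGGTGGGAGGGTT"):
        print(hit)
```

A pattern match of this kind only flags candidate sequences. It cannot score folding propensity or capture non-canonical G4s, which is the gap that the machine-learning methods reviewed here aim to address.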

Keywords: G-quadruplex; deep learning; deep neural network; interpretability; transfer learning.

Publication types

  • Review
  • Research Support, Non-U.S. Gov't

MeSH terms

  • DNA / genetics
  • G-Quadruplexes*
  • Neural Networks, Computer
  • Nucleic Acids*
  • RNA / genetics

Substances

  • Nucleic Acids
  • DNA
  • RNA