Predicting and understanding the stability of G-quadruplexes

Bioinformatics. 2009 Jun 15;25(12):i374-82. doi: 10.1093/bioinformatics/btp210.

Abstract

Motivation: G-quadruplexes are stable four-stranded guanine-rich structures that can form in DNA and RNA. They are an important component of human telomeres and play a role in the regulation of transcription and translation. The biological significance of a G-quadruplex is crucially linked to its thermodynamic stability. Hence, the prediction of G-quadruplex stability is of vital interest.

Results: In this article, we present a novel Bayesian prediction framework based on Gaussian process regression to determine the thermodynamic stability of previously unmeasured G-quadruplexes from the sequence information alone. We benchmark our approach on a large G-quadruplex dataset and compare our method to alternative approaches. Furthermore, we propose an active learning procedure which can be used to iteratively acquire data in an optimal fashion. Lastly, we demonstrate the usefulness of our procedure on a genome-wide study of quadruplexes in the human genome.
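
Illustrative sketch (not the authors' code): the two ideas named above, Gaussian process regression and an active learning step, can be outlined in a few lines of NumPy. Everything specific here is an assumption made for illustration: the four-number feature encoding (G-tract length plus three loop lengths), the squared-exponential kernel, the hyperparameter values, the toy stability values, and the rule of querying the candidate with the largest predictive variance. The abstract does not state which features, kernel, or acquisition criterion the paper uses.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=2.0, signal_var=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against round-off
    return signal_var * np.exp(-0.5 * sq_dists / length_scale ** 2)


def gp_predict(X_train, y_train, X_test,
               length_scale=2.0, signal_var=1.0, noise_var=0.1):
    """Posterior mean and variance of a GP with a constant (empirical) mean."""
    y_mean = y_train.mean()
    K = rbf_kernel(X_train, X_train, length_scale, signal_var)
    K += noise_var * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test, length_scale, signal_var)

    # Cholesky factorisation for a numerically stable solve.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    mean = y_mean + K_s.T @ alpha

    v = np.linalg.solve(L, K_s)
    var = np.maximum(signal_var - np.sum(v ** 2, axis=0), 0.0)
    return mean, var


def select_next(X_train, y_train, X_pool, **gp_args):
    """Index of the pool candidate whose prediction is most uncertain.

    Maximum predictive variance is one common active learning heuristic;
    it is an assumption here, not necessarily the paper's criterion.
    """
    _, var = gp_predict(X_train, y_train, X_pool, **gp_args)
    return int(np.argmax(var))


# Hypothetical features: [G-tract length, loop 1, loop 2, loop 3].
X_train = np.array([[3.0, 1.0, 1.0, 1.0],
                    [3.0, 2.0, 2.0, 2.0],
                    [3.0, 3.0, 3.0, 3.0]])
# Toy stability values (placeholders, not measurements).
y_train = np.array([80.0, 70.0, 60.0])

# Unmeasured candidates: one close to the training data, one far from it.
X_pool = np.array([[3.0, 1.0, 1.0, 2.0],
                   [3.0, 7.0, 7.0, 7.0]])

pool_mean, pool_var = gp_predict(X_train, y_train, X_pool)
next_index = select_next(X_train, y_train, X_pool)
print("predicted means:", pool_mean)
print("predictive variances:", pool_var)
print("next candidate to measure:", X_pool[next_index])
```

In this sketch the predictive variance does double duty: it gives an error bar on each predicted stability, and it ranks unmeasured sequences by how informative a new measurement would be. After a candidate is measured, it would be appended to the training set and the selection repeated.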

Availability: A data table with the training sequences is available as supplementary material. Source code is available online at http://www.inference.phy.cam.ac.uk/os252/projects/quadruplexes.

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Bayes Theorem
  • Computational Biology / methods*
  • DNA / chemistry
  • Databases, Genetic
  • G-Quadruplexes*
  • Genome, Human
  • Humans
  • RNA / chemistry
  • Telomere / chemistry

Substances

  • RNA
  • DNA