From GC skews to wavelets: a gentle guide to the analysis of compositional asymmetries in genomic data

Biochimie. 2008 Apr;90(4):648-59. doi: 10.1016/j.biochi.2007.09.015. Epub 2007 Sep 29.

Abstract

Compositional asymmetries are pervasive in DNA sequences. They result from the asymmetric interactions between DNA and cellular mechanisms such as replication and transcription. Here, we review many of the methods that have been proposed over the years to analyse compositional asymmetries in DNA sequences. These include GC skews, oligonucleotide skews and wavelets, which, among other uses, have been extensively employed to delimit origins and termini of replication in genomes. We also review multivariate methods, such as factorial correspondence analysis, discriminant analysis and analysis of variance, which make it possible to assign compositional strand asymmetries to the different biological processes shaping sequence composition. Finally, we review methods that have been used to infer substitution matrices and that help to explain the mutational processes underlying strand asymmetry. We focus on replication asymmetries because they have been studied more thoroughly, but the methods can be, and often are, adapted to other problems. Although strand asymmetry has most often been studied through compositional skews of nucleotides or oligonucleotides, we note that, depending on the goal of the analysis, other methods may be better suited to certain biological questions. We also point to freely available programs for analysing strand asymmetry.
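
As an illustration of the simplest method listed above, the following minimal Python sketch computes a windowed GC skew, (G - C) / (G + C), and its cumulative sum along a sequence. It is not taken from the review: the function names, the non-overlapping window scheme and the toy sequence are assumptions made for this example. Reading the minimum of the cumulative skew as a candidate origin and the maximum as a candidate terminus is a convention that holds for many bacterial chromosomes, but it should be checked for each genome rather than assumed.

```python
def gc_skew(seq, window):
    """Return (G - C) / (G + C) for consecutive non-overlapping windows.

    Windows without any G or C get a skew of 0.0 (an arbitrary choice
    made for this sketch). A trailing partial window is ignored.
    """
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative_skew(skews):
    """Return the running sum of the windowed skews."""
    total = 0.0
    running = []
    for value in skews:
        total += value
        running.append(total)
    return running


def skew_extrema(seq, window):
    """Return the end coordinates of the windows where the cumulative
    skew reaches its minimum and its maximum, in that order."""
    cum = cumulative_skew(gc_skew(seq, window))
    i_min = min(range(len(cum)), key=cum.__getitem__)
    i_max = max(range(len(cum)), key=cum.__getitem__)
    return (i_min + 1) * window, (i_max + 1) * window


if __name__ == "__main__":
    # Hypothetical toy sequence: a C-rich half followed by a G-rich half.
    toy = "CCCA" * 10 + "GGGT" * 10
    print(skew_extrema(toy, window=4))
```

On real genomes the window size matters: small windows give noisy skews, while large windows blur the position of the sign change. Circular chromosomes also need a choice of starting coordinate, which this sketch does not handle.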

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Base Composition*
  • Base Pairing
  • Base Sequence*
  • Genome*
  • Multivariate Analysis
  • Sequence Analysis, DNA
  • Software