DNAskew: statistical analysis of base compositional asymmetry and prediction of replication boundaries in the genome sequences

Acta Biochim Biophys Sin (Shanghai). 2004 Jan;36(1):16-20. doi: 10.1093/abbs/36.1.16.

Abstract

Sueoka and Lobry each proposed that, in the absence of bias between the two DNA strands for mutation and selection, the base composition within each strand should satisfy A=T and C=G (a state called Parity Rule type 2, PR2). However, the genome sequences of many bacteria, vertebrates and viruses show asymmetries in base composition and gene direction. To examine how base composition skews relate to replication orientation, gene function, codon usage bias and phylogenetic evolution, a program called DNAskew was developed for the statistical analysis of strand asymmetry and codon composition bias in DNA sequences. The program can also be used to predict the replication boundaries of genome sequences, a method that relies on the compositional asymmetry between the leading and lagging strands of replication. DNAskew is written in Perl and runs on the Linux operating system. It works quickly with annotated or unannotated sequences in GBFF (GenBank flat file) or FASTA format. The source code is freely available for academic use at http://www.epizooty.com/pub/stat/DNAskew.
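As an illustration of the general idea only, the following Python sketch computes a windowed skew such as (G-C)/(G+C) and takes the extremes of the cumulative skew as candidate replication boundaries. It is not the DNAskew source code, and the abstract does not specify the program's algorithm. The function names (`skew`, `windowed_skew`, `predict_boundaries`), the non-overlapping window scheme, and the convention that the cumulative GC skew is lowest near the origin and highest near the terminus (as commonly observed in bacterial genomes) are assumptions made for this example.

```python
from itertools import accumulate


def skew(chunk, x="G", y="C"):
    """Return (x - y) / (x + y) for one stretch of sequence.

    Under PR2 (A=T and C=G within a strand) this value is close to 0.
    """
    nx, ny = chunk.count(x), chunk.count(y)
    return (nx - ny) / (nx + ny) if nx + ny else 0.0


def windowed_skew(seq, window=1000, x="G", y="C"):
    """Skew of consecutive, non-overlapping windows (hypothetical scheme)."""
    seq = seq.upper()
    return [skew(seq[i:i + window], x, y) for i in range(0, len(seq), window)]


def predict_boundaries(seq, window=1000):
    """Guess (origin, terminus) from the cumulative GC skew.

    Assumption: the leading strand is G-rich, so the cumulative skew
    reaches its minimum near the origin and its maximum near the terminus.
    Positions are the end coordinates of the extreme windows.
    """
    cumulative = list(accumulate(windowed_skew(seq, window)))
    idx = range(len(cumulative))
    origin = min(idx, key=cumulative.__getitem__)
    terminus = max(idx, key=cumulative.__getitem__)
    return (min((origin + 1) * window, len(seq)),
            min((terminus + 1) * window, len(seq)))


if __name__ == "__main__":
    # Toy sequence: G-rich, then C-rich, then G-rich again.
    toy = "G" * 2000 + "C" * 3000 + "G" * 1000
    print(predict_boundaries(toy, window=1000))
```

Calling `windowed_skew(seq, x="A", y="T")` gives the corresponding AT skew. On real genomes the window size strongly affects the resolution of the predicted boundaries, so it would need to be tuned to genome length.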

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Algorithms*
  • Codon / genetics*
  • DNA Replication / genetics*
  • DNA, Bacterial / genetics
  • Gene Expression Profiling / methods*
  • Genomics / methods
  • Models, Genetic*
  • Models, Statistical*
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods*
  • Sequence Homology, Nucleic Acid

Substances

  • Codon
  • DNA, Bacterial