DNAskew: statistical analysis of base compositional asymmetry and prediction of replication boundaries in the genome sequences

Xiang-Ru Ma; Shao-Bo Xiao; Ai-Zhen Guo; Jian-Qing Lv; Huan-Chun Chen

doi:10.1093/abbs/36.1.16

Acta Biochim Biophys Sin (Shanghai). 2004 Jan;36(1):16-20. doi: 10.1093/abbs/36.1.16.

Authors

Xiang-Ru Ma¹, Shao-Bo Xiao, Ai-Zhen Guo, Jian-Qing Lv, Huan-Chun Chen

Affiliation

¹ Laboratory of Animal Virology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan 430070, China. hzauvet@public.wh.hb.cn

PMID: 14732869

Abstract

Sueoka and Lobry each proposed that, in the absence of strand bias in mutation and selection, the base composition within each DNA strand should satisfy A=T and C=G (a state called Parity Rule type 2, PR2). However, the genome sequences of many bacteria, vertebrates and viruses show asymmetries in base composition and gene orientation. To determine how base-composition skews relate to replication orientation, gene function, codon usage bias and phylogenetic evolution, we developed DNAskew, a program for the statistical analysis of strand asymmetry and codon composition bias in DNA sequences. The program can also predict replication boundaries in genome sequences; the method exploits the compositional asymmetry between the leading and lagging strands of replication. DNAskew is written in Perl and runs on the Linux operating system. It runs quickly on annotated or unannotated sequences in GBFF (GenBank flatfile) or FASTA format. The source code is freely available for academic use at http://www.epizooty.com/pub/stat/DNAskew.
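The strand-asymmetry statistics and the boundary-prediction idea described above can be illustrated with the widely used skew formulas GC skew = (G - C)/(G + C) and AT skew = (A - T)/(A + T); under PR2 both are expected to be close to zero. The Python sketch below is not DNAskew's Perl code. The function names, the window/step parameters and the rule "origin at the minimum, terminus at the maximum of the cumulative GC-skew curve" are illustrative assumptions, based on the common observation that the leading strand of many bacterial genomes is G-rich.

```python
from typing import List, Tuple


def skew(x: int, y: int) -> float:
    """Normalized skew (x - y) / (x + y); returns 0.0 if neither base occurs."""
    total = x + y
    return (x - y) / total if total else 0.0


def window_skews(seq: str, window: int, step: int) -> List[Tuple[int, float, float]]:
    """Return (start, GC skew, AT skew) for each window along one strand."""
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = skew(chunk.count("G"), chunk.count("C"))
        at = skew(chunk.count("A"), chunk.count("T"))
        results.append((start, gc, at))
    return results


def cumulative_gc_skew(seq: str) -> List[int]:
    """Per-base running sum: +1 for each G, -1 for each C, 0 otherwise."""
    total = 0
    curve = []
    for base in seq.upper():
        if base == "G":
            total += 1
        elif base == "C":
            total -= 1
        curve.append(total)
    return curve


def predict_boundaries(seq: str) -> Tuple[int, int]:
    """Hypothetical rule: origin = index of the curve minimum,
    terminus = index of the curve maximum (0-based, first occurrence)."""
    curve = cumulative_gc_skew(seq)
    if not curve:
        raise ValueError("empty sequence")
    positions = range(len(curve))
    origin = min(positions, key=curve.__getitem__)
    terminus = max(positions, key=curve.__getitem__)
    return origin, terminus


if __name__ == "__main__":
    # Toy sequence: C-rich, then G-rich, then C-rich again.
    toy = "CCCCGGGGGGGGCCCC"
    print(predict_boundaries(toy))  # (3, 11)
    print(window_skews("GGGC", 4, 4))  # [(0, 0.5, 0.0)]
```

On a real genome, the window size and step would need tuning, and a single global minimum/maximum is only a rough guide for circular chromosomes with one origin and one terminus.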

Publication types

Evaluation Study
Research Support, Non-U.S. Gov't
Validation Study

MeSH terms

Algorithms*
Codon / genetics*
DNA Replication / genetics*
DNA, Bacterial / genetics
Gene Expression Profiling / methods*
Genomics / methods
Models, Genetic*
Models, Statistical*
Sequence Alignment / methods*
Sequence Analysis, DNA / methods*
Sequence Homology, Nucleic Acid

Substances

Codon
DNA, Bacterial