Base-By-Base: single nucleotide-level analysis of whole viral genome alignments

BMC Bioinformatics. 2004 Jul 14;5:96. doi: 10.1186/1471-2105-5-96.

Abstract

Background: With ever-increasing numbers of closely related virus genomes being sequenced, it has become desirable to compare two genomes at a level more detailed than gene content, because two strains of an organism may share the same set of predicted genes but still differ in their pathogenicity profiles. For example, detailed comparison of multiple isolates of the smallpox virus genome (each approximately 200 kb, with 200 genes) is not feasible without new bioinformatics tools.

Results: A software package, Base-By-Base, has been developed that provides visualization tools to enable researchers to 1) rapidly identify and correct alignment errors in large, multiple genome alignments; and 2) generate tabular and graphical output of differences between the genomes at the nucleotide level. Base-By-Base uses detailed annotation information about the aligned genomes and can list each predicted gene with nucleotide differences, display whether variations occur within promoter regions or coding regions, and show whether these changes result in amino acid substitutions. Base-By-Base can connect to our MySQL database (Virus Orthologous Clusters; VOCs) to retrieve detailed annotation information about the aligned genomes, or it can use information from text files.
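
The kind of nucleotide-level report described in the Results can be illustrated with a minimal Python sketch. This is not Base-By-Base's own code (the tool is a Java application) and it does not reproduce its algorithm. The names `Feature` and `list_differences` are hypothetical, and the sketch assumes two pre-aligned sequences of equal length, forward-strand features given in 0-based alignment columns, and the standard genetic code.

```python
from typing import List, NamedTuple, Tuple

# Standard genetic code, built in TCAG order (assumption: no alternative code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


class Feature(NamedTuple):
    """Hypothetical annotation record (forward strand only)."""
    name: str
    kind: str   # "promoter" or "CDS"
    start: int  # 0-based alignment column, inclusive
    end: int    # 0-based alignment column, exclusive


def list_differences(ref: str, qry: str,
                     features: List[Feature]) -> List[Tuple[int, str, str, str, str]]:
    """Return one row per differing column:
    (1-based position, ref base, query base, feature name, effect)."""
    if len(ref) != len(qry):
        raise ValueError("aligned sequences must have equal length")
    ref, qry = ref.upper(), qry.upper()
    rows = []
    for col, (a, b) in enumerate(zip(ref, qry)):
        if a == b:
            continue
        # First feature covering this column, if any.
        feat = next((f for f in features if f.start <= col < f.end), None)
        effect = "intergenic"
        if feat is not None and feat.kind == "promoter":
            effect = "promoter"
        elif feat is not None and feat.kind == "CDS":
            # Locate the codon containing this column, in frame with the CDS start.
            codon_start = col - (col - feat.start) % 3
            rc = ref[codon_start:codon_start + 3]
            qc = qry[codon_start:codon_start + 3]
            if "-" in rc or "-" in qc or len(rc) < 3:
                effect = "indel in CDS"
            else:
                ra = CODON_TABLE.get(rc, "X")  # "X" for ambiguous bases
                qa = CODON_TABLE.get(qc, "X")
                codon_no = (codon_start - feat.start) // 3 + 1
                effect = "silent" if ra == qa else f"{ra}{codon_no}{qa}"
        rows.append((col + 1, a, b, feat.name if feat else "-", effect))
    return rows


# Toy example: a 6-column promoter followed by a 4-codon gene (ATG GCT GAA TAA).
reference = "TTGACAATGGCTGAATAA"
isolate = "TTGACTATGGCCGTATAA"
annotations = [Feature("promoterX", "promoter", 0, 6),
               Feature("geneX", "CDS", 6, 18)]

for row in list_differences(reference, isolate, annotations):
    print(*row, sep="\t")
```

In this toy alignment the three differing columns fall in the promoter, in a codon where the change is silent, and in a codon where glutamate is replaced by valine. A real comparison would also need to handle reverse-strand genes, gaps that shift reference coordinates, and overlapping features, all of which the sketch leaves out.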

Conclusion: Base-By-Base enables users to quickly and easily compare large viral genomes; it highlights small differences that may be responsible for important phenotypic differences such as virulence. It is available via the Internet using Java Web Start and runs on Macintosh, PC and Linux operating systems with the Java 1.4 virtual machine.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Base Composition / genetics
  • Computational Biology / statistics & numerical data
  • Computer Graphics
  • Computer Systems
  • DNA, Viral / genetics
  • Databases, Genetic
  • Fuzzy Logic
  • Genome, Viral
  • Mutation / genetics
  • Nucleotides / genetics*
  • Programming Languages
  • Regulatory Sequences, Nucleic Acid / genetics
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods
  • Software
  • Variola virus / genetics

Substances

  • DNA, Viral
  • Nucleotides