Poxvirus orthologous clusters: toward defining the minimum essential poxvirus genome

J Virol. 2003 Jul;77(13):7590-600. doi: 10.1128/jvi.77.13.7590-7600.2003.

Abstract

The plethora of sequence information currently available necessitates increasingly complex bioinformatic analysis. A total of 21 poxvirus genomes have now been completely sequenced and annotated, and many more genomes will be available in the next few years. First, we describe the creation of a database of continuously corrected and updated genome sequences and an easy-to-use and extremely powerful suite of software tools for the analysis of genomes, genes, and proteins. These tools are available free to all researchers and, in most cases, alleviate the need for using multiple Internet sites for analysis. Further, we describe the use of these programs to identify conserved families of genes (poxvirus orthologous clusters) and have named the software suite POCs, which is available at www.poxvirus.org. Using POCs, we have identified a set of 49 absolutely conserved gene families, that is, those conserved between the highly diverged subfamilies of insect-infecting entomopoxviruses and vertebrate-infecting chordopoxviruses. An additional set of 41 gene families conserved in chordopoxviruses was also identified. Thus, 90 genes (49 plus 41) are completely conserved in chordopoxviruses and comprise the minimum essential genome, and these will make excellent drug, antibody, vaccine, and detection targets. Finally, we describe the use of these tools to identify necessary annotation and sequencing updates in poxvirus genomes. For example, using POCs, we identified 19 genes that were widely conserved in poxviruses but missing from the vaccinia virus strain Tian Tan 1998 GenBank file. We have reannotated and resequenced fragments of this genome and verified that these genes are conserved in Tian Tan. The results for poxvirus genes and genomes are discussed in light of evolutionary processes.
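
The conserved-family counts in the abstract can be read as set operations over a table recording which gene families occur in which genome. The following is a minimal illustrative sketch of that idea, not the POCs implementation: the genome names, family identifiers, and the helper conserved_families are all hypothetical, and the toy data are far smaller than the 21 genomes analyzed.

```python
from typing import Dict, Set


def conserved_families(presence: Dict[str, Set[str]]) -> Set[str]:
    """Return the gene families present in every genome of `presence`."""
    family_sets = list(presence.values())
    if not family_sets:
        return set()
    return set.intersection(*family_sets)


# Hypothetical toy data: genome name -> gene families found in that genome.
presence = {
    "chordo_A": {"F1", "F2", "F3", "F4"},
    "chordo_B": {"F1", "F2", "F3"},
    "entomo_A": {"F1", "F2", "F5"},
}

# Hypothetical grouping of each genome into one of the two groups.
group = {
    "chordo_A": "chordopoxvirus",
    "chordo_B": "chordopoxvirus",
    "entomo_A": "entomopoxvirus",
}

# Families conserved in every genome (the analogue of the 49 families).
core_all = conserved_families(presence)

# Families conserved in every chordopoxvirus genome (the analogue of the 90).
chordo_presence = {g: f for g, f in presence.items() if group[g] == "chordopoxvirus"}
core_chordo = conserved_families(chordo_presence)

# Families conserved in chordopoxviruses but not in all genomes
# (the analogue of the additional 41).
chordo_only = core_chordo - core_all

print(sorted(core_all), sorted(chordo_only), sorted(core_chordo))
```

Because every family conserved in all genomes is also conserved in all chordopoxviruses, the two sets partition the chordopoxvirus core, which mirrors the 49 + 41 = 90 arithmetic in the abstract.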

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Amino Acid Sequence
  • Database Management Systems
  • Genome, Viral*
  • Molecular Sequence Data
  • Multigene Family*
  • Poxviridae / genetics*
  • Sequence Homology, Amino Acid
  • Software