Gene function prediction based on genomic context clustering and discriminative learning: an application to bacteriophages

BMC Bioinformatics. 2007 May 22;8 Suppl 4(Suppl 4):S6. doi: 10.1186/1471-2105-8-S4-S6.

Abstract

Background: Existing methods for whole-genome comparison require prior knowledge of related species and offer little automation of the function prediction process. Bacteriophage genomes are one example that these methods cannot easily analyze. This work addresses these shortcomings and aims to provide an automated system for gene function prediction.

Results: We have developed a novel system called SynFPS to perform gene function prediction over completed genomes. The system is initialized by clustering a large collection of weakly related genomes into groups based on their resemblance in gene distribution. Data are then extracted from each group and used to train a Support Vector Machine that makes gene function predictions. Experiments were conducted with 9 different gene functions over 296 bacteriophage genomes. Cross-validation gave an average prediction accuracy of ~80%, which is comparable to other genomic-context-based prediction methods. Functional predictions were also made for 3 uncharacterized genes and 12 genes that cannot be identified by sequence alignment. The software is publicly available at http://www.synteny.net/.
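
The clustering stage described above can be illustrated with a minimal sketch. It is not the SynFPS implementation: the abstract does not specify how "resemblance in gene distribution" is measured, so this example assumes each genome is summarized as a vector of gene counts per function category and that genomes are grouped greedily by cosine similarity. The genome names, count vectors, the `greedy_cluster` function, and the 0.8 threshold are all hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def greedy_cluster(profiles, threshold=0.8):
    """Place each genome in the first cluster whose seed profile is
    at least `threshold` similar; otherwise start a new cluster.

    This is an assumed, simplified stand-in for the clustering step.
    """
    clusters = []  # list of (seed_profile, member_names)
    for name, profile in profiles.items():
        for seed, members in clusters:
            if cosine(seed, profile) >= threshold:
                members.append(name)
                break
        else:
            # No existing cluster was similar enough.
            clusters.append((profile, [name]))
    return [members for _, members in clusters]


# Hypothetical gene-count profiles (one count per function category).
profiles = {
    "phageA": [3, 1, 0, 2],
    "phageB": [3, 1, 0, 1],
    "phageC": [0, 0, 4, 1],
    "phageD": [0, 1, 5, 1],
}

groups = greedy_cluster(profiles)
print(groups)
```

In a pipeline of this shape, each resulting group would then supply the training data for its own classifier, so that predictions are learned only from genomes with a similar layout.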

Conclusion: The proposed system employs genomic context to predict gene function and detect gene correspondence in whole-genome comparisons. Although our experimental focus is on bacteriophages, the method may be extended to other microbial genomes because they share several characteristics with phage genomes, such as gene order conservation.
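
To make the notion of genomic context concrete, the following sketch shows one way a gene's neighborhood could be turned into features for a classifier. It is an illustration under stated assumptions, not the feature scheme used by SynFPS: the genome is treated as a linear, ordered list of gene labels, and the `context_features` function, the window size, and the example gene order are hypothetical.

```python
def context_features(gene_order, index, window=2):
    """Return the labels of genes within `window` positions of
    gene_order[index], keyed by their relative offset.

    Neighbors that would fall outside the genome are skipped.
    """
    features = {}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # the target gene itself is not a context feature
        pos = index + offset
        if 0 <= pos < len(gene_order):
            features[offset] = gene_order[pos]
    return features


# Hypothetical gene order along one phage genome.
genome = ["terminase", "portal", "unknown", "capsid", "tail"]

# Context of the uncharacterized gene at position 2.
ctx = context_features(genome, 2)
print(ctx)
```

When gene order is conserved across a group of genomes, neighborhoods like this tend to recur, which is what lets a discriminative learner associate a context with a function.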

MeSH terms

  • Algorithms
  • Artificial Intelligence*
  • Bacteriophages / genetics*
  • Base Sequence
  • Chromosome Mapping / methods*
  • Cluster Analysis*
  • Discriminant Analysis
  • Genome, Viral / genetics*
  • Molecular Sequence Data
  • Multigene Family / genetics*
  • Pattern Recognition, Automated / methods
  • Sequence Alignment / methods
  • Sequence Analysis, DNA / methods*
  • Sequence Homology, Nucleic Acid