Microbial genome analysis: the COG approach

Brief Bioinform. 2019 Jul 19;20(4):1063-1070. doi: 10.1093/bib/bbx117.

Abstract

For the past 20 years, the Clusters of Orthologous Genes (COG) database has been a popular tool for microbial genome annotation and comparative genomics. Initially created for the evolutionary classification of protein families, the COGs have been used not only for straightforward functional annotation of sequenced genomes but also for tasks such as (i) unifying genome annotation across groups of related organisms; (ii) identifying missing and/or undetected genes in complete microbial genomes; (iii) analyzing genomic neighborhoods, which in many cases allows prediction of novel functional systems; (iv) analyzing metabolic pathways and predicting alternative forms of enzymes; (v) comparing organisms by COG functional categories; and (vi) prioritizing targets for structural and functional characterization. Here we review the principles of the COG approach and discuss its key advantages and drawbacks in microbial genome analysis.
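Two of the tasks listed in the abstract, (ii) flagging possibly missing genes and (v) comparing organisms by COG functional categories, can be illustrated with a small sketch. This is not the authors' pipeline. It assumes each genome has already been reduced to a set of COG identifiers. The identifiers (`COG_X1` and so on), the genome names, the category assignments, and the helper names `category_profile` and `candidate_missing_cogs` are all hypothetical. The one-letter category codes follow the COG convention (for example, J = translation, K = transcription, L = replication and repair, E = amino acid metabolism, S = function unknown). A COG that is absent from one genome but present in most of its relatives is treated as a candidate missing or undetected gene. The 0.75 threshold is an arbitrary illustrative choice.

```python
from collections import Counter

# Toy data (hypothetical): genome -> set of COG IDs assigned to its proteins.
# The COG IDs are placeholders, not real COG entries.
GENOMES = {
    "genome_A": {"COG_X1", "COG_X2", "COG_X3", "COG_X4"},
    "genome_B": {"COG_X1", "COG_X2", "COG_X3", "COG_X5"},
    "genome_C": {"COG_X1", "COG_X2", "COG_X3", "COG_X5"},
    "genome_D": {"COG_X1", "COG_X2", "COG_X5"},
}

# Toy mapping: COG ID -> one or more one-letter functional category codes.
COG_CATEGORIES = {
    "COG_X1": "J",
    "COG_X2": "K",
    "COG_X3": "E",
    "COG_X4": "L",
    "COG_X5": "LS",  # a COG may carry more than one category letter
}


def category_profile(cogs, cog_categories):
    """Count how many of a genome's COGs fall into each functional category.

    A COG with several category letters is counted once under each letter.
    """
    profile = Counter()
    for cog in cogs:
        for letter in cog_categories.get(cog, ""):
            profile[letter] += 1
    return profile


def candidate_missing_cogs(genomes, target, min_fraction=0.75):
    """Return COGs absent from `target` but present in at least
    `min_fraction` of the other genomes (hypothetical missing-gene heuristic)."""
    others = [cogs for name, cogs in genomes.items() if name != target]
    counts = Counter(cog for cogs in others for cog in cogs)
    threshold = min_fraction * len(others)
    return sorted(
        cog for cog, n in counts.items()
        if n >= threshold and cog not in genomes[target]
    )


if __name__ == "__main__":
    for name, cogs in sorted(GENOMES.items()):
        profile = category_profile(cogs, COG_CATEGORIES)
        print(name, dict(sorted(profile.items())))
    print("genome_D candidates:", candidate_missing_cogs(GENOMES, "genome_D"))
```

With this toy data, `COG_X3` is present in all three relatives of `genome_D` but absent from `genome_D` itself, so it is reported as a candidate. In practice, such a gap would prompt a re-search of the genome (for example, for unannotated or divergent open reading frames) rather than serve as proof that a gene is missing.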

Keywords: comparative genomics; enzyme evolution; genome annotation; orthologs; paralogs.

Publication types

  • Research Support, N.I.H., Intramural
  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Computational Biology
  • Databases, Protein
  • Evolution, Molecular
  • Genome, Microbial*
  • Genomics / methods*
  • Genomics / statistics & numerical data
  • Molecular Sequence Annotation
  • Multigene Family
  • Phylogeny
  • Proteins / classification
  • Proteins / genetics
  • Proteins / metabolism

Substances

  • Proteins