Confronting the catalytic dark matter encoded by sequenced genomes

Nucleic Acids Res. 2017 Nov 16;45(20):11495-11514. doi: 10.1093/nar/gkx937.

Abstract

The post-genomic era has provided researchers with a deluge of protein sequences. However, a significant fraction of the proteins encoded by sequenced genomes remains without an identified function. Here, we aim to determine how many enzymes of uncertain or unknown function are still present in the Saccharomyces cerevisiae and human proteomes. Using information available in the Swiss-Prot, BRENDA and KEGG databases in combination with a Hidden Markov Model-based method, we estimate that >600 yeast and 2000 human proteins (>30% of their proteins of unknown function) are enzymes whose precise function(s) remain(s) to be determined. This illustrates the impressive scale of the 'unknown enzyme problem'. We extensively review classical biochemical as well as more recent systematic experimental and computational approaches that can be used to support enzyme function discovery research. Finally, we discuss the possible roles of the elusive catalysts in light of recent developments in the fields of enzymology and metabolism as well as the significance of the unknown enzyme problem in the context of metabolic modeling, metabolic engineering and rare disease research.
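The counting logic behind such an estimate can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Protein` record, its two boolean fields, and the toy data are hypothetical stand-ins for curated database annotations and for the outcome of a profile-HMM search against enzyme families.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    accession: str            # hypothetical identifier, not a real database entry
    has_known_function: bool  # assumed: True if curated annotation gives a precise function
    enzyme_family_hit: bool   # assumed: True if an HMM search matched an enzyme family


def candidate_unknown_enzymes(proteins):
    """Proteins without a precise function that still match an enzyme family."""
    return [p for p in proteins if not p.has_known_function and p.enzyme_family_hit]


def fraction_of_unknowns(proteins):
    """Share of unknown-function proteins that are enzyme candidates."""
    unknown = [p for p in proteins if not p.has_known_function]
    if not unknown:
        return 0.0
    return len(candidate_unknown_enzymes(proteins)) / len(unknown)


# Toy data for illustration only; these are not results from the paper.
TOY_PROTEOME = [
    Protein("P_A", has_known_function=True, enzyme_family_hit=True),
    Protein("P_B", has_known_function=False, enzyme_family_hit=True),
    Protein("P_C", has_known_function=False, enzyme_family_hit=False),
    Protein("P_D", has_known_function=False, enzyme_family_hit=True),
    Protein("P_E", has_known_function=False, enzyme_family_hit=False),
]

if __name__ == "__main__":
    hits = candidate_unknown_enzymes(TOY_PROTEOME)
    print([p.accession for p in hits])
    print(fraction_of_unknowns(TOY_PROTEOME))
```

In practice, the two boolean fields would have to be derived from the databases and the HMM search named in the abstract. How that derivation is done is not specified here and is left as an assumption.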

MeSH terms

  • Base Sequence
  • Biocatalysis*
  • Chromosome Mapping
  • Databases, Genetic
  • Databases, Protein
  • Enzymes / analysis
  • Enzymes / genetics
  • Genome, Fungal / genetics*
  • Genome, Human / genetics*
  • Humans
  • Metabolome / genetics*
  • Metabolomics / methods
  • Proteome / genetics
  • Quantitative Trait Loci
  • Saccharomyces cerevisiae / enzymology*
  • Saccharomyces cerevisiae / genetics

Substances

  • Enzymes
  • Proteome