CAZymes Analysis Toolkit (CAT): web service for searching and analyzing carbohydrate-active enzymes in a newly sequenced organism using CAZy database

Glycobiology. 2010 Dec;20(12):1574-84. doi: 10.1093/glycob/cwq106. Epub 2010 Aug 9.

Abstract

The Carbohydrate-Active Enzyme (CAZy) database provides a rich set of manually annotated enzymes that degrade, modify, or create glycosidic bonds. Despite the rich and invaluable information stored in the database, few software tools use it to annotate newly sequenced genomes by CAZy family. We have employed two annotation approaches to bridge the gap between the manually curated, high-quality protein sequences collected in the CAZy database and the growing number of other protein sequences produced by genome and metagenome sequencing projects. The first approach is based on a similarity search against the entire set of nonredundant sequences in the CAZy database. The second approach performs annotation using links, or correspondences, between the CAZy families and protein family domains. The links were discovered by applying an association rule learning algorithm to sequences from the CAZy database. The two approaches complement each other and, in combination, achieved high specificity and sensitivity when cross-evaluated against the manually curated genomes of Clostridium thermocellum ATCC 27405 and Saccharophagus degradans 2-40. We also demonstrate the capability of the proposed framework to predict the function of unknown protein domains and of hypothetical proteins in the genome of Neurospora crassa. The framework is implemented as a Web service, the CAZymes Analysis Toolkit, and is available at http://cricket.ornl.gov/cgi-bin/cat.cgi.
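
To make the second approach more concrete, the following is a minimal Python sketch of how association rules of the form "protein domain implies CAZy family" might be mined from annotated sequences and then used to annotate a new protein. It is not the toolkit's actual implementation: the function names (mine_rules, annotate), the support and confidence thresholds, and the placeholder labels (DOM_A, FAM_1, and so on) are all assumptions made for illustration, and the abstract does not state which rule-learning parameters the authors used.

```python
from collections import Counter


def mine_rules(records, min_support=2, min_confidence=0.8):
    """Mine single-domain rules 'domain -> family' from annotated proteins.

    records: list of (domains, families) pairs, each a set of strings.
    Returns a sorted list of (domain, family, support, confidence) tuples.
    The thresholds are illustrative assumptions, not values from the paper.
    """
    domain_counts = Counter()  # number of proteins carrying each domain
    pair_counts = Counter()    # number of proteins carrying domain AND family
    for domains, families in records:
        for d in domains:
            domain_counts[d] += 1
            for f in families:
                pair_counts[(d, f)] += 1

    rules = []
    for (d, f), support in pair_counts.items():
        # confidence = P(family | domain), estimated from the counts
        confidence = support / domain_counts[d]
        if support >= min_support and confidence >= min_confidence:
            rules.append((d, f, support, confidence))
    return sorted(rules)


def annotate(domains, rules):
    """Predict families for a protein from the domains detected in it."""
    return sorted({f for d, f, _, _ in rules if d in domains})


# Toy training set with hypothetical domain and family labels.
RECORDS = [
    ({"DOM_A"}, {"FAM_1"}),
    ({"DOM_A", "DOM_B"}, {"FAM_1", "FAM_2"}),
    ({"DOM_A"}, {"FAM_1"}),
    ({"DOM_B"}, {"FAM_2"}),
    ({"DOM_C"}, {"FAM_3"}),
    ({"DOM_C"}, {"FAM_1"}),
]

RULES = mine_rules(RECORDS)
```

With this toy data, only rules that are both frequent and reliable survive the thresholds, so calling annotate({"DOM_B", "DOM_C"}, RULES) assigns a family through DOM_B but not through the ambiguous DOM_C. In a real pipeline, such rule-based predictions would presumably be combined with the similarity-search hits from the first approach, as the abstract describes.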

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Alteromonadaceae / enzymology*
  • Alteromonadaceae / genetics
  • Bacterial Proteins / chemistry
  • Bacterial Proteins / genetics*
  • Bacterial Proteins / metabolism
  • Carbohydrates*
  • Clostridium thermocellum / enzymology*
  • Clostridium thermocellum / genetics
  • Databases, Protein*
  • Enzymes / chemistry
  • Enzymes / classification
  • Enzymes / genetics*
  • Fungal Proteins / chemistry
  • Fungal Proteins / genetics*
  • Fungal Proteins / metabolism
  • Genome, Bacterial / physiology
  • Genome, Fungal / physiology
  • Molecular Sequence Annotation
  • Neurospora crassa / enzymology*
  • Neurospora crassa / genetics

Substances

  • Bacterial Proteins
  • Carbohydrates
  • Enzymes
  • Fungal Proteins