Using CATH-Gene3D to Analyze the Sequence, Structure, and Function of Proteins

Curr Protoc Bioinformatics. 2015 Jun 19;50:1.28.1-1.28.21. doi: 10.1002/0471250953.bi0128s50.

Abstract

The CATH database is a classification of protein structures found in the Protein Data Bank (PDB). Protein structures are chopped into individual structural domains, and these domains are grouped into superfamilies when there is sufficient evidence that they have diverged from a common ancestor during evolution. A sister resource, Gene3D, extends this information by scanning sequence profiles of these CATH domain superfamilies against many millions of known protein sequences to identify related sequences. The combined CATH-Gene3D resource therefore provides confident predictions of the likely structural fold, domain organisation, and evolutionary relatives of these proteins. In addition, this resource incorporates annotations from a large number of external databases, such as known enzyme active sites, GO molecular functions, physical interactions, and mutations. This unit details how to access and understand the information contained within the CATH-Gene3D Web pages, the downloadable data files, and the remotely accessible Web services.

Keywords: functional family; protein classification; protein domain; protein structure; superfamily.
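The abstract notes that Gene3D scans CATH superfamily profiles against protein sequences to predict each protein's domain organisation. When several superfamily profiles match overlapping regions of the same protein, some rule is needed to choose a consistent set of domains. The sketch below illustrates one simple approach, a greedy highest-score-first selection of non-overlapping hits. It is an assumption for illustration only, not the procedure Gene3D itself uses; the `Hit` type, the function names, and the hit coordinates and scores are all hypothetical, and the superfamily codes serve only as example identifiers.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hit:
    """One hypothetical profile match of a CATH superfamily on a protein."""
    superfamily: str  # CATH superfamily code used as an identifier
    start: int        # first matched residue (1-based, inclusive)
    end: int          # last matched residue (inclusive)
    score: float      # assumed: higher means a stronger match


def overlaps(a: Hit, b: Hit) -> bool:
    """Return True if the two hits share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def resolve_domain_architecture(hits: List[Hit]) -> List[Hit]:
    """Greedy sketch: keep the best-scoring hits that do not overlap any
    already-kept hit, then order the survivors from N- to C-terminus."""
    kept: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    return sorted(kept, key=lambda h: h.start)


def architecture_string(hits: List[Hit]) -> str:
    """Render a domain organisation as superfamily codes joined by '-'."""
    return "-".join(h.superfamily for h in hits)


# Hypothetical hits on a single protein sequence (made-up values).
hits = [
    Hit("3.40.50.300", 10, 250, 180.0),
    Hit("3.40.50.720", 20, 200, 90.0),   # overlaps the stronger hit, so dropped
    Hit("2.40.30.10", 260, 350, 75.0),
]

print(architecture_string(resolve_domain_architecture(hits)))
# prints: 3.40.50.300-2.40.30.10
```

Picking the strongest match first keeps the sketch short and deterministic; a production pipeline would typically also weigh hit boundaries, coverage, and significance thresholds rather than raw score alone.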

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Databases, Protein*
  • Molecular Sequence Data
  • Protein Structure, Tertiary
  • Proteins / chemistry*
  • Search Engine

Substances

  • Proteins