A proteome-wide analysis of domain architectures of prokaryotic single-spanning transmembrane proteins

Comput Biol Chem. 2005 Oct;29(5):379-87. doi: 10.1016/j.compbiolchem.2005.08.004. Epub 2005 Oct 6.

Abstract

We performed a proteome-wide survey of the domain architectures of single-spanning transmembrane (TM) proteins (single-spannings) from 87 sequenced prokaryotic (bacterial and archaeal) genomes by assigning Pfam domains to their N-tail and C-tail loops. Of 14,625 single-spannings, 3,516 sequences have at least one domain assigned, 7,850 have no domain assigned, and the remaining 3,259 have less reliable assignments. Among the domain-assigned sequences, 3,116 carry at most two domains and the other 400 carry more than two. The assigned domains are distributed over 651 Pfam families, which account for 11.4% of all Pfam-A families. Most of the 651 families originate from soluble proteins; only 21 are unique to TM proteins. The occurrence frequency of the individual domain families follows a power law: 264 families occur only once, 106 occur exactly twice, and only 39 occur more than 30 times. The great majority of the sequences with one or two domains have the type II topology, with the domains located on the C-tail loop. In contrast, the N-tail loop of sequences with the same topology seldom carries domains. Importantly, the assigned domains are always found on tail loops longer than 60 residues, even in the case of small domains of fewer than 30 residues. As many as 5,800 sequences still have no assigned domain despite having at least one long tail; these tails are expected to harbor at least 1,000 novel domain families that are as yet unknown. We also investigated the domain arrangement preferences and the domain family combination patterns in 'singlets' (single-spannings with one assigned domain) and 'doublets' (single-spannings with two assigned domains).
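
The counts quoted in the abstract can be cross-checked with simple arithmetic. The Python sketch below is only an illustration: the variable names are hypothetical and not taken from the paper, and the implied Pfam-A total is back-calculated from the rounded figure of 11.4%, so it should be read as approximate.

```python
# Counts as quoted in the abstract; variable names are illustrative only.
total_single_spannings = 14_625
with_domains = 3_516      # at least one Pfam domain assigned
without_domains = 7_850   # no domain assigned
less_reliable = 3_259     # less reliable assignment

# The three classes should partition the full set of single-spannings.
assert with_domains + without_domains + less_reliable == total_single_spannings

# Split of the domain-assigned sequences by number of domains.
at_most_two = 3_116
more_than_two = 400
assert at_most_two + more_than_two == with_domains

# Family statistics. The Pfam-A total is an estimate derived from a rounded
# percentage (an assumption of this sketch), not a figure stated in the abstract.
families = 651
pfam_a_share = 0.114
implied_pfam_a_total = round(families / pfam_a_share)

# Families not covered by the three occurrence classes that the abstract
# lists, i.e. those occurring between 3 and 30 times.
once, twice, over_30 = 264, 106, 39
between_3_and_30 = families - (once + twice + over_30)

# Share of single-spannings with at least one assigned domain.
fraction_with_domains = with_domains / total_single_spannings

print(f"Implied Pfam-A families: about {implied_pfam_a_total}")
print(f"Families occurring 3 to 30 times: {between_3_and_30}")
print(f"Fraction with assigned domains: {fraction_with_domains:.1%}")
```

The checks confirm that the reported subsets add up to the stated totals. The remaining families, those occurring between 3 and 30 times, follow by subtraction.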

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology
  • Databases, Genetic
  • Databases, Protein
  • Genome, Archaeal
  • Genome, Bacterial
  • Membrane Proteins / chemistry*
  • Prokaryotic Cells / chemistry
  • Protein Structure, Tertiary*
  • Proteome / analysis
  • Proteome / chemistry*

Substances

  • Membrane Proteins
  • Proteome