Identification of new claudin family members by a novel PSI-BLAST based approach with enhanced specificity

Proteins. 2006 Dec 1;65(4):808-15. doi: 10.1002/prot.21218.

Abstract

To develop a strategy for identifying new members of protein families in silico, we devised a semi-automated procedure of consecutive PSI-BLAST (Position-Specific Iterated Basic Local Alignment Search Tool) searches that combines the identification of putative candidates with their subsequent validation. As a proof of concept, we searched for novel members of the claudin family. The initial step was an iterated PSI-BLAST search of the human part of the RefSeq database, starting with the PMP22_Claudin domain of each known member of the claudin family. Putative new claudin domains derived from the converged list were evaluated by a validating PSI-BLAST search in which each candidate sequence was assessed for its ability to retrieve the starting set of known claudin domains. The local PSI-BLAST searches and the validation were automated by a set of Perl scripts. With this strategy, a total of three additional putative claudin domains in three different proteins were identified. One of them was characterized further and was shown to exhibit claudin-like features in terms of protein structure and expression pattern. The strategy we present is an efficient and versatile tool for identifying novel members of domain-sharing protein families. The low rate of false positives achieved by including a validation step in the in silico procedure makes this strategy particularly attractive for selecting candidates for subsequent labor-intensive wet-bench characterization.
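
The two-step logic described in the abstract (a forward search from the known domains, followed by a validating search from each candidate back to the starting set) can be illustrated with a minimal Python sketch. This is not the authors' Perl implementation. The search itself is abstracted as a callable that returns the identifiers in a converged hit list, so no real PSI-BLAST invocation is shown. The function names, the `min_recovered` threshold, and the toy hit lists (including the identifiers `X1`, `X2`, and `Y9`) are assumptions made purely for illustration; the abstract does not state how many of the starting domains a candidate had to retrieve.

```python
from typing import Callable, Iterable, Set

# Hypothetical interface: given a query domain ID, return the IDs found in
# the converged hit list of an iterated search started from that query.
SearchFn = Callable[[str], Set[str]]


def find_candidates(known: Iterable[str], search: SearchFn) -> Set[str]:
    """Forward step: search from every known domain and pool the new hits."""
    known_set = set(known)
    hits: Set[str] = set()
    for query in known_set:
        hits |= search(query)
    # Only hits that are not already known family members are candidates.
    return hits - known_set


def validate(candidate: str, known: Iterable[str], search: SearchFn,
             min_recovered: float = 1.0) -> bool:
    """Validation step: does a search from the candidate find the known set?

    min_recovered is an assumed threshold: the fraction of known domains
    that must reappear in the candidate's own hit list.
    """
    known_set = set(known)
    recovered = search(candidate) & known_set
    return len(recovered) >= min_recovered * len(known_set)


def screen(known: Iterable[str], search: SearchFn,
           min_recovered: float = 1.0) -> Set[str]:
    """Run the forward step, then keep only candidates that pass validation."""
    known_list = list(known)
    return {c for c in find_candidates(known_list, search)
            if validate(c, known_list, search, min_recovered)}


# Invented toy hit lists, not real search results.
TOY_HITS = {
    "CLDN1": {"CLDN1", "CLDN2", "X1", "X2"},
    "CLDN2": {"CLDN1", "CLDN2", "X1"},
    "X1": {"X1", "CLDN1", "CLDN2"},   # finds the starting set again
    "X2": {"X2", "Y9"},               # does not find it: rejected
}


def toy_search(query: str) -> Set[str]:
    """Stand-in for a real search; looks the query up in the toy table."""
    return TOY_HITS.get(query, set())


if __name__ == "__main__":
    print(screen(["CLDN1", "CLDN2"], toy_search))
```

In this toy run, both `X1` and `X2` appear as forward hits, but only `X1` survives validation, because a search started from `X2` does not retrieve any of the known domains. In practice the `search` callable would wrap a local PSI-BLAST run and parse its converged hit list.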

MeSH terms

  • Amino Acid Sequence
  • Computational Biology*
  • Databases, Protein
  • Humans
  • Membrane Proteins / chemistry*
  • Molecular Sequence Data
  • Phylogeny
  • Protein Structure, Tertiary
  • Reverse Transcriptase Polymerase Chain Reaction
  • Sequence Alignment / methods*
  • Sequence Analysis, Protein*

Substances

  • Membrane Proteins