Detection of alpha-rod protein repeats using a neural network and application to huntingtin

PLoS Comput Biol. 2009 Mar;5(3):e1000304. doi: 10.1371/journal.pcbi.1000304. Epub 2009 Mar 13.

Abstract

A growing number of solved protein structures display an elongated structural domain, denoted here as alpha-rod, composed of stacked pairs of anti-parallel alpha-helices. Alpha-rods are flexible and expose a large surface, which makes them suitable for protein interaction. Although alpha-rods most likely originated by tandem duplication of a two-helix unit, sequence similarity between repeats detects them poorly. Here, we show that alpha-rod repeats can be detected using a neural network. The network detects more repeats than are identified by domain databases using multiple profiles, with a low level of false positives (<10%). We identify alpha-rod repeats in approximately 0.4% of proteins in eukaryotic genomes. We then investigate the results for all human proteins, identifying alpha-rod repeats for the first time in six protein families, including proteins STAG1-3, SERAC1, and PSMD1-2 and PSMD5. We also characterize a short version of these repeats in eight protein families of archaeal, bacterial, and fungal species. Finally, we demonstrate the utility of these predictions in directing experimental work to demarcate three alpha-rods in huntingtin, a protein mutated in Huntington's disease. Using yeast two-hybrid analysis and an immunoprecipitation technique, we show that the huntingtin fragments containing alpha-rods associate with each other. This is the first definition of domains in huntingtin and the first validation of predicted interactions between fragments of huntingtin, which points the way toward functional characterization of this protein. An implementation of the repeat detection algorithm is available as a Web server with a simple graphical output: http://www.ogic.ca/projects/ard. The output can be further visualized using BiasViz, a graphic tool for representation of multiple sequence alignments.
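The abstract does not describe the network's inputs or architecture. As a rough illustration only, the sketch below shows one common way a sequence is prepared for a window-based neural network detector and how per-residue scores might be turned into hits. The function names (one_hot_windows, call_hits), the window length of 7, and the 0.8 threshold are all hypothetical and are not taken from the paper or the ARD server.

```python
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot_windows(seq: str, window: int = 7) -> List[List[float]]:
    """Encode a window centred on each residue as a flat one-hot vector.

    Positions that fall outside the sequence, and non-standard residues,
    are left as all zeros. The window length is an assumed placeholder.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it has a central residue")
    half = window // 2
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    n_aa = len(AMINO_ACIDS)
    encoded = []
    for centre in range(len(seq)):
        vec = [0.0] * (window * n_aa)
        for offset in range(-half, half + 1):
            pos = centre + offset
            if 0 <= pos < len(seq) and seq[pos] in index:
                vec[(offset + half) * n_aa + index[seq[pos]]] = 1.0
        encoded.append(vec)
    return encoded


def call_hits(scores: Sequence[float], threshold: float = 0.8) -> List[int]:
    """Return the 0-based positions whose score reaches the threshold.

    The scores would come from a trained network applied to each window;
    the threshold here is an assumed value, not the published one.
    """
    return [i for i, score in enumerate(scores) if score >= threshold]
```

In such a setup, each vector from one_hot_windows would be fed to a trained classifier, and call_hits would be applied to the resulting per-residue scores; a protein would then be reported as containing repeats only if enough hits occur, a rule that would also have to be chosen.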

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Binding Sites
  • Computer Simulation
  • Huntingtin Protein
  • Models, Chemical*
  • Models, Molecular*
  • Molecular Sequence Data
  • Nerve Tissue Proteins / analysis*
  • Nerve Tissue Proteins / chemistry*
  • Neural Networks, Computer*
  • Nuclear Proteins / analysis*
  • Nuclear Proteins / chemistry*
  • Pattern Recognition, Automated / methods*
  • Protein Binding
  • Repetitive Sequences, Amino Acid
  • Sequence Analysis, Protein / methods*

Substances

  • HTT protein, human
  • Huntingtin Protein
  • Nerve Tissue Proteins
  • Nuclear Proteins