Sequence-structure relationship study in all-α transmembrane proteins using an unsupervised learning approach

Amino Acids. 2015 Nov;47(11):2303-22. doi: 10.1007/s00726-015-2010-5. Epub 2015 Jun 5.

Abstract

Transmembrane proteins (TMPs) are major drug targets, but knowledge of their precise topology and structure remains highly limited compared with that of globular proteins. Despite the difficulty of obtaining their structures, considerable effort has been made in recent years, both experimentally and computationally, to increase the number of available structures. In view of this emerging challenge, developing computational methods to extract knowledge from these data is crucial for better understanding their functions and for improving the quality of structural models. Here, we revisit an efficient unsupervised learning procedure, called Hybrid Protein Model (HPM), and apply it to the analysis of transmembrane proteins belonging to the all-α structural class. The HPM method is an original classification procedure that efficiently combines sequence and structure learning. The procedure was initially applied to the analysis of globular proteins. In the present case, HPM classifies a set of overlapping protein fragments extracted from a non-redundant databank of TMP 3D structures. After fine-tuning of the learning parameters, the optimal classification yields 65 clusters, which best capture the relationships between sequence and local structure properties of TMPs. Interestingly, HPM distinguishes among the resulting clusters two helical regions with distinct hydrophobic patterns. This underlines the complexity of the topology of these proteins. The HPM classification highlights unusual relationships between amino acids in TMP fragments, which can be useful for deriving new amino acid substitution matrices. Finally, two challenging applications are described: the first aims at annotating protein function (channel or not), and the second assesses the quality of structures (X-ray or models) via a new scoring function derived from the HPM classification.
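
The abstract states that HPM works on overlapping protein fragments but gives neither the fragment length nor the extraction code. As an illustration only, the sliding-window step could be sketched as follows. The function name `overlapping_fragments`, the window length of 5, the step of 1, and the example sequence are all hypothetical choices, not values taken from the study.

```python
def overlapping_fragments(sequence, length, step=1):
    """Return every window of `length` residues, advancing `step` residues each time.

    Hypothetical helper: the study's actual fragment length and extraction
    procedure are not given in the abstract.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    # The last valid start index is len(sequence) - length; a sequence shorter
    # than the window produces an empty list.
    last_start = len(sequence) - length
    return [sequence[i:i + length] for i in range(0, last_start + 1, step)]


# Made-up 11-residue sequence, used only to show the windowing.
example_sequence = "MKTLLVAGAVL"
fragments = overlapping_fragments(example_sequence, length=5)
```

With a step of 1, consecutive fragments share all but one residue, so each position in the chain is seen in several local contexts before any clustering is applied.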

Keywords: Artificial neural network; Classification; Hybrid protein model; Learning approach; Protein structure; Sequence–structure relationship; Structural alphabet; Transmembrane protein.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence*
  • Animals
  • Crystallography, X-Ray
  • Databases, Protein
  • Humans
  • Membrane Proteins / chemistry*
  • Membrane Proteins / classification*
  • Models, Molecular*
  • Protein Structure, Tertiary
  • Structure-Activity Relationship

Substances

  • Membrane Proteins