On the use of machine learning to identify topological rules in the packing of beta-strands

Protein Eng. 1994 Nov;7(11):1295-303. doi: 10.1093/protein/7.11.1295.

Abstract

The machine learning program GOLEM was applied to discover topological rules in the packing of beta-sheets in alpha/beta-domain proteins. Rules (constraints) were determined for four features of beta-sheet packing: (i) whether a beta-strand is at an edge; (ii) whether two consecutive beta-strands pack parallel or anti-parallel; (iii) whether two beta-strands pack adjacently; and (iv) the winding direction of two consecutive beta-strands. Rules were found with high predictive accuracy and coverage. The errors were generally associated with complications in domain folds, especially in doubly wound domains. Investigation of the rules revealed interesting patterns, some previously known and others novel. Novel features include (i) the relationship between pairs of sequential strands is in general one of decreasing size; (ii) more sequential pairs of strands wind in the direction out than in; and (iii) it takes a larger alteration in hydrophobicity to change a strand from winding in the direction out than in. These patterns in the data may be the result of folding pathways in the domains. The rules found are of predictive value and could be used in the combinatorial prediction of protein structure, or as a general test of model structures, e.g. those produced by threading. We conclude that machine learning has a useful role in the analysis of protein structures.
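GOLEM is an inductive logic programming system that learns rules from examples expressed as ground logical facts. The sketch below is a minimal illustration under stated assumptions, not the authors' encoding. It assumes a hypothetical description of one beta-sheet, where each strand has its order along the chain, its spatial slot across the sheet, and its orientation. From this it derives three of the four target features as Prolog-style facts. Adjacency is only checked for sequential pairs here. Winding direction is left out because the abstract does not define the in/out convention. The `Strand` class, the helper functions, the `not_` prefix used to mark would-be negative examples, and the Rossmann-like 3-2-1-4-5 sheet are all illustrative and are not data from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strand:
    """One beta-strand in a single sheet (hypothetical encoding).

    seq_index: position of the strand along the chain (1-based).
    sheet_pos: spatial slot across the sheet (0 = one edge).
    direction: +1 or -1, orientation of the strand's N->C vector.
    """
    seq_index: int
    sheet_pos: int
    direction: int


def is_edge(strand, sheet):
    # Edge strands occupy the outermost spatial slots of the sheet.
    positions = [s.sheet_pos for s in sheet]
    return strand.sheet_pos in (min(positions), max(positions))


def packs_parallel(a, b):
    # Same N->C orientation means the pair packs parallel.
    return a.direction == b.direction


def packs_adjacent(a, b):
    # Adjacent strands sit in neighbouring spatial slots.
    return abs(a.sheet_pos - b.sheet_pos) == 1


def sequential_pairs(sheet):
    # Consecutive strands in chain order.
    ordered = sorted(sheet, key=lambda s: s.seq_index)
    return list(zip(ordered, ordered[1:]))


def ground_facts(sheet):
    # Emit Prolog-style facts; a "not_" prefix marks what would be
    # supplied to the learner as a negative example (assumed convention).
    facts = []
    for s in sorted(sheet, key=lambda s: s.seq_index):
        tag = "" if is_edge(s, sheet) else "not_"
        facts.append(f"{tag}edge(s{s.seq_index}).")
    for a, b in sequential_pairs(sheet):
        tag = "" if packs_parallel(a, b) else "not_"
        facts.append(f"{tag}parallel(s{a.seq_index},s{b.seq_index}).")
        tag = "" if packs_adjacent(a, b) else "not_"
        facts.append(f"{tag}adjacent(s{a.seq_index},s{b.seq_index}).")
    return facts


# Illustrative Rossmann-like parallel sheet with spatial order 3-2-1-4-5.
ROSSMANN_LIKE = [
    Strand(1, 2, +1),
    Strand(2, 1, +1),
    Strand(3, 0, +1),
    Strand(4, 3, +1),
    Strand(5, 4, +1),
]

for fact in ground_facts(ROSSMANN_LIKE):
    print(fact)
```

For this toy sheet only strands 3 and 5 are edge strands, every sequential pair is parallel, and the 3-to-4 crossover is the one sequential pair that does not pack adjacently. A learner would then look for rules over such facts plus background knowledge (for example sequence-derived properties such as hydrophobicity), which this sketch does not model.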

Publication types

  • Comparative Study

MeSH terms

  • Amino Acid Sequence
  • Artificial Intelligence
  • Computer Simulation*
  • Databases, Factual
  • Models, Molecular*
  • Molecular Sequence Data
  • Protein Folding
  • Protein Structure, Tertiary*
  • Software*