Getting to Know Your Neighbor: Protein Structure Prediction Comes of Age with Contextual Machine Learning

J Comput Biol. 2020 May;27(5):796-814. doi: 10.1089/cmb.2019.0193. Epub 2019 Aug 30.

Abstract

The folding of a protein structure is a process governed by both local and nonlocal interactions. While incorporating local dependencies into a machine learning algorithm for protein structure prediction is simple and has been exploited for some time, the modeling of long-range dependencies that result from structurally neighboring residues has only recently begun to be addressed. Structural properties designed to localize the prediction space from direct tertiary structure prediction, such as secondary structure, contact maps, and intrinsic disorder, among others, have begun to benefit greatly from machine learning models capable of modeling a widened, potentially global protein context. This has led to a direct enhancement of the quality of predicted tertiary structures through both the optimization of structural constraints and improved reliability of alignments to structural templates. These improvements have stemmed from the application of recurrent and convolutional neural network architectures that are effective not only at innate sequential context propagation but also at deep feature extraction, owing to novel skip connections and normalization techniques that allow greatly enhanced error back-propagation. The recent results from independent blind testing in the 13th Critical Assessment of protein Structure Prediction (CASP13) have signaled the beginning of a new generation of protein structure prediction through the utilization of these contextual techniques. The ripples from advancements in the determination of one-dimensional and two-dimensional structural properties are moving the field ever closer to the solution of the protein structure prediction problem.
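
To make the notion of structurally neighboring residues concrete, the following is a minimal sketch, not the authors' method, of how a contact map can be derived from residue coordinates and how nonlocal contacts differ from local ones. The function names, the 8 Å distance cutoff (a commonly used convention), the sequence-separation threshold of 6, and the toy hairpin coordinates are all assumptions chosen for illustration.

```python
import math

def contact_map(coords, threshold=8.0):
    """Return an L x L boolean matrix that is True where two residues
    lie within `threshold` angstroms of each other (assumed cutoff)."""
    n = len(coords)
    return [
        [math.dist(coords[i], coords[j]) <= threshold for j in range(n)]
        for i in range(n)
    ]

def nonlocal_contacts(cmap, min_separation=6):
    """List contacting pairs (i, j), i < j, that are at least
    `min_separation` positions apart in sequence (assumed threshold)."""
    n = len(cmap)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + min_separation, n)
        if cmap[i][j]
    ]

# Hypothetical, idealized hairpin: residues 0-5 run along the x-axis,
# residues 6-11 run back antiparallel, 5 angstroms away in y.
# These coordinates are illustrative and not taken from a real structure.
strand_a = [(3.8 * i, 0.0, 0.0) for i in range(6)]
strand_b = [(3.8 * (11 - i), 5.0, 0.0) for i in range(6, 12)]
coords = strand_a + strand_b

cmap = contact_map(coords)
pairs = nonlocal_contacts(cmap)
print(pairs)
```

In this toy chain, residues 0 and 11 sit at opposite ends of the sequence yet are in contact, which is the kind of long-range dependency a fixed local window around residue 0 cannot see.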

Keywords: contextual learning; machine learning; neural networks; protein structure prediction.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aging / genetics*
  • Aging / pathology
  • Algorithms
  • Machine Learning*
  • Neural Networks, Computer
  • Protein Conformation*
  • Proteins / genetics*
  • Proteins / ultrastructure

Substances

  • Proteins