Improving protein disorder prediction by deep bidirectional long short-term memory recurrent neural networks

Bioinformatics. 2017 Mar 1;33(5):685-692. doi: 10.1093/bioinformatics/btw678.

Abstract

Motivation: Capturing long-range interactions between residues that are structural but not sequence neighbors in proteins is a long-standing challenge in bioinformatics. Recently, long short-term memory (LSTM) networks have significantly improved the accuracy of speech and image classification by remembering useful past information in long sequential events. Here, we have applied deep bidirectional LSTM recurrent neural networks to the problem of protein intrinsic disorder prediction.
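As an illustration of the idea described above, the following is a minimal sketch of a single bidirectional LSTM layer that assigns a score to every residue in a sequence. It is not the SPOT-Disorder implementation: the weights are random and untrained, the one-hot input encoding, the hidden size and all function names (`LSTMLayer`, `one_hot`, `bilstm_disorder_scores`) are assumptions made for this example, and the abstract does not specify the actual input features or network depth. The point is only to show how a forward pass and a backward pass let the output at each position depend on the whole sequence rather than on a fixed window.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed 20-letter alphabet for the toy encoding


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def one_hot(sequence):
    """Toy input features: one 20-dimensional indicator vector per residue."""
    return [[1.0 if aa == a else 0.0 for a in AMINO_ACIDS] for aa in sequence]


class LSTMLayer:
    """One direction of an LSTM layer with random, untrained weights."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        # One matrix per gate (input, forget, output, candidate),
        # each acting on the concatenation [input ; previous hidden state].
        self.W = {
            g: [[rng.uniform(-0.1, 0.1) for _ in range(n_in + n_hidden)]
                for _ in range(n_hidden)]
            for g in "ifoc"
        }
        self.b = {g: [0.0] * n_hidden for g in "ifoc"}

    def run(self, xs):
        h = [0.0] * self.n_hidden  # hidden state
        c = [0.0] * self.n_hidden  # cell state (the "memory")
        outputs = []
        for x in xs:
            z = x + h  # list concatenation
            pre = {
                g: [a + b for a, b in zip(matvec(self.W[g], z), self.b[g])]
                for g in "ifoc"
            }
            i_gate = [sigmoid(v) for v in pre["i"]]
            f_gate = [sigmoid(v) for v in pre["f"]]
            o_gate = [sigmoid(v) for v in pre["o"]]
            cand = [math.tanh(v) for v in pre["c"]]
            # Keep part of the old memory and add part of the new candidate.
            c = [f * ck + i * g for f, ck, i, g in zip(f_gate, c, i_gate, cand)]
            h = [o * math.tanh(ck) for o, ck in zip(o_gate, c)]
            outputs.append(h)
        return outputs


def bilstm_disorder_scores(features, n_hidden=4, seed=0):
    """Return one score in (0, 1) per residue from an untrained BiLSTM."""
    rng = random.Random(seed)
    n_in = len(features[0])
    forward = LSTMLayer(n_in, n_hidden, rng)
    backward = LSTMLayer(n_in, n_hidden, rng)
    w_out = [rng.uniform(-0.1, 0.1) for _ in range(2 * n_hidden)]

    h_fwd = forward.run(features)               # N-terminus to C-terminus
    h_bwd = backward.run(features[::-1])[::-1]  # C-terminus to N-terminus, re-aligned

    scores = []
    for hf, hb in zip(h_fwd, h_bwd):
        combined = hf + hb  # each residue sees context from both directions
        scores.append(sigmoid(sum(w * v for w, v in zip(w_out, combined))))
    return scores


scores = bilstm_disorder_scores(one_hot("MKTAYIAKQR"))
print(len(scores))  # one score per residue
```

Because the weights are untrained, the scores themselves carry no biological meaning. A real predictor would train such a network on annotated disordered and ordered residues and would typically stack several layers.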

Results: The new method, named SPOT-Disorder, consistently improves over a similar method using a traditional, window-based neural network (SPINE-D) in all datasets tested, without separate training on short and long disordered regions. Independent tests on four other datasets, including the datasets from critical assessment of structure prediction (CASP) techniques and >10 000 annotated proteins from MobiDB, confirmed SPOT-Disorder as one of the best methods in disorder prediction. Moreover, initial studies indicate that the method is more accurate in predicting functional sites in disordered regions. These results highlight the usefulness of combining LSTM with deep bidirectional recurrent neural networks in capturing non-local, long-range interactions for bioinformatics applications.

Availability and implementation: SPOT-Disorder is available as a web server and as a standalone program at: http://sparks-lab.org/server/SPOT-disorder/index.php

Contact: j.hanson@griffith.edu.au or yuedong.yang@griffith.edu.au or yaoqi.zhou@griffith.edu.au.

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Algorithms
  • Caspases / chemistry
  • Caspases / metabolism
  • Computational Biology / methods*
  • Genetic Diseases, Inborn / metabolism
  • Machine Learning*
  • Memory, Short-Term
  • Neural Networks, Computer*
  • Protein Conformation
  • Proteins / chemistry*
  • Proteins / metabolism

Substances

  • Proteins
  • Caspases