Computational Protein Design with Deep Learning Neural Networks

Jingxue Wang; Huali Cao; John Z H Zhang; Yifei Qi

doi:10.1038/s41598-018-24760-x

Sci Rep. 2018 Apr 20;8(1):6349. doi: 10.1038/s41598-018-24760-x.

Authors

Jingxue Wang¹, Huali Cao¹, John Z H Zhang¹ ² ³ ⁴, Yifei Qi⁵ ⁶

Affiliations

¹ Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai, 200062, China.
² NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai, 200062, China.
³ Department of Chemistry, New York University, NY, NY, 10003, USA.
⁴ Collaborative Innovation Center of Extreme Optics, Shanxi University, Taiyuan, Shanxi, 030006, China.
⁵ Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai, 200062, China. yfqi@chem.ecnu.edu.cn.
⁶ NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai, 200062, China. yfqi@chem.ecnu.edu.cn.

Abstract

Computational protein design has a wide variety of applications. Despite remarkable successes, designing a protein for a given structure and function remains a challenging task. Meanwhile, the number of solved protein structures is increasing rapidly while the number of unique protein folds has plateaued, suggesting that more structural information is accumulating for each fold. Deep learning neural networks are a powerful method for learning from such large data sets and have shown superior performance in many machine learning fields. In this study, we applied the deep learning neural network approach to computational protein design, predicting the probability of each of the 20 natural amino acids at each residue position in a protein. A large set of protein structures was collected and a multi-layer neural network was constructed. A number of structural properties were extracted as input features, and the best network achieved an accuracy of 38.3%. Using the network output as residue type restraints improved the average sequence identity when designing three natural proteins with Rosetta. Moreover, the predictions from our network show ~3% higher sequence identity than a previous method. Results from this study may benefit further development of computational protein design methods.
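As an illustration of the two quantities the abstract refers to, the following minimal Python sketch shows how a vector of 20 per-residue scores could be turned into amino acid probabilities and how sequence identity between a designed and a native sequence could be computed. It is not the authors' implementation: the function names, the use of a softmax output, and the alphabetical ordering of the one-letter codes are assumptions made here for illustration only.

```python
import math

# One-letter codes of the 20 natural amino acids.
# The ordering is an arbitrary assumption for this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_residue(scores):
    """Return the most probable amino acid for one residue position."""
    if len(scores) != len(AMINO_ACIDS):
        raise ValueError("expected one score per amino acid type")
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return AMINO_ACIDS[best]


def sequence_identity(designed, native):
    """Fraction of positions at which two equal-length sequences agree."""
    if len(designed) != len(native) or not native:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)
```

Under these assumptions, calling `predict_residue` on each position of a protein yields a predicted sequence, and `sequence_identity(predicted, native)` gives the fraction of positions where the top prediction matches the native residue.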

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Amino Acid Sequence
Amino Acids
Big Data
Computational Biology
Deep Learning
Machine Learning
Neural Networks, Computer
Probability
Protein Engineering / methods*
Proteins / metabolism

Substances

Amino Acids
Proteins