Deep convolutional networks for quality assessment of protein folds

Georgy Derevyanko; Sergei Grudinin; Yoshua Bengio; Guillaume Lamoureux

doi:10.1093/bioinformatics/bty494

Bioinformatics. 2018 Dec 1;34(23):4046-4053. doi: 10.1093/bioinformatics/bty494.

Authors

Georgy Derevyanko¹, Sergei Grudinin², Yoshua Bengio³, Guillaume Lamoureux¹

Affiliations

¹ Department of Chemistry and Biochemistry and Centre for Research in Molecular Modeling (CERMM), Concordia University, Montréal, Québec, Canada.
² Inria, Université Grenoble Alpes, CNRS, Grenoble INP, LJK, Grenoble, France.
³ Department of Computer Science and Operations Research, Université de Montréal, Montréal, Québec, Canada.

PMID: 29931128

Abstract

Motivation: The computational prediction of a protein structure from its sequence generally relies on a method to assess the quality of protein models. Most assessment methods rank candidate models using heavily engineered structural features, defined as complex functions of the atomic coordinates. However, very few methods have attempted to learn these features directly from the data.

Results: We show that deep convolutional networks can be used to predict the ranking of model structures solely on the basis of their raw three-dimensional atomic densities, without any feature tuning. We develop a deep neural network that performs on par with state-of-the-art algorithms from the literature. The network is trained on decoys from the CASP7 to CASP10 datasets and its performance is tested on the CASP11 dataset. Additional testing on decoys from the CASP12, CAMEO and 3DRobot datasets confirms that the network performs consistently well across a variety of protein structures. While the network learns to assess structural decoys globally and does not rely on any predefined features, it can be analyzed to show that it implicitly identifies regions that deviate from the native structure.
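As an illustration of what "raw three-dimensional atomic densities" could look like as network input, the following is a minimal sketch, not the authors' implementation. It places a Gaussian on each atom and accumulates one density grid per atom type. The function name atomic_density_grid, the grid size, the voxel resolution, the Gaussian width sigma and the atom-type labels are all assumptions made for this example; the abstract does not specify them.

```python
import math


def atomic_density_grid(atoms, box_size=8, resolution=1.0, sigma=1.0):
    """Map atoms to per-type 3D density grids (illustrative sketch only).

    atoms: list of (atom_type, x, y, z) tuples, coordinates in angstroms.
    box_size: number of voxels along each axis (hypothetical default).
    resolution: voxel edge length in angstroms (hypothetical default).
    sigma: width of the Gaussian placed on each atom (hypothetical default).
    Returns a dict mapping atom_type to a nested list grid[i][j][k].
    """
    if not atoms:
        raise ValueError("at least one atom is required")

    # Center the structure on its centroid so it sits in the middle of the box.
    n = len(atoms)
    cx = sum(a[1] for a in atoms) / n
    cy = sum(a[2] for a in atoms) / n
    cz = sum(a[3] for a in atoms) / n
    half = box_size * resolution / 2.0

    grids = {}
    for atom_type, x, y, z in atoms:
        # One grid (channel) per atom type, created on first use.
        if atom_type not in grids:
            grids[atom_type] = [
                [[0.0] * box_size for _ in range(box_size)]
                for _ in range(box_size)
            ]
        grid = grids[atom_type]

        # Atom position expressed in box coordinates.
        px, py, pz = x - cx + half, y - cy + half, z - cz + half

        for i in range(box_size):
            for j in range(box_size):
                for k in range(box_size):
                    # Squared distance from the voxel center to the atom.
                    d2 = (
                        ((i + 0.5) * resolution - px) ** 2
                        + ((j + 0.5) * resolution - py) ** 2
                        + ((k + 0.5) * resolution - pz) ** 2
                    )
                    grid[i][j][k] += math.exp(-d2 / (2.0 * sigma ** 2))
    return grids


# Toy three-atom example with made-up coordinates.
toy_atoms = [("C", 0.0, 0.0, 0.0), ("C", 2.0, 0.0, 0.0), ("N", 1.0, 0.0, 0.0)]
toy_grids = atomic_density_grid(toy_atoms)
print(sorted(toy_grids))
```

In a setup of this kind, the per-type grids would be stacked as input channels of a 3D convolutional network that outputs a single score per decoy, and decoys of the same target would then be ranked by that score. A practical version would likely use an array library and a distance cutoff rather than a loop over every voxel.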

Availability and implementation: The code and the datasets are available at https://github.com/lamoureux-lab/3DCNN_MQA.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Computational Biology*
Neural Networks, Computer*
Protein Folding*
Proteins / chemistry*

Substances

Proteins