Protein Solvent-Accessibility Prediction by a Stacked Deep Bidirectional Recurrent Neural Network

Buzhong Zhang; Linqing Li; Qiang Lü

doi:10.3390/biom8020033

Biomolecules. 2018 May 25;8(2):33. doi: 10.3390/biom8020033.

Authors

Buzhong Zhang¹ ², Linqing Li³, Qiang Lü⁴

Affiliations

¹ School of Computer Science and Technology, Soochow University, Suzhou 215006, China. 20154027005@stu.suda.edu.cn.
² School of Computer and Information, Anqing Normal University, Anqing 246011, China. 20154027005@stu.suda.edu.cn.
³ School of Computer Science and Technology, Soochow University, Suzhou 215006, China. linqinglee@gmail.com.
⁴ School of Computer Science and Technology, Soochow University, Suzhou 215006, China. qiang@suda.edu.cn.

Abstract

Residue solvent accessibility is closely related to the spatial arrangement and packing of residues, so predicting it is an important step toward understanding a protein's structure and function. In this work, we present a deep learning method for predicting residue solvent accessibility, based on a stacked deep bidirectional recurrent neural network applied to sequence profiles. To capture more long-range sequence information, we propose a merging operator that combines the bidirectional information from the hidden nodes to form the outputs. Three types of merging operators were used in our improved model, with long short-term memory units serving as the hidden computing nodes. The training dataset was constructed from 7361 proteins extracted from the PISCES server using a 25% sequence identity cut-off. Each residue was represented by sequence-derived features: position-specific scoring matrix, physical properties, physicochemical characteristics, conservation score and protein coding. The method predicts continuous relative solvent-accessible area values, which are then transformed into binary states using predefined thresholds. Our experimental results showed that the method improved prediction quality relative to current methods, with mean absolute error and Pearson's correlation coefficient values of 8.8% and 74.8%, respectively, on the CB502 dataset and 8.2% and 78%, respectively, on the Manesh215 dataset.

Keywords: bidirectional recurrent network; merging operator; sequence profile; solvent-accessibility prediction.
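
The evaluation described in the abstract (mean absolute error and Pearson's correlation coefficient on continuous relative solvent accessibility, followed by conversion to binary states) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 0.25 threshold, the "e"/"b" labels for exposed/buried, and the sample values are all assumptions chosen for illustration, since the abstract does not state the thresholds used.

```python
import math


def mean_absolute_error(pred, true):
    """Average absolute difference between predicted and observed RSA."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def pearson_correlation(pred, true):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(true)
    mean_p = sum(pred) / n
    mean_t = sum(true) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, true))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in true)
    return cov / math.sqrt(var_p * var_t)


def to_binary_states(rsa, threshold=0.25):
    """Map continuous RSA values to exposed ("e") or buried ("b").

    The threshold of 0.25 and the >= convention are assumptions;
    the abstract only says that predefined thresholds were used.
    """
    return ["e" if value >= threshold else "b" for value in rsa]


# Hypothetical per-residue RSA values on a 0..1 scale (not data from the paper).
observed = [0.05, 0.40, 0.75, 0.10, 0.30]
predicted = [0.10, 0.35, 0.60, 0.20, 0.20]

mae = mean_absolute_error(predicted, observed)
pcc = pearson_correlation(predicted, observed)

# Two-state accuracy: fraction of residues whose binary label matches.
obs_states = to_binary_states(observed)
pred_states = to_binary_states(predicted)
accuracy = sum(o == p for o, p in zip(obs_states, pred_states)) / len(obs_states)

print(f"MAE={mae:.3f} PCC={pcc:.3f} two-state accuracy={accuracy:.2f}")
```

Multiplying the two metrics by 100 gives percentages in the form the abstract reports. Changing the `threshold` argument shows how the two-state labels, and therefore the accuracy, depend on the cut-off chosen.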

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Animals
Deep Learning*
Humans
Hydrophobic and Hydrophilic Interactions*
Protein Conformation
Sequence Analysis, Protein / methods*
Solubility