DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences

PLoS Comput Biol. 2019 Jun 14;15(6):e1007129. doi: 10.1371/journal.pcbi.1007129. eCollection 2019 Jun.

Abstract

Identification of drug-target interactions (DTIs) plays a key role in drug discovery. The high cost and labor-intensive nature of in vitro and in vivo experiments have highlighted the importance of in silico DTI prediction approaches. In several computational models, conventional protein descriptors have been shown to be insufficiently informative for accurate DTI prediction. In this study, we therefore propose a deep learning-based DTI prediction model that captures local residue patterns of proteins participating in DTIs. Applying a convolutional neural network (CNN) to raw protein sequences, we perform convolution over amino acid subsequences of various lengths to capture local residue patterns of generalized protein classes. We train our model on large-scale DTI information and evaluate it on an independent dataset that was not seen during training. Our model outperforms previous protein descriptor-based models, as well as recently developed deep learning models for large-scale DTI prediction. By examining pooled convolution results, we confirmed that our model can detect protein binding sites involved in DTIs. In conclusion, by detecting local residue patterns of target proteins, our prediction model successfully enriches the protein features derived from a raw protein sequence and yields better prediction results than previous approaches. Our code is available at https://github.com/GIST-CSBL/DeepConv-DTI.
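The abstract describes convolving filters of several window lengths over a raw protein sequence and pooling the results. Because global max pooling keeps only the strongest response of each filter, the pooled value can be traced back to the residue window that produced it, which is one way to connect pooled convolution outputs to sequence regions such as binding sites. The sketch below illustrates that idea without dependencies. It assumes one-hot residue encoding, valid (unpadded) convolution, ReLU, and global max pooling; the function names, window sizes, and the hand-made "HEK" motif filter are hypothetical illustrations, not the DeepConv-DTI implementation (the linked repository contains the actual code, whose filters are learned from data).

```python
import random

# Hypothetical illustration only; not the DeepConv-DTI code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence):
    """Encode a protein sequence as a list of 20-dim one-hot vectors.

    Assumption: unknown residues (e.g. 'X') become all-zero vectors.
    """
    rows = []
    for aa in sequence.upper():
        row = [0.0] * len(AMINO_ACIDS)
        if aa in AA_INDEX:
            row[AA_INDEX[aa]] = 1.0
        rows.append(row)
    return rows


def conv1d_relu(x, kernel, bias=0.0):
    """Valid 1D convolution of a (length x 20) input with one (window x 20) filter, then ReLU."""
    window = len(kernel)
    out = []
    for start in range(len(x) - window + 1):
        s = bias
        for k in range(window):
            for c, w in enumerate(kernel[k]):
                s += x[start + k][c] * w
        out.append(max(0.0, s))
    return out


def global_max_pool(activations):
    """Return (maximum activation, start index of the window that produced it)."""
    best = max(range(len(activations)), key=lambda i: activations[i])
    return activations[best], best


def protein_features(sequence, filters):
    """Apply every filter, max-pool each output over the sequence, and concatenate.

    `filters` maps a window size to a list of kernels of that width.
    Returns the pooled feature vector and, per feature, the (start, end)
    residue span (end exclusive) that produced the pooled maximum.
    """
    x = one_hot(sequence)
    features, spans = [], []
    for window, kernels in sorted(filters.items()):
        if len(x) < window:
            raise ValueError(f"sequence shorter than window size {window}")
        for kernel in kernels:
            value, start = global_max_pool(conv1d_relu(x, kernel))
            features.append(value)
            spans.append((start, start + window))
    return features, spans


def motif_kernel(motif):
    """Hypothetical hand-made filter that responds most strongly to an exact residue motif."""
    return one_hot(motif)


def random_kernel(window, rng):
    """Hypothetical untrained filter with uniform random weights in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in AMINO_ACIDS] for _ in range(window)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Two window lengths, mirroring "convolution on various lengths" of subsequences.
    filters = {3: [motif_kernel("HEK")], 5: [random_kernel(5, rng)]}
    feats, spans = protein_features("MAAHEKAA", filters)
    print(feats[0], spans[0])  # 3.0 (3, 6)
```

In the usage example, the motif filter's pooled maximum (3.0) points to residues 3 to 5 ("HEK"), showing how a pooled feature can be mapped back to a sequence region. In a trained model the filters would be learned rather than hand-made, and the resulting protein feature vector would be combined with drug features for prediction.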

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Binding Sites
  • Computational Biology
  • Computer Simulation
  • Deep Learning*
  • Drug Discovery / methods*
  • Ligands
  • Models, Molecular
  • Proteins* / chemistry
  • Proteins* / metabolism
  • Sequence Analysis, Protein / methods*

Substances

  • Ligands
  • Proteins

Grants and funding

This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (NRF-2018M3A9A7053266) and by the Bio-Synergy Research Project (NRF-2017M3A9C4092978) of the Ministry of Science and ICT through the National Research Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.