Predicting drug-target interactions using Lasso with random forest based on evolutionary information and chemical structure

Genomics. 2019 Dec;111(6):1839-1852. doi: 10.1016/j.ygeno.2018.12.007. Epub 2018 Dec 11.

Abstract

The identification of drug-target interactions is of great significance for pharmaceutical research. Because traditional experimental methods for identifying drug-target interactions are costly and time-consuming, the use of machine learning methods to predict potential drug-target interactions has attracted widespread attention. This paper presents a novel drug-target interaction prediction method called LRF-DTIs. Firstly, the pseudo-position specific scoring matrix (PsePSSM) and FP2 molecular fingerprints were used to extract features from the target proteins and the drugs, respectively. Secondly, Lasso was used to reduce the dimensionality of the extracted features, and the Synthetic Minority Oversampling Technique (SMOTE) was then used to handle the class imbalance in the data. Finally, the processed feature vectors were input into a random forest (RF) classifier to predict drug-target interactions. Over 10 trials of 5-fold cross-validation, the overall prediction accuracies on the enzyme, ion channel (IC), G-protein-coupled receptor (GPCR) and nuclear receptor (NR) datasets reached 98.09%, 97.32%, 95.69%, and 94.88%, respectively; these results were also compared with those of other prediction methods. In addition, we have tested and verified that our method can not only be applied to predict new interactions but also obtains satisfactory results on a new dataset. All the experimental results indicate that our method can significantly improve the prediction accuracy of drug-target interactions and can play a vital role in new drug research and target protein development. The source code and all datasets are available at https://github.com/QUST-AIBBDRC/LRF-DTIs/ for academic use.
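The class-balancing step mentioned in the abstract can be illustrated with a minimal sketch of SMOTE-style oversampling. This is not the authors' code (their implementation is in the repository linked above); it is a simplified, standard-library-only illustration of the general idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority-class neighbours. The function names, the default of k = 5 neighbours, and the toy data are assumptions made for illustration and do not come from the paper.

```python
import random
from typing import List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_oversample(minority: Sequence[Vector], n_new: int,
                     k: int = 5, seed: int = 0) -> List[Vector]:
    """Create n_new synthetic minority samples (hypothetical helper).

    For each synthetic sample, pick a random minority sample, pick one of
    its k nearest minority neighbours, and interpolate between the two.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding the sample itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: squared_distance(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy minority class (assumed data): four 2-dimensional feature vectors.
interacting_pairs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote_oversample(interacting_pairs, n_new=6, k=2)
print(len(new_samples), "synthetic samples generated")
```

In a full pipeline of the kind the abstract describes, oversampling of this sort would be applied to the Lasso-reduced feature vectors before training the random forest; the order and parameters used in the paper should be taken from the authors' repository rather than from this sketch.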

Keywords: Drug-target interactions; Lasso; Molecular fingerprint; Pseudo-position specific scoring matrix; Random forest; SMOTE.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Databases, Protein*
  • Drug Development
  • Ion Channels / genetics*
  • Machine Learning*
  • Position-Specific Scoring Matrices
  • Protein Conformation
  • Receptors, Cytoplasmic and Nuclear / genetics*
  • Receptors, G-Protein-Coupled / genetics*
  • Software*

Substances

  • Ion Channels
  • Receptors, Cytoplasmic and Nuclear
  • Receptors, G-Protein-Coupled