Machine Learning Models to Predict Protein-Protein Interaction Inhibitors

Molecules. 2022 Nov 17;27(22):7986. doi: 10.3390/molecules27227986.

Abstract

Protein-protein interaction (PPI) inhibitors play an increasing role in drug discovery. It is hypothesized that machine learning (ML) algorithms can classify or identify PPI inhibitors. This work describes the performance of different algorithms and molecular fingerprints used in chemoinformatics to develop a classification model that identifies PPI inhibitors, and makes the code freely available to the community, particularly to medicinal chemistry research groups working with PPI inhibitors. We found that the performance of the classification algorithms varied with the features used in training. Random forest (RF) models with the extended connectivity fingerprint of radius 2 (ECFP4) had the best classification abilities, outperforming models trained with ECFP6 or MACCS keys (166 bits). In general, logistic regression (LR) models had lower performance metrics than RF models, but ECFP4 was the most appropriate representation for LR. ECFP4 also yielded support vector machine (SVM) models with high performance metrics. We also constructed ensemble models based on the top-performing models. As part of this work, and to help non-computational experts, we developed a freely available code pipeline.
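
The abstract does not state how the ensemble models combine their members, so the following is only a minimal sketch of one common option, soft voting, in which the inhibitor probabilities predicted by several models are averaged and then thresholded. The function name `soft_vote`, the model labels, the probability values, and the 0.5 threshold are all illustrative assumptions, not taken from the paper or its released code.

```python
from statistics import mean


def soft_vote(probabilities, threshold=0.5):
    """Average per-model inhibitor probabilities and apply a threshold.

    probabilities: dict mapping a model name to a list of predicted
    probabilities (one per compound) that the compound is a PPI inhibitor.
    Returns (averaged_probabilities, binary_labels).
    """
    per_model = list(probabilities.values())
    if not per_model:
        raise ValueError("at least one model is required")
    n_compounds = len(per_model[0])
    if any(len(scores) != n_compounds for scores in per_model):
        raise ValueError("all models must score the same compounds")
    # zip(*per_model) groups the scores that each model gave to one compound.
    averaged = [mean(scores) for scores in zip(*per_model)]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels


# Hypothetical probabilities for three compounds (made-up values,
# used only to show the call; they are not results from the paper).
example = {
    "RF_ECFP4": [0.90, 0.20, 0.60],
    "SVM_ECFP4": [0.80, 0.40, 0.30],
    "LR_ECFP4": [0.70, 0.30, 0.45],
}
averaged, labels = soft_vote(example)
print(labels)
```

In this toy input the third compound is called an inhibitor by the RF model alone, yet the averaged probability falls below the threshold, which illustrates how an ensemble can overrule a single member.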

Keywords: chemoinformatics; computer-aided drug design; drug discovery; machine learning; protein–protein interaction.

MeSH terms

  • Algorithms
  • Cheminformatics*
  • Logistic Models
  • Machine Learning*
  • Support Vector Machine