Using a stacked ensemble learning framework to predict modulators of protein-protein interactions

Comput Biol Med. 2023 Jul:161:107032. doi: 10.1016/j.compbiomed.2023.107032. Epub 2023 May 16.

Abstract

Identifying small-molecule protein-protein interaction modulators (PPIMs) is a promising research direction for drug discovery, cancer treatment, and other fields. In this study, we developed a stacking ensemble computational framework, SELPPI, based on a genetic algorithm and tree-based machine learning methods, for predicting new modulators that target protein-protein interactions. Specifically, extremely randomized trees (ExtraTrees), adaptive boosting (AdaBoost), random forest (RF), cascade forest, light gradient boosting machine (LightGBM), and extreme gradient boosting (XGBoost) were used as base learners, and seven types of chemical descriptors were used as input features. Primary predictions were obtained from each base learner-descriptor pair. Each of the six methods above was then trained in turn as a candidate meta learner on the primary predictions, and the most efficient method was adopted as the meta learner. Finally, the genetic algorithm was used to select the optimal subset of primary predictions as input to the meta learner, whose secondary prediction gives the final result. We systematically evaluated our model on the pdCSM-PPI datasets. To our knowledge, our model outperformed all existing models on these datasets.
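The final selection step can be illustrated with a minimal sketch. It is not the SELPPI implementation: the data are synthetic, the meta learner is replaced by a simple placeholder (the mean of the selected columns, thresholded at 0.5), and the function names and genetic-algorithm settings (population size, tournament selection, one-point crossover, mutation rate) are assumptions made for illustration. The 42 columns correspond to 6 learners times 7 descriptor types; the split into informative and noisy columns is invented.

```python
import random


def make_primary_predictions(n_samples, n_informative, n_noise, rng):
    """Synthetic stand-in for primary predictions: one column per learner-descriptor pair."""
    labels = [rng.randint(0, 1) for _ in range(n_samples)]
    rows = []
    for y in labels:
        # Informative columns track the label with Gaussian noise, clipped to [0, 1].
        informative = [min(1.0, max(0.0, 0.15 + 0.7 * y + rng.gauss(0, 0.25)))
                       for _ in range(n_informative)]
        # Noise columns are unrelated to the label.
        noise = [rng.random() for _ in range(n_noise)]
        rows.append(informative + noise)
    return rows, labels


def fitness(mask, rows, labels):
    """Accuracy of the placeholder meta learner restricted to the columns where mask == 1."""
    selected = [j for j, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0  # an empty selection cannot predict anything
    correct = 0
    for row, y in zip(rows, labels):
        score = sum(row[j] for j in selected) / len(selected)
        correct += int((score >= 0.5) == bool(y))
    return correct / len(labels)


def genetic_select(rows, labels, rng, pop_size=20, generations=30, mutation_rate=0.1):
    """Search for a binary column mask that maximises fitness."""
    n_cols = len(rows[0])
    # Seed the population with the "use every column" mask plus random masks.
    population = [[1] * n_cols]
    population += [[rng.randint(0, 1) for _ in range(n_cols)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        scored = [(fitness(mask, rows, labels), mask) for mask in population]
        elite = max(scored, key=lambda pair: pair[0])[1]
        next_gen = [elite[:]]  # elitism: the best mask always survives unchanged
        while len(next_gen) < pop_size:
            # Tournament selection of two parents (tournament size 3).
            parent_a = max(rng.sample(scored, 3), key=lambda pair: pair[0])[1]
            parent_b = max(rng.sample(scored, 3), key=lambda pair: pair[0])[1]
            cut = rng.randint(1, n_cols - 1)  # one-point crossover
            child = parent_a[:cut] + parent_b[cut:]
            # Bit-flip mutation.
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=lambda mask: fitness(mask, rows, labels))


rng = random.Random(0)
rows, labels = make_primary_predictions(200, 12, 30, rng)
best_mask = genetic_select(rows, labels, rng)
print("columns kept:", sum(best_mask), "of", len(best_mask))
print("accuracy with all columns:", fitness([1] * len(best_mask), rows, labels))
print("accuracy with selected columns:", fitness(best_mask, rows, labels))
```

Because the all-columns mask is in the initial population and the best mask is carried over each generation, the selected subset never scores below the unselected baseline on the data used for the search. In a real setting the fitness would be computed on held-out or cross-validated predictions, and the placeholder would be replaced by a trained meta learner.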

Keywords: Bioinformation; Drug discovery; Machine learning (ML); Protein–protein interaction modulators (PPIMs).

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Drug Discovery
  • Machine Learning*
  • Random Forest*