PredAPP: Predicting Anti-Parasitic Peptides with Undersampling and Ensemble Approaches

Interdiscip Sci. 2022 Mar;14(1):258-268. doi: 10.1007/s12539-021-00484-x. Epub 2021 Oct 4.

Abstract

Anti-parasitic peptides (APPs) are regarded as promising therapeutic candidates against parasitic diseases. Because experimental techniques for identifying APPs are expensive and time-consuming, there is an urgent need for a computational approach that can predict APPs on a large scale. In this study, we present a computational method, termed PredAPP (Prediction of Anti-Parasitic Peptides), that effectively identifies APPs using an ensemble of well-performing machine learning (ML) classifiers. First, to address the class imbalance problem, a balanced training dataset was generated by undersampling. We found that the balanced dataset produced by the cluster centroid method achieved the best performance. Next, nine groups of features and six ML algorithms were combined to generate 54 classifiers, whose outputs formed 54 feature representations; within each feature group, we selected the best-performing feature representation for classification. Finally, the selected feature representations were integrated using a logistic regression algorithm to construct the prediction model PredAPP. On the independent dataset, PredAPP achieved an accuracy of 0.880 and an AUC of 0.922, compared with 0.739 and 0.873, respectively, for AMPfun, a state-of-the-art method for predicting APPs. The web server of PredAPP is freely accessible at http://predapp.xialab.info and https://github.com/xialab-ahu/PredAPP.
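
The cluster centroid undersampling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, which the abstract does not describe. It assumes a generic k-means variant in which the majority class is replaced by as many cluster centroids as there are minority samples. The function names, the two-dimensional toy data, and the choice of which class is the majority are all hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def kmeans_centroids(points, k, iterations=20, seed=0):
    """Plain k-means; returns k centroids (assumes len(points) >= k)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids


def cluster_centroid_undersample(majority, minority, seed=0):
    """Replace the majority class with len(minority) k-means centroids.

    Returns a balanced feature list X and label list y
    (0 = majority class, 1 = minority class; labels are illustrative).
    """
    reduced = kmeans_centroids(majority, len(minority), seed=seed)
    X = reduced + [list(p) for p in minority]
    y = [0] * len(reduced) + [1] * len(minority)
    return X, y


# Hypothetical toy data: 60 majority and 10 minority samples in 2-D.
rng = random.Random(42)
majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(60)]
minority = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(10)]

X_bal, y_bal = cluster_centroid_undersample(majority, minority)
print(len(X_bal), y_bal.count(0), y_bal.count(1))
```

In this sketch the retained majority-class points are synthetic centroids rather than original samples, which is one common reading of cluster centroid undersampling; whether PredAPP keeps centroids or the nearest real samples is not stated in the abstract.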

Keywords: Anti-parasitic peptide; Feature representation learning; Logistic regression; Undersampling method.

MeSH terms

  • Algorithms
  • Computers
  • Logistic Models
  • Machine Learning*
  • Peptides*

Substances

  • Peptides