PredAPP: Predicting Anti-Parasitic Peptides with Undersampling and Ensemble Approaches

Wei Zhang; Enhua Xia; Ruyu Dai; Wending Tang; Yannan Bin; Junfeng Xia

doi:10.1007/s12539-021-00484-x

Interdiscip Sci. 2022 Mar;14(1):258-268. doi: 10.1007/s12539-021-00484-x. Epub 2021 Oct 4.

Authors

Wei Zhang¹ ², Enhua Xia², Ruyu Dai¹, Wending Tang¹, Yannan Bin³ ⁴, Junfeng Xia⁵ ⁶

Affiliations

¹ Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education and Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China.
² State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei, 230036, Anhui, China.
³ Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education and Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China. ynbin@ahu.edu.cn.
⁴ Anhui Key Laboratory of Modern Biomanufacturing, Anhui University, Hefei, 230601, Anhui, China. ynbin@ahu.edu.cn.
⁵ Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education and Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China. jfxia@ahu.edu.cn.
⁶ State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei, 230036, Anhui, China. jfxia@ahu.edu.cn.

PMID: 34608613
DOI: 10.1007/s12539-021-00484-x

Abstract

Anti-parasitic peptides (APPs) are regarded as promising therapeutic candidates against parasitic diseases. Because experimental techniques for identifying APPs are expensive and time-consuming, there is an urgent need for a computational approach that predicts APPs on a large scale. In this study, we present a computational method, termed PredAPP (Prediction of Anti-Parasitic Peptides), that effectively identifies APPs using an ensemble of well-performing machine learning (ML) classifiers. First, to address the class imbalance problem, a balanced training dataset was generated by undersampling; we found that the balanced dataset based on cluster centroids achieved the best performance. Then, nine groups of features and six ML algorithms were combined to generate 54 classifiers, whose outputs formed 54 feature representations; within each feature group, we selected the best-performing feature representation for classification. Finally, the selected feature representations were integrated using the logistic regression algorithm to construct the prediction model PredAPP. On the independent dataset, PredAPP achieved an accuracy of 0.880 and an AUC of 0.922, compared with 0.739 and 0.873, respectively, for AMPfun, a state-of-the-art method for predicting APPs. The web server of PredAPP is freely accessible at http://predapp.xialab.info and https://github.com/xialab-ahu/PredAPP.
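
The cluster-centroid undersampling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, which the abstract does not describe. It assumes the usual formulation of the technique: run k-means on the majority class with k equal to the minority-class size, then replace the majority samples with the resulting centroids. The function names (`kmeans_centroids`, `cluster_centroid_undersample`), the 2-D toy features, and the label convention (0 = negative, 1 = positive) are all hypothetical and chosen only for illustration.

```python
import random


def kmeans_centroids(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids as lists of floats."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct input points (copied).
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; leave it if the cluster is empty.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def cluster_centroid_undersample(majority, minority, seed=0):
    """Shrink the majority class to len(minority) synthetic points (k-means centroids)."""
    reduced = kmeans_centroids(majority, k=len(minority), seed=seed)
    features = reduced + [list(p) for p in minority]
    labels = [0] * len(reduced) + [1] * len(minority)  # assumed: 0 = negative, 1 = positive
    return features, labels


# Toy data (not from the paper): 12 negatives and 3 positives in a 2-D feature space.
toy_rng = random.Random(42)
negatives = [[toy_rng.gauss(0, 1), toy_rng.gauss(0, 1)] for _ in range(12)]
positives = [[toy_rng.gauss(3, 1), toy_rng.gauss(3, 1)] for _ in range(3)]

X_balanced, y_balanced = cluster_centroid_undersample(negatives, positives)
print(len(X_balanced), sum(y_balanced))  # prints "6 3": three centroids plus three positives
```

In practice a library implementation, such as `ClusterCentroids` in imbalanced-learn, would normally replace the hand-written k-means above; whether PredAPP used that library is not stated in this record.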

Keywords: Anti-parasitic peptide; Feature representation learning; Logistic regression; Undersampling method.

MeSH terms

Algorithms
Computers
Logistic Models
Machine Learning*
Peptides*

Substances

Peptides
