Large-scale comparative review and assessment of computational methods for anti-cancer peptide identification

Xiao Liang; Fuyi Li; Jinxiang Chen; Junlong Li; Hao Wu; Shuqin Li; Jiangning Song; Quanzhong Liu

doi:10.1093/bib/bbaa312

Brief Bioinform. 2021 Jul 20;22(4):bbaa312. doi: 10.1093/bib/bbaa312.

Authors

Xiao Liang¹ ², Fuyi Li³ ⁴ ⁵, Jinxiang Chen¹, Junlong Li¹, Hao Wu¹, Shuqin Li¹ ², Jiangning Song³ ⁴ ⁶, Quanzhong Liu¹ ²

Affiliations

¹ College of Information Engineering, Northwest A&F University, Yangling, 712100, China.
² Shaanxi Key Laboratory of Agricultural Information Perception and Intelligent Service, Yangling, Shaanxi 712100, China.
³ Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia.
⁴ Monash Centre for Data Science, Monash University, Melbourne, VIC 3800, Australia.
⁵ Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Victoria, Australia.
⁶ ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia.

Abstract

Anti-cancer peptides (ACPs) are known as potential therapeutics for cancer. They have been extensively studied because of their unique ability to target cancer cells without directly affecting healthy cells, and many peptide-based drugs are currently being evaluated in preclinical and clinical trials. Accurate identification of ACPs has received considerable attention in recent years; as such, a number of machine learning-based methods for in silico identification of ACPs have been developed. These methods have, to some extent, promoted research on the mechanisms of ACP therapeutics against cancer. The methods differ widely in their training/testing datasets, machine learning algorithms, feature encoding schemes, feature selection methods and evaluation strategies. It is therefore desirable to summarize the advantages and disadvantages of the existing methods and to provide useful insights and suggestions for the development and improvement of novel computational tools to characterize and identify ACPs. With this in mind, we first comprehensively investigate 16 state-of-the-art predictors for ACPs in terms of their core algorithms, feature encoding schemes, performance evaluation metrics and webserver/software usability. We then conduct a comprehensive performance assessment to evaluate the robustness and scalability of the existing predictors using a well-prepared benchmark dataset, and we provide potential strategies for improving model performance. Moreover, we propose a novel ensemble learning framework, termed ACPredStackL, for the accurate identification of ACPs. ACPredStackL is based on a stacking ensemble strategy that combines SVM, Naïve Bayes, LightGBM and KNN. Empirical benchmarking experiments against the state-of-the-art methods demonstrate that ACPredStackL achieves comparable performance in predicting ACPs.
The webserver and source code of ACPredStackL are freely available at http://bigdata.biocie.cn/ACPredStackL/ and https://github.com/liangxiaoq/ACPredStackL, respectively.

Keywords: anti-cancer peptides; bioinformatics; ensemble learning; performance assessment; prediction; sequence analysis.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Review

MeSH terms

Antineoplastic Agents* / chemistry
Antineoplastic Agents* / therapeutic use
Humans
Machine Learning*
Neoplasms* / drug therapy
Neoplasms* / genetics
Peptides / chemistry
Peptides / genetics
Peptides / therapeutic use
Software*

Substances

Antineoplastic Agents
Peptides

Grants and funding

R01 AI111965/AI/NIAID NIH HHS/United States