Do deep learning models make a difference in the identification of antimicrobial peptides?

César R García-Jacas; Sergio A Pinacho-Castellanos; Luis A García-González; Carlos A Brizuela

doi:10.1093/bib/bbac094

Brief Bioinform. 2022 May 13;23(3):bbac094. doi: 10.1093/bib/bbac094.

Authors

César R García-Jacas¹, Sergio A Pinacho-Castellanos² ³, Luis A García-González², Carlos A Brizuela²

Affiliations

¹ Cátedras CONACYT - Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), 22860 Ensenada, Baja California, México.
² Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), 22860 Ensenada, Baja California, México.
³ Centro de Investigación y Desarrollo de Tecnología Digital (CITEDI), Instituto Politécnico Nacional (IPN), 22435 Tijuana, Baja California, México.

PMID: 35380616
DOI: 10.1093/bib/bbac094

Abstract

In the last few decades, antimicrobial peptides (AMPs) have been explored as an alternative to classical antibiotics, which in turn has motivated the development of machine learning models to predict antimicrobial activity in peptides. The first generation of these predictors consisted of what are now known as shallow learning-based models, which require the computation and selection of molecular descriptors to characterize each peptide sequence before training. The second generation, known as deep learning-based models, no longer requires the explicit computation and selection of those descriptors and started to be used for AMP prediction just four years ago. The superior performance over shallow models claimed for deep models has created a prevalent inertia toward using deep learning to identify AMPs. However, methodological flaws and/or modeling biases in the building of deep models mean that such superiority is not supported. Here, we analyze the main pitfalls that led to biased conclusions about the leading performance of deep models. We also analyze whether deep models truly contribute to achieving better predictions than shallow models by performing fair studies on different state-of-the-art benchmarking datasets. The experiments reveal that deep models do not outperform shallow models in the classification of AMPs, and that both types of models codify similar chemical information since their predictions are highly similar. Thus, according to the currently available datasets, we conclude that deep learning may not be the most suitable approach for developing models to identify AMPs, mainly because shallow models achieve comparable-to-superior performance and are simpler (Ockham's razor principle). Even so, we suggest using deep learning only when its capabilities yield performance gains significant enough to justify the additional computational cost.

Keywords: LogitBoost; antimicrobial peptides; deep learning; diversity measures; gated recurrent units; long short-term memory networks; random forest; recurrent neural networks; shallow learning.
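
The abstract's claim that deep and shallow models give "highly similar" predictions rests on diversity measures, which quantify how often two classifiers succeed or fail on the same peptides. The record does not say which measures the authors used, so the following is only a minimal sketch using three standard pairwise measures from the ensemble-learning literature (disagreement, double-fault and Yule's Q statistic). The function name `pairwise_diversity` and its return keys are hypothetical and introduced here for illustration.

```python
from typing import Dict, Sequence


def pairwise_diversity(
    y_true: Sequence[int],
    pred_a: Sequence[int],
    pred_b: Sequence[int],
) -> Dict[str, float]:
    """Pairwise diversity between two classifiers on the same test set.

    Assumption: these three measures are illustrative choices, not
    necessarily the ones used in the paper.
    """
    if not (len(y_true) == len(pred_a) == len(pred_b)) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Contingency counts over correctness:
    # n11 = both correct, n10 = only A correct,
    # n01 = only B correct, n00 = both wrong.
    n11 = n10 = n01 = n00 = 0
    for truth, a, b in zip(y_true, pred_a, pred_b):
        a_ok, b_ok = (a == truth), (b == truth)
        if a_ok and b_ok:
            n11 += 1
        elif a_ok:
            n10 += 1
        elif b_ok:
            n01 += 1
        else:
            n00 += 1

    n = n11 + n10 + n01 + n00
    # Yule's Q is undefined when its denominator is zero.
    denom = n11 * n00 + n01 * n10
    q_stat = (n11 * n00 - n01 * n10) / denom if denom else float("nan")

    return {
        "disagreement": (n10 + n01) / n,  # 0 means identical correctness
        "double_fault": n00 / n,          # share of shared mistakes
        "q_statistic": q_stat,            # near 1 means errors coincide
    }


if __name__ == "__main__":
    # Toy labels (1 = AMP, 0 = non-AMP); not data from the paper.
    labels = [1, 1, 0, 0, 1, 0]
    deep_preds = [1, 1, 0, 1, 1, 0]
    shallow_preds = [1, 0, 0, 1, 1, 0]
    print(pairwise_diversity(labels, deep_preds, shallow_preds))
```

A low disagreement together with a Q statistic close to 1 would indicate that the two models tend to be right and wrong on the same sequences, which is the kind of evidence that supports reading them as encoding similar information.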

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Amino Acid Sequence
Antimicrobial Peptides
Deep Learning*
Machine Learning
Peptides / chemistry

Substances

Antimicrobial Peptides
Peptides