PD-BertEDL: An Ensemble Deep Learning Method Using BERT and Multivariate Representation to Predict Peptide Detectability

Huiqing Wang; Juan Wang; Zhipeng Feng; Ying Li; Hong Zhao

doi:10.3390/ijms232012385

Int J Mol Sci. 2022 Oct 16;23(20):12385. doi: 10.3390/ijms232012385.

Authors

Huiqing Wang¹, Juan Wang¹, Zhipeng Feng¹, Ying Li¹, Hong Zhao¹

Affiliation

¹ College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China.

Abstract

Peptide detectability is defined as the probability of identifying a peptide from a mixture of standard samples, and predicting it is a key step in protein identification and analysis. Effective methods for predicting peptide detectability are helpful for disease treatment and clinical research. However, most existing computational methods for predicting peptide detectability rely on a single type of information. With the increasing complexity of feature representation, it is necessary to explore the influence of multivariate information on peptide detectability. We therefore propose an ensemble deep learning method, PD-BertEDL. Bidirectional encoder representations from transformers (BERT) is introduced to capture the context information of peptides. Context information, sequence information, and physicochemical information of peptides are combined to construct a multivariate feature space. We use a different deep learning method to extract high-quality features from each category of peptide information, and we integrate the three models' prediction results with an average fusion strategy, which addresses the heterogeneity of the features and enhances the robustness and adaptability of the model. The experimental results show that PD-BertEDL is superior to existing prediction methods. It can effectively predict peptide detectability and provide strong support for protein identification and quantitative analysis, as well as disease treatment.
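
The average fusion step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the example probabilities, and the 0.5 decision threshold are all assumptions made for illustration. It assumes only that each of the three models outputs one detectability probability per peptide.

```python
from statistics import fmean


def average_fusion(*model_probs):
    """Average per-peptide probabilities across several models.

    Each argument is a list of probabilities, one per peptide,
    produced by one model (hypothetical inputs, not real outputs).
    """
    if not model_probs:
        raise ValueError("need predictions from at least one model")
    n = len(model_probs[0])
    if any(len(p) != n for p in model_probs):
        raise ValueError("all models must score the same peptides")
    # zip(*...) groups the scores that belong to the same peptide.
    return [fmean(scores) for scores in zip(*model_probs)]


def to_labels(probs, threshold=0.5):
    """Convert fused probabilities to 0/1 labels (threshold is an assumption)."""
    return [1 if p >= threshold else 0 for p in probs]


# Made-up probabilities for three peptides from three hypothetical models
# (context-based, sequence-based, physicochemical-based).
context_probs = [0.9, 0.2, 0.6]
sequence_probs = [0.8, 0.4, 0.3]
physchem_probs = [0.7, 0.3, 0.3]

fused = average_fusion(context_probs, sequence_probs, physchem_probs)
labels = to_labels(fused)
```

Averaging gives each model equal weight, so a peptide scored highly by only one model (the third one here) can still fall below the threshold after fusion.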

Keywords: BERT; ensemble deep learning; multivariate representation; peptide detectability.

MeSH terms

Deep Learning*
Peptides / metabolism
Proteins / analysis

Substances

Peptides
Proteins

Grants and funding

20210302123092/the Youth Project of Shanxi Province