Bioactive Peptide Recognition Based on NLP Pre-Train Algorithm

IEEE/ACM Trans Comput Biol Bioinform. 2023 Nov-Dec;20(6):3809-3819. doi: 10.1109/TCBB.2023.3323295. Epub 2023 Dec 25.

Abstract

Bioactive peptides are peptide sequences within a protein that can regulate important bodily functions through their myriad activities. With the development of machine learning, more computational methods have been proposed for bioactive peptide recognition, so that the task no longer relies solely on tedious and time-consuming wet-lab experiments. However, the training and testing of existing models are limited to small datasets, which affects model performance. Inspired by the success of pre-training on unlabeled data for sequence classification in natural language processing, we propose a pre-training method for bioactive peptide recognition. By pre-training on large-scale protein sequence data, our method achieves the best performance in the identification of multiple functional peptides, including anti-cancer, anti-diabetic, anti-hypertensive, anti-inflammatory, and anti-microbial peptides. Compared with an advanced existing model, our model improves precision, coverage, accuracy, and absolute true by 7.2%, 6.9%, 6.1%, and 4.2%, respectively, in 5-fold cross-validation. In addition, the results indicate that the model has superior prediction performance in single-function peptide recognition, especially for anti-cancer and anti-microbial peptides, which have longer sequences.
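
The abstract reports precision, coverage, accuracy, and absolute true without defining them. The sketch below assumes the example-based multi-label definitions commonly used in multi-functional peptide prediction, where each peptide has a set of true labels and a set of predicted labels; the paper itself may define them differently. The function name, the label strings, and the convention of scoring a zero denominator as 0 are illustrative assumptions, not taken from the source.

```python
from typing import Dict, Sequence, Set


def _ratio(numerator: int, denominator: int) -> float:
    # Assumption: a peptide with an empty denominator set contributes 0.
    return numerator / denominator if denominator else 0.0


def multilabel_metrics(
    y_true: Sequence[Set[str]], y_pred: Sequence[Set[str]]
) -> Dict[str, float]:
    """Average four example-based multi-label scores over all peptides."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    totals = {"precision": 0.0, "coverage": 0.0, "accuracy": 0.0, "absolute_true": 0.0}
    for true_labels, pred_labels in zip(y_true, y_pred):
        hits = len(true_labels & pred_labels)
        # Share of predicted labels that are correct.
        totals["precision"] += _ratio(hits, len(pred_labels))
        # Share of true labels that were recovered.
        totals["coverage"] += _ratio(hits, len(true_labels))
        # Overlap relative to the union of true and predicted labels.
        totals["accuracy"] += _ratio(hits, len(true_labels | pred_labels))
        # 1 only when the predicted label set matches the true set exactly.
        totals["absolute_true"] += float(true_labels == pred_labels)

    n = len(y_true)
    return {name: value / n for name, value in totals.items()}


# Hypothetical labels for two peptides; not data from the paper.
example_true = [{"anti-cancer", "anti-microbial"}, {"anti-diabetic"}]
example_pred = [{"anti-cancer"}, {"anti-diabetic"}]
scores = multilabel_metrics(example_true, example_pred)
```

Under these assumed definitions, absolute true is the strictest of the four scores, since a peptide counts only when every one of its functions is predicted and no extra function is added.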

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Anti-Inflammatory Agents
  • Machine Learning
  • Natural Language Processing
  • Peptides*

Substances

  • Peptides
  • Anti-Inflammatory Agents