HLAB: learning the BiLSTM features from the ProtBert-encoded proteins for the class I HLA-peptide binding prediction

Brief Bioinform. 2022 Sep 20;23(5):bbac173. doi: 10.1093/bib/bbac173.

Abstract

Human Leukocyte Antigen (HLA) is a type of molecule residing on the surfaces of most human cells and exerts an essential role in the immune system responding to the invasive items. The T cell antigen receptors may recognize the HLA-peptide complexes on the surfaces of cancer cells and destroy these cancer cells through toxic T lymphocytes. The computational determination of HLA-binding peptides will facilitate the rapid development of cancer immunotherapies. This study hypothesized that the natural language processing-encoded peptide features may be further enriched by another deep neural network. The hypothesis was tested with the Bi-directional Long Short-Term Memory-extracted features from the pretrained Protein Bidirectional Encoder Representations from Transformers-encoded features of the class I HLA (HLA-I)-binding peptides. The experimental data showed that our proposed HLAB feature engineering algorithm outperformed the existing ones in detecting the HLA-I-binding peptides. The extensive evaluation data show that the proposed HLAB algorithm outperforms all the seven existing studies on predicting the peptides binding to the HLA-A*01:01 allele in AUC and achieves the best average AUC values on the six out of the seven k-mers (k=8,9,...,14, respectively represent the prediction task of a polypeptide consisting of k amino acids) except for the 9-mer prediction tasks. The source code and the fine-tuned feature extraction models are available at http://www.healthinformaticslab.org/supp/resources.php.

Keywords: BERT; BiLSTM; bioinformatics; class I HLA-binding peptide prediction; feature selection; natural language processing.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acids / metabolism
  • HLA Antigens / chemistry
  • HLA Antigens / genetics
  • HLA-A Antigens / metabolism
  • Histocompatibility Antigens Class I* / chemistry
  • Humans
  • Peptides* / chemistry
  • Protein Binding

Substances

  • Amino Acids
  • HLA Antigens
  • HLA-A Antigens
  • Histocompatibility Antigens Class I
  • Peptides