MHC2AffyPred: A machine-learning approach to estimate affinity of MHC class II peptides based on structural interaction fingerprints

Proteins. 2023 Feb;91(2):277-289. doi: 10.1002/prot.26428. Epub 2022 Oct 3.

Abstract

Understanding how MHC class II (MHC-II) binding peptides of differing lengths interact specifically with the core and extended sites of the large MHC-II pocket is important for designing peptides in immunological research. Earlier efforts generated peptide conformations amenable to MHC-II binding and calculated the binding energy of complex formation, but they did not relate the peptide conformation in MHC-II structures to the binding affinity (BA), measured as IC50. We present here a machine-learning approach to calculate the BA of peptides within the MHC-II pocket for HLA-DRA1, HLA-DRB1, HLA-DP, and HLA-DQ allotypes. Instead of generating conventional ensembles of peptide conformations, we created a biased set of conformations by using the peptides in the crystal structures of peptide-MHC-II (pMHC-II) complexes as templates, followed by site-directed peptide docking. The structural interaction fingerprints generated from these docked pMHC-II structures, together with Moran autocorrelation descriptors, were used to train a separate random forest regressor for each peptide length (9-mer to 19-mer). The entire workflow is automated with Linux shell and Perl scripts so that the MHC2AffyPred program can be applied to any characterized MHC-II allotype, and it is freely available at https://github.com/SiddhiJani/MHC2AffyPred. MHC2AffyPred attained better performance (correlation coefficient [CC] of .612-.898) than the MHCII3D (.03-.594) and NetMHCIIpan-3.2 (.289-.692) programs for the HLA-DRA1 and HLA-DRB1 types. Similarly, MHC2AffyPred achieved a CC between .91 and .98 for HLA-DP and HLA-DQ peptides (13-mer to 17-mer). Further, in a case study on MHC-II binding 15-mer peptides of severe acute respiratory syndrome coronavirus-2, MHC2AffyPred computed IC50 values close to those of the sequence-based NetMHCIIpan v3.2 and v4.0 programs, with correlations of .998 and .570, respectively.
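
To make the sequence-derived part of the feature set concrete, the following is a minimal sketch of how Moran autocorrelation descriptors can be computed for a single peptide. It uses the standard definition, in which the lag-d covariance of a residue property along the sequence is divided by the variance of that property over the whole peptide. The abstract does not state which residue properties or lags MHC2AffyPred uses, so the Kyte-Doolittle hydropathy scale, the function names, and the choice of `max_lag` below are assumptions made purely for illustration and do not reproduce the authors' Perl implementation.

```python
"""Moran autocorrelation descriptors for one peptide (illustrative sketch)."""

# Kyte-Doolittle hydropathy scale. ASSUMPTION: used only as an example
# property; the abstract does not say which properties MHC2AffyPred uses.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def moran_autocorrelation(peptide, scale, lag):
    """Return Moran's I for one residue property at one sequence lag."""
    n = len(peptide)
    if not 1 <= lag < n:
        raise ValueError("lag must satisfy 1 <= lag < len(peptide)")
    # A KeyError here signals a residue that is missing from the scale.
    values = [scale[residue] for residue in peptide]
    if min(values) == max(values):
        return 0.0  # no variance along the peptide: define the descriptor as 0
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Average product of deviations for residue pairs separated by `lag`.
    covariance = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / (n - lag)
    # Variance of the property over the whole peptide.
    variance = sum(d * d for d in dev) / n
    return covariance / variance


def moran_descriptors(peptide, scale, max_lag):
    """Return the descriptors for lags 1..max_lag as a list."""
    return [moran_autocorrelation(peptide, scale, lag)
            for lag in range(1, max_lag + 1)]


if __name__ == "__main__":
    example_13mer = "PKYVKQNTLKLAT"  # hypothetical input, chosen for illustration
    for lag, value in enumerate(moran_descriptors(example_13mer, KD_HYDROPATHY, 5), 1):
        print(f"lag {lag}: {value:.3f}")
```

Because the lag must be shorter than the peptide, the number of available descriptors depends on peptide length, which is one practical reason a fixed-size feature vector is easier to build when a separate model is trained per length, as the abstract describes.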

Keywords: MHC class II; MHC-II-peptide; SARS-CoV-2; immunoinformatics; machine-learning; random forest; structural interaction fingerprints.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • COVID-19*
  • HLA-DP Antigens / chemistry
  • HLA-DP Antigens / metabolism
  • HLA-DQ Antigens / chemistry
  • HLA-DQ Antigens / metabolism
  • HLA-DRB1 Chains / metabolism
  • Humans
  • Machine Learning
  • Peptides / chemistry
  • Protein Binding

Substances

  • HLA-DRB1 Chains
  • Peptides
  • HLA-DP Antigens
  • HLA-DQ Antigens