Artificial intelligence predicts the immunogenic landscape of SARS-CoV-2 leading to universal blueprints for vaccine designs

Sci Rep. 2020 Dec 23;10(1):22375. doi: 10.1038/s41598-020-78758-5.

Abstract

The global population is at present suffering from a pandemic of Coronavirus disease 2019 (COVID-19), caused by the novel coronavirus Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). The goal of this study was to use artificial intelligence (AI) to predict blueprints for designing universal vaccines against SARS-CoV-2 that contain a sufficiently broad repertoire of T-cell epitopes to provide coverage and protection across the global population. To this end, we profiled the entire SARS-CoV-2 proteome across the 100 most frequent HLA-A, HLA-B and HLA-DR alleles in the human population, using host-infected cell surface antigen presentation and immunogenicity predictors from the NEC Immune Profiler suite of tools, and generated comprehensive epitope maps. We then used these epitope maps as input for a Monte Carlo simulation designed to identify statistically significant "epitope hotspot" regions in the virus that are most likely to be immunogenic across a broad spectrum of HLA types. We removed epitope hotspots that shared significant homology with proteins in the human proteome to reduce the chance of inducing off-target autoimmune responses. We also analyzed the antigen presentation and immunogenic landscape of all the nonsynonymous mutations across 3,400 different sequences of the virus, and identified a trend whereby SARS-CoV-2 mutations are predicted to have reduced potential to be presented by host-infected cells, and consequently to be detected by the host immune system. A sequence conservation analysis then removed epitope hotspots that occurred in less-conserved regions of the viral proteome.
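The abstract does not specify how the Monte Carlo simulation was built, so the following is only a minimal sketch of one common way to test windows of a sequence for significance: a permutation-based null distribution. The per-residue scores, the window width, and the function names window_means and hotspot_p_values are all assumptions made for illustration; they are not taken from the NEC Immune Profiler or from the study.

```python
import random


def window_means(scores, width):
    """Mean score of every contiguous window of the given width."""
    total = sum(scores[:width])
    means = [total / width]
    for i in range(width, len(scores)):
        total += scores[i] - scores[i - width]
        means.append(total / width)
    return means


def hotspot_p_values(scores, width, n_permutations=1000, seed=0):
    """Empirical p-value per window from a permutation-based Monte Carlo null.

    Each permutation shuffles the per-residue scores and records the best
    window mean, so the p-values account for scanning many windows.
    """
    rng = random.Random(seed)
    observed = window_means(scores, width)
    shuffled = list(scores)
    null_max = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        null_max.append(max(window_means(shuffled, width)))
    return [
        (1 + sum(1 for m in null_max if m >= obs)) / (n_permutations + 1)
        for obs in observed
    ]


# Invented toy input: low background with a high-scoring stretch at 20-29.
toy_scores = [0.1] * 20 + [0.9] * 10 + [0.1] * 20
p_values = hotspot_p_values(toy_scores, width=10, n_permutations=500)
significant = [start for start, p in enumerate(p_values) if p < 0.05]
```

In this toy case the windows overlapping the high-scoring stretch come out as significant, while windows in the flat background do not. A real analysis would replace toy_scores with predicted presentation or immunogenicity scores aggregated over the HLA alleles of interest.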
Finally, we used a database of the HLA haplotypes of approximately 22,000 individuals to develop a "digital twin" type simulation to model how effectively different combinations of hotspots would work in a diverse human population; the approach identified an optimal constellation of epitope hotspots that could provide maximum coverage in the global population. By combining the host-infected cell surface antigen presentation and immunogenicity predictions of the NEC Immune Profiler with robust Monte Carlo and digital twin simulations, we have profiled the entire SARS-CoV-2 proteome and identified a subset of epitope hotspots that could be harnessed in a vaccine formulation to provide broad coverage across the global population.
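To make the idea of choosing a hotspot combination for population coverage concrete, here is a minimal sketch that uses a greedy maximum-coverage heuristic over a cohort of HLA genotypes. This is a swapped-in technique for illustration, not the study's digital twin simulation, whose details the abstract does not give. The cohort, the hotspot-to-allele assignments, and the names covered and greedy_hotspot_selection are hypothetical, and the rule that one presenting allele is enough to cover an individual is a simplifying assumption.

```python
def covered(individual_alleles, hotspot_alleles):
    """Assumed rule: an individual is covered if they carry at least one
    allele predicted to present the hotspot."""
    return not individual_alleles.isdisjoint(hotspot_alleles)


def greedy_hotspot_selection(cohort, hotspots, max_hotspots):
    """Greedy max-coverage: repeatedly add the hotspot that covers the most
    still-uncovered individuals, up to max_hotspots picks."""
    uncovered = set(range(len(cohort)))
    chosen = []
    while uncovered and len(chosen) < max_hotspots:
        best_name, best_hits = None, set()
        for name in sorted(hotspots):  # sorted for deterministic tie-breaking
            if name in chosen:
                continue
            hits = {i for i in uncovered if covered(cohort[i], hotspots[name])}
            if len(hits) > len(best_hits):
                best_name, best_hits = name, hits
        if best_name is None:  # no remaining hotspot covers anyone new
            break
        chosen.append(best_name)
        uncovered -= best_hits
    coverage = 1 - len(uncovered) / len(cohort)
    return chosen, coverage


# Invented toy cohort: each individual is a set of HLA alleles.
cohort = [
    {"A*02:01", "B*07:02"},
    {"A*01:01", "B*08:01"},
    {"A*02:01", "B*44:02"},
    {"A*24:02", "B*35:01"},
    {"A*03:01", "B*07:02"},
]
# Invented mapping from hotspot to the alleles assumed to present it.
hotspots = {
    "hotspot_1": {"A*02:01", "A*03:01"},
    "hotspot_2": {"A*01:01", "A*24:02"},
    "hotspot_3": {"B*07:02"},
}
chosen, coverage = greedy_hotspot_selection(cohort, hotspots, max_hotspots=2)
print(chosen, coverage)
```

Greedy selection is not guaranteed to find the optimal combination, but it is simple and gives a useful baseline against which a more exhaustive search over hotspot combinations could be compared.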

MeSH terms

  • Algorithms
  • Alleles
  • Amino Acid Sequence
  • COVID-19 / prevention & control*
  • COVID-19 / virology
  • COVID-19 Vaccines / immunology*
  • Drug Evaluation, Preclinical / methods
  • Epitopes, T-Lymphocyte / immunology
  • HLA Antigens / genetics
  • Haplotypes
  • Humans
  • Immunogenicity, Vaccine
  • Machine Learning*
  • Mutation
  • Pandemics / prevention & control*
  • Proteome*
  • Proteomics / methods
  • SARS-CoV-2 / chemistry*
  • SARS-CoV-2 / genetics
  • Software
  • Spike Glycoprotein, Coronavirus / immunology*

Substances

  • COVID-19 Vaccines
  • Epitopes, T-Lymphocyte
  • HLA Antigens
  • Proteome
  • Spike Glycoprotein, Coronavirus
  • spike protein, SARS-CoV-2