A large peptidome dataset improves HLA class I epitope prediction across most of the human population

Siranush Sarkizova; Susan Klaeger; Phuong M Le; Letitia W Li; Giacomo Oliveira; Hasmik Keshishian; Christina R Hartigan; Wandi Zhang; David A Braun; Keith L Ligon; Pavan Bachireddy; Ioannis K Zervantonakis; Jennifer M Rosenbluth; Tamara Ouspenskaia; Travis Law; Sune Justesen; Jonathan Stevens; William J Lane; Thomas Eisenhaure; Guang Lan Zhang; Karl R Clauser; Nir Hacohen; Steven A Carr; Catherine J Wu; Derin B Keskin

doi:10.1038/s41587-019-0322-9

Nat Biotechnol. 2020 Feb;38(2):199-209. doi: 10.1038/s41587-019-0322-9. Epub 2019 Dec 16.

Authors

Siranush Sarkizova^#^{1,2}, Susan Klaeger^#², Phuong M Le³, Letitia W Li³, Giacomo Oliveira³, Hasmik Keshishian², Christina R Hartigan², Wandi Zhang³, David A Braun^{2,3,4,5}, Keith L Ligon^{2,4,6,7}, Pavan Bachireddy^{2,3,5}, Ioannis K Zervantonakis⁸, Jennifer M Rosenbluth⁸, Tamara Ouspenskaia², Travis Law², Sune Justesen⁹, Jonathan Stevens¹⁰, William J Lane^{4,10}, Thomas Eisenhaure², Guang Lan Zhang^{3,4,11}, Karl R Clauser², Nir Hacohen^{12,13,14}, Steven A Carr¹⁵, Catherine J Wu^{16,17,18,19}, Derin B Keskin^{20,21,22,23,24}

Affiliations

¹ Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
² Broad Institute of MIT and Harvard, Cambridge, MA, USA.
³ Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA.
⁴ Harvard Medical School, Boston, MA, USA.
⁵ Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA.
⁶ Center for Patient Derived Models, Dana-Farber Cancer Institute, Boston, MA, USA.
⁷ Division of Neuropathology, Brigham and Women's Hospital, Boston, MA, USA.
⁸ Department of Cell Biology, Harvard Medical School, Boston, MA, USA.
⁹ Immunitrack, Copenhagen, Denmark.
¹⁰ Department of Pathology, Brigham and Women's Hospital, Boston, MA, USA.
¹¹ Department of Computer Science, Metropolitan College, Boston University, Boston, MA, USA.
¹² Broad Institute of MIT and Harvard, Cambridge, MA, USA. nhacohen@mgh.harvard.edu.
¹³ Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA. nhacohen@mgh.harvard.edu.
¹⁴ Center for Cancer Immunology, Massachusetts General Hospital, Boston, MA, USA. nhacohen@mgh.harvard.edu.
¹⁵ Broad Institute of MIT and Harvard, Cambridge, MA, USA. scarr@broadinstitute.org.
¹⁶ Broad Institute of MIT and Harvard, Cambridge, MA, USA. cwu@partners.org.
¹⁷ Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA. cwu@partners.org.
¹⁸ Harvard Medical School, Boston, MA, USA. cwu@partners.org.
¹⁹ Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA. cwu@partners.org.
²⁰ Broad Institute of MIT and Harvard, Cambridge, MA, USA. derin_keskin@dfci.harvard.edu.
²¹ Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA. derin_keskin@dfci.harvard.edu.
²² Harvard Medical School, Boston, MA, USA. derin_keskin@dfci.harvard.edu.
²³ Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA. derin_keskin@dfci.harvard.edu.
²⁴ Department of Computer Science, Metropolitan College, Boston University, Boston, MA, USA. derin_keskin@dfci.harvard.edu.

^# Contributed equally.

Abstract

Prediction of HLA epitopes is important for the development of cancer immunotherapies and vaccines. However, current prediction algorithms have limited predictive power, in part because they were not trained on high-quality epitope datasets covering a broad range of HLA alleles. To enable prediction of endogenous HLA class I-associated peptides across a large fraction of the human population, we used mass spectrometry to profile >185,000 peptides eluted from 95 HLA-A, -B, -C and -G mono-allelic cell lines. We identified canonical peptide motifs per HLA allele, unique and shared binding submotifs across alleles and distinct motifs associated with different peptide lengths. By integrating these data with transcript abundance and peptide processing, we developed HLAthena, providing allele-and-length-specific and pan-allele-pan-length prediction models for endogenous peptide presentation. These models predicted endogenous HLA class I-associated ligands with 1.5-fold improvement in positive predictive value compared with existing tools and correctly identified >75% of HLA-bound peptides that were observed experimentally in 11 patient-derived tumor cell lines.
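
The positive predictive value (PPV) comparison reported above can be illustrated with a small sketch. It assumes a common evaluation setup for ranked predictors: score a mixture of observed ligands (hits) and decoys, keep the n highest-scoring peptides, where n equals the number of hits, and report the fraction of those that are true hits. The function name, the toy scores and the top-n convention are assumptions made for illustration; they are not taken from the paper and do not reproduce HLAthena or its benchmark.

```python
from typing import Sequence


def ppv_at_top_n(scores: Sequence[float], labels: Sequence[int], n: int) -> float:
    """Return the fraction of true hits among the n highest-scoring peptides.

    scores: predicted presentation scores (higher means more likely presented).
    labels: 1 for an observed ligand (hit), 0 for a decoy.
    n: number of top-ranked peptides to evaluate.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0 < n <= len(scores):
        raise ValueError("n must be between 1 and the number of peptides")
    # Rank peptides from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    # Count how many of the top n are true hits.
    return sum(label for _, label in ranked[:n]) / n


# Hypothetical toy data: 3 hits and 5 decoys scored by two made-up models.
labels = [1, 0, 1, 1, 0, 0, 0, 0]
new_scores = [0.95, 0.90, 0.80, 0.40, 0.30, 0.20, 0.10, 0.05]
old_scores = [0.90, 0.95, 0.30, 0.20, 0.85, 0.10, 0.40, 0.05]

n_hits = sum(labels)
ppv_new = ppv_at_top_n(new_scores, labels, n_hits)
ppv_old = ppv_at_top_n(old_scores, labels, n_hits)

# Fold improvement is the ratio of the two PPV values.
fold_improvement = ppv_new / ppv_old
print(f"PPV new={ppv_new:.3f}, old={ppv_old:.3f}, fold={fold_improvement:.2f}")
```

In this toy case the first model places two hits in its top three and the second places one, so the ratio is 2. The numbers only demonstrate how a fold improvement in PPV is computed and have no bearing on the published results.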

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Alleles
Amino Acid Motifs
Cell Line
Databases, Protein*
Epitopes / metabolism*
Genetic Loci
Histocompatibility Antigens Class I / metabolism*
Humans
Ligands
Peptide Hydrolases / metabolism
Peptides / chemistry
Peptides / metabolism*
Proteasome Endopeptidase Complex / metabolism
Proteome / metabolism*

Substances

Epitopes
Histocompatibility Antigens Class I
Ligands
Peptides
Proteome
Peptide Hydrolases
Proteasome Endopeptidase Complex
