Bags of words models of epitope sets: HIV viral load regression with counting grids

Pac Symp Biocomput. 2014:288-99.

Abstract

The immune system gathers evidence of the execution of various molecular processes, both foreign and the cells' own, as time- and space-varying sets of epitopes: small linear or conformational segments of the proteins involved in these processes. Epitopes have no obvious ordering in this scheme. The immune system simply sees these epitope sets as unordered "bags" of simple signatures, and must decide its actions based on their contents. The immense landscape of possible bags of epitopes is shaped by the cellular pathways in various cells, as well as by the characteristics of the internal sampling process that selects epitopes and brings them to the cell surface. As a consequence, upon infection by the same pathogen, the cells of different individuals present very different epitope sets. Modeling this landscape should thus be a key step in computational immunology. We show that among possible bag-of-words models, the counting grid is best suited to modeling cellular presentation. We describe each patient by the bag of peptides they are likely to present on the cell surface. In regression tests, we found that, compared to the state of the art, counting grids explain more than twice as much of the variance in log viral load in these patients. This is potentially a significant advance in the field, given that a large part of the variance in log viral load also depends on the infecting HIV strain, and that HIV polymorphisms themselves are known to be strongly associated with HLA types; both effects lie beyond what is modeled here.
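To make the "bag of peptides" and counting-grid ideas concrete, here is a minimal sketch of how a counting grid scores an unordered bag of peptides. It assumes the standard counting-grid formulation: a toroidal grid whose cells each hold a distribution over a peptide vocabulary, averaged over a fixed-size window. The grid below is random rather than trained, and the function names, grid and window sizes, and placeholder tokens (`PEP_A`, ...) are hypothetical. The sketch does not reproduce the paper's learning or regression procedure.

```python
import math
import random
from collections import Counter


def make_grid(rows, cols, vocab, seed=0):
    """Hypothetical random (untrained) counting grid: each cell holds a
    normalized probability distribution over the peptide vocabulary."""
    rng = random.Random(seed)
    grid = []
    for _ in range(rows):
        row = []
        for _ in range(cols):
            weights = [rng.random() + 1e-3 for _ in vocab]  # keep all probabilities > 0
            total = sum(weights)
            row.append({w: x / total for w, x in zip(vocab, weights)})
        grid.append(row)
    return grid


def window_distribution(grid, r, c, win_rows, win_cols):
    """Average the cell distributions inside the window whose top-left corner
    is (r, c); indices wrap around, so the grid behaves like a torus."""
    rows, cols = len(grid), len(grid[0])
    vocab = list(grid[0][0].keys())
    h = {w: 0.0 for w in vocab}
    for dr in range(win_rows):
        for dc in range(win_cols):
            cell = grid[(r + dr) % rows][(c + dc) % cols]
            for w in vocab:
                h[w] += cell[w]
    n = win_rows * win_cols
    return {w: v / n for w, v in h.items()}


def bag_log_likelihood(bag, h):
    """log p(bag | window) = sum_z count(z) * log h(z).
    Only peptide counts enter, so the order of the peptides is irrelevant."""
    return sum(count * math.log(h[word]) for word, count in bag.items())


def location_posterior(grid, bag, win_rows, win_cols):
    """Posterior over window positions under a uniform prior,
    normalized with the log-sum-exp trick for numerical stability."""
    rows, cols = len(grid), len(grid[0])
    scores = {}
    for r in range(rows):
        for c in range(cols):
            h = window_distribution(grid, r, c, win_rows, win_cols)
            scores[(r, c)] = bag_log_likelihood(bag, h)
    m = max(scores.values())
    z = sum(math.exp(s - m) for s in scores.values())
    return {k: math.exp(s - m) / z for k, s in scores.items()}


if __name__ == "__main__":
    vocab = ["PEP_A", "PEP_B", "PEP_C", "PEP_D"]  # placeholder peptide tokens
    grid = make_grid(4, 5, vocab, seed=1)
    patient_bag = Counter(["PEP_A", "PEP_C", "PEP_A"])  # an unordered multiset
    post = location_posterior(grid, patient_bag, 2, 2)
    best = max(post, key=post.get)
    print("most likely window:", best, "posterior:", round(post[best], 3))
```

Because a bag is stored as a `Counter`, two patients presenting the same peptides in a different order receive identical likelihoods, which mirrors the "disordered bag" view described above. A posterior like `post` is one plausible per-patient summary that could feed a downstream regression on log viral load. That use is an illustrative assumption, not a description of the authors' pipeline.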

MeSH terms

  • Computational Biology
  • Epitopes / genetics
  • HIV / genetics*
  • HIV / immunology*
  • HIV Antigens / genetics
  • HIV Infections / immunology
  • HIV Infections / virology
  • HLA Antigens / genetics
  • HLA Antigens / metabolism
  • Histocompatibility Testing
  • Host-Pathogen Interactions / genetics
  • Host-Pathogen Interactions / immunology
  • Humans
  • Models, Immunological*
  • Precision Medicine
  • Regression Analysis
  • Viral Load / statistics & numerical data*

Substances

  • Epitopes
  • HIV Antigens
  • HLA Antigens