Learning the language of viral evolution and escape

Science. 2021 Jan 15;371(6526):284-288. doi: 10.1126/science.abd7331.

Abstract

The ability of viruses to mutate and evade the human immune system and cause infection, called viral escape, remains an obstacle to antiviral and vaccine development. Understanding the complex rules that govern escape could inform therapeutic design. We modeled viral escape with machine learning algorithms originally developed for human natural language. We identified escape mutations as those that preserve viral infectivity but cause a virus to look different to the immune system, akin to word changes that preserve a sentence's grammaticality but change its meaning. With this approach, language models of influenza hemagglutinin, HIV-1 envelope glycoprotein (HIV Env), and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) Spike viral proteins can accurately predict structural escape patterns using sequence data alone. Our study represents a promising conceptual bridge between natural language and viral evolution.
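The two criteria in the abstract (a mutation keeps the sequence "grammatical" yet changes its "meaning") can be illustrated with a small ranking sketch. It is not the authors' implementation. The abstract does not say how the two criteria are combined, so the rank-sum rule, the `beta` weight, the function names, and the mutation labels and scores below are all assumptions made for illustration. In practice both scores would come from a trained language model.

```python
from typing import Dict, List, Tuple


def rank_ascending(values: List[float]) -> List[int]:
    """Return the rank of each value (1 = smallest); ties broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def rank_escape_candidates(
    semantic_change: Dict[str, float],
    grammaticality: Dict[str, float],
    beta: float = 1.0,
) -> List[Tuple[str, float]]:
    """Order mutations so that those scoring high on BOTH criteria come first.

    Assumption: the two criteria are combined as
    rank(semantic change) + beta * rank(grammaticality).
    """
    names = list(semantic_change)
    sc_ranks = rank_ascending([semantic_change[n] for n in names])
    gr_ranks = rank_ascending([grammaticality[n] for n in names])
    scores = {n: sc_ranks[i] + beta * gr_ranks[i] for i, n in enumerate(names)}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-mutation scores (made up for this example).
semantic_change = {"A10T": 0.90, "G20D": 0.95, "K30R": 0.10, "L40P": 0.05}
grammaticality = {"A10T": 0.95, "G20D": 0.05, "K30R": 0.90, "L40P": 0.02}

for mutation, score in rank_escape_candidates(semantic_change, grammaticality):
    print(mutation, score)
```

In this toy data, A10T ranks first because it scores high on both criteria. G20D changes "meaning" the most but is barely "grammatical", which in the abstract's terms corresponds to a mutation that would look different to the immune system but not preserve infectivity.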

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Acquired Immunodeficiency Syndrome / immunology*
  • Acquired Immunodeficiency Syndrome / virology
  • Binding Sites
  • COVID-19 / immunology*
  • COVID-19 / virology
  • Evolution, Molecular
  • HIV-1 / genetics*
  • Hemagglutinin Glycoproteins, Influenza Virus / chemistry
  • Hemagglutinin Glycoproteins, Influenza Virus / genetics
  • Humans
  • Influenza A virus / genetics*
  • Influenza, Human / immunology*
  • Influenza, Human / virology
  • Mutation
  • Protein Domains
  • SARS-CoV-2 / genetics*
  • Spike Glycoprotein, Coronavirus / chemistry
  • Spike Glycoprotein, Coronavirus / genetics
  • env Gene Products, Human Immunodeficiency Virus / chemistry
  • env Gene Products, Human Immunodeficiency Virus / genetics

Substances

  • Hemagglutinin Glycoproteins, Influenza Virus
  • Spike Glycoprotein, Coronavirus
  • env Gene Products, Human Immunodeficiency Virus
  • spike protein, SARS-CoV-2