Multi-task learning sparse group lasso: a method for quantifying antigenicity of influenza A(H1N1) virus using mutations and variations in glycosylation of Hemagglutinin

BMC Bioinformatics. 2020 May 11;21(1):182. doi: 10.1186/s12859-020-3527-5.

Abstract

Background: In addition to causing the pandemic influenza outbreaks of 1918 and 2009, subtype H1N1 influenza A viruses (IAVs) have caused seasonal epidemics since 1977. The antigenic properties of influenza viruses are determined by both the protein sequence and the N-linked glycosylation of influenza glycoproteins, especially hemagglutinin (HA). Currently available computational methods consider only protein sequence features and not N-linked glycosylation.

Results: A multi-task learning sparse group least absolute shrinkage and selection operator (LASSO) (MTL-SGL) regression method was developed and applied to identify two types of predominant features, protein sequence and N-linked glycosylation in hemagglutinin (HA), that affect variations in serologic data for human and swine H1N1 IAVs. The results suggested that mutations and changes in N-linked glycosylation sites are associated with the rise of antigenic variants of H1N1 IAVs. Furthermore, the implicated mutations are predominantly located at five reported antibody-binding sites, and within or close to the HA receptor binding site. All three N-linked glycosylation sites identified by MTL-SGL as determinants of antigenic change (i.e. sequons NCSV at HA 54, NHTV at HA 125, and NLSK at HA 160) were experimentally validated in the H1N1 antigenic variants using mass spectrometry analyses. Compared with conventional sparse learning methods, MTL-SGL achieved a lower prediction error and higher accuracy, indicating that the grouped features and MTL in the MTL-SGL method not only can handle serologic data generated from multiple reagents, supplies, and protocols, but also perform better in genetic sequence-based antigenic quantification.
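The glycosylation features described above can be illustrated with a minimal sketch that scans a protein sequence for candidate N-linked glycosylation sequons. It assumes the standard N-X-S/T motif (X being any residue except proline) and reports a four-residue window, matching the way the sequons are written in this abstract. The function name find_sequons and the toy sequence are hypothetical and are not taken from the study; positions are 1-based offsets in the input string, not HA numbering, and a sequence match only marks a potential site, not confirmed glycosylation.

```python
import re

# Assumed motif: Asn, any residue except Pro, then Ser or Thr (N-X-S/T).
# The lookahead makes matches zero-width so overlapping sequons are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(seq, width=4):
    """Return (1-based position, motif) pairs for candidate sequons in seq.

    The motif is the `width`-residue window starting at the Asn; it is
    shorter than `width` only when the sequence ends first.
    """
    seq = seq.upper()
    return [
        (m.start() + 1, seq[m.start():m.start() + width])
        for m in SEQUON.finditer(seq)
    ]


if __name__ == "__main__":
    # Toy sequence (hypothetical, not a real HA fragment). "NPS" is skipped
    # because proline in the middle position is excluded by the motif.
    toy = "AANCSVAANPSAANHTV"
    for position, motif in find_sequons(toy):
        print(position, motif)
```

Comparing the sequon positions found in two aligned HA sequences would give the gain or loss of a potential glycosylation site, which is the kind of feature that could sit alongside amino acid substitutions in a sequence-based regression model.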

Conclusions: The results of this study suggest that mutations and variations in N-glycosylation in HA caused antigenic variations in H1N1 IAVs, and that the sequence-based antigenicity predictive model will be useful in understanding the antigenic evolution of IAVs.

Keywords: Antigenic drift; Group lasso; H1N1; Influenza virus; LASSO; MTL-SGL; Multi-task learning; N-linked glycosylation; Sparse learning; Vaccine strain selection.

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Animals
  • Antigens, Viral / immunology*
  • Base Sequence
  • Genome, Viral
  • Glycosylation
  • Hemagglutinin Glycoproteins, Influenza Virus / chemistry
  • Hemagglutinin Glycoproteins, Influenza Virus / genetics*
  • Humans
  • Influenza A Virus, H1N1 Subtype / genetics*
  • Influenza A Virus, H1N1 Subtype / immunology*
  • Influenza A virus / immunology
  • Influenza, Human / virology
  • Mutation / genetics*
  • Polysaccharides / immunology
  • Reproducibility of Results
  • Swine

Substances

  • Antigens, Viral
  • Hemagglutinin Glycoproteins, Influenza Virus
  • Polysaccharides