Predicting the short-term success of human influenza virus variants with machine learning

Proc Biol Sci. 2020 Apr 8;287(1924):20200319. doi: 10.1098/rspb.2020.0319. Epub 2020 Apr 8.

Abstract

Seasonal influenza viruses change constantly, producing a different set of circulating strains each season. Small genetic changes can accumulate over time and result in antigenically different viruses, which the body's immune system may fail to recognize. Because of rapid mutation, particularly in the haemagglutinin (HA) gene, seasonal influenza vaccines must be updated frequently. Each update requires choosing which strains to include so as to maximize the vaccine's benefit, based on estimates of which strains will be circulating in upcoming seasons. This is a challenging prediction task. In this paper, we use longitudinally sampled phylogenetic trees based on HA sequences from human influenza viruses, together with counts of epitope site polymorphisms in HA, to predict which influenza virus strains are likely to be successful. We extract small groups of taxa (subtrees) and use a suite of features of these subtrees as key inputs to the machine learning tools. Using a range of training and testing strategies, including training on H3N2 and testing on H1N1, we find that successful prediction of future expansion of small subtrees is possible from these data, with accuracies of 0.71-0.85 and a classifier 'area under the curve' of 0.75-0.9.
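
The workflow summarized in the abstract (features computed on small subtrees, a classifier score for each subtree, evaluation by accuracy and area under the curve) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not list the features used, so the sketch computes a few common tree shape statistics (number of tips, Sackin index, maximum depth, cherry count); the nested-tuple tree representation, the function names, and the toy labels and scores are hypothetical and are not the paper's data or results.

```python
# Illustrative sketch only: features, names, and data are assumptions,
# not the method or results of the paper.

def tip_depths(tree, depth=0):
    """Return the depth of every tip in a nested-tuple tree (tips are strings)."""
    if not isinstance(tree, tuple):  # reached a tip
        return [depth]
    depths = []
    for child in tree:
        depths.extend(tip_depths(child, depth + 1))
    return depths


def count_cherries(tree):
    """Count internal nodes whose children are exactly two tips."""
    if not isinstance(tree, tuple):
        return 0
    if len(tree) == 2 and all(not isinstance(c, tuple) for c in tree):
        return 1
    return sum(count_cherries(c) for c in tree)


def subtree_features(tree):
    """Compute a small, hypothetical feature set for one subtree."""
    depths = tip_depths(tree)
    return {
        "n_tips": len(depths),
        "sackin": sum(depths),  # Sackin index: sum of tip depths
        "max_depth": max(depths),
        "cherries": count_cherries(tree),
    }


def accuracy(labels, scores, threshold=0.5):
    """Fraction of subtrees whose thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def auc(labels, scores):
    """Area under the ROC curve: the probability that a random positive
    scores higher than a random negative (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Two toy four-tip subtrees: one balanced, one ladder-like.
balanced = (("A", "B"), ("C", "D"))
ladder = ((("A", "B"), "C"), "D")
features_balanced = subtree_features(balanced)
features_ladder = subtree_features(ladder)

# Toy labels (1 = subtree later expanded) and made-up classifier scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
print(features_balanced, features_ladder)
print(accuracy(labels, scores), auc(labels, scores))
```

In practice, feature dictionaries like these would be assembled into a table with one row per subtree and passed to a classifier; the accuracy and area under the curve would then be computed on held-out subtrees, for example those from a different subtype than the one used for training.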

Keywords: influenza; machine learning; phylogenetics; prediction; tree shape statistics.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Evolution, Molecular*
  • Humans
  • Influenza Vaccines
  • Influenza, Human / classification*
  • Influenza, Human / transmission
  • Machine Learning*
  • Phylogeny

Substances

  • Influenza Vaccines

Associated data

  • figshare/10.6084/m9.figshare.c.4911456