Using Haplotype-Based Artificial Intelligence to Evaluate SARS-CoV-2 Novel Variants and Mutations

JAMA Netw Open. 2023 Feb 1;6(2):e230191. doi: 10.1001/jamanetworkopen.2023.0191.

Abstract

Importance: Earlier detection of emerging novel SARS-COV-2 variants is important for public health surveillance of potential viral threats and for earlier prevention research. Artificial intelligence may facilitate early detection of SARS-CoV2 emerging novel variants based on variant-specific mutation haplotypes and, in turn, be associated with enhanced implementation of risk-stratified public health prevention strategies.

Objective: To develop a haplotype-based artificial intelligence (HAI) model for identifying novel variants, including mixture variants (MVs) of known variants and new variants with novel mutations.

Design, setting, and participants: This cross-sectional study used serially observed viral genomic sequences globally (prior to March 14, 2022) to train and validate the HAI model and used it to identify variants arising from a prospective set of viruses from March 15 to May 18, 2022.

Main outcomes and measures: Viral sequences, collection dates, and locations were subjected to statistical learning analysis to estimate variant-specific core mutations and haplotype frequencies, which were then used to construct an HAI model to identify novel variants.

Results: Through training on more than 5 million viral sequences, an HAI model was built, and its identification performance was validated on an independent validation set of more than 5 million viruses. Its identification performance was assessed on a prospective set of 344 901 viruses. In addition to achieving an accuracy of 92.8% (95% CI within 0.1%), the HAI model identified 4 Omicron MVs (Omicron-Alpha, Omicron-Delta, Omicron-Epsilon, and Omicron-Zeta), 2 Delta MVs (Delta-Kappa and Delta-Zeta), and 1 Alpha-Epsilon MV, among which Omicron-Epsilon MVs were most frequent (609/657 MVs [92.7%]). Furthermore, the HAI model found that 1699 Omicron viruses had unidentifiable variants given that these variants acquired novel mutations. Lastly, 524 variant-unassigned and variant-unidentifiable viruses carried 16 novel mutations, 8 of which were increasing in prevalence percentages as of May 2022.

Conclusions and relevance: In this cross-sectional study, an HAI model found SARS-COV-2 viruses with MV or novel mutations in the global population, which may require closer examination and monitoring. These results suggest that HAI may complement phylogenic variant assignment, providing additional insights into emerging novel variants in the population.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Artificial Intelligence*
  • COVID-19*
  • Cross-Sectional Studies
  • Haplotypes
  • Humans
  • Mutation
  • Prospective Studies
  • RNA, Viral
  • SARS-CoV-2

Substances

  • RNA, Viral

Supplementary concepts

  • SARS-CoV-2 variants