Empirical comparison of analytical approaches for identifying molecular HIV-1 clusters

Sci Rep. 2020 Oct 29;10(1):18547. doi: 10.1038/s41598-020-75560-1.

Abstract

Public health interventions guided by clustering of HIV-1 molecular sequences may be affected by the choice of analytical approach. We identified commonly used clustering analytical approaches, applied them to 1886 HIV-1 Rhode Island sequences from 2004-2018, and compared concordance in identifying molecular HIV-1 clusters within and between approaches. We used strict (topological support ≥ 0.95; distance 0.015 substitutions/site) and relaxed (topological support 0.80-0.95; distance 0.030-0.045 substitutions/site) thresholds to reflect different epidemiological scenarios. We found that clustering differed by method and threshold and depended more on distance thresholds than on topological support thresholds. Clustering concordance analyses demonstrated some differences across analytical approaches, with RAxML having the highest (91%) mean summary percent concordance when strict thresholds were applied, and three (RAxML-, FastTree regular bootstrap- and IQ-Tree regular bootstrap-based) analytical approaches having the highest (86%) mean summary percent concordance when relaxed thresholds were applied. We conclude that different analytical approaches can yield diverse HIV-1 clustering outcomes and may need to be used differently in diverse public health scenarios. Recognizing the variability and limitations of commonly used methods in cluster identification is important for guiding clustering-triggered interventions to disrupt new transmissions and end the HIV epidemic.
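To illustrate how a distance threshold can change which sequences are called clustered, the following is a minimal sketch and not the study's pipeline. It assumes single-linkage grouping on pairwise genetic distances only and ignores topological support, which the approaches compared here also use. The function names, the toy distances, and the simple concordance measure (percent of sequences with the same clustered or non-clustered status under two results) are hypothetical and are not the paper's "mean summary percent concordance".

```python
def distance_clusters(distances, threshold):
    """Group sequences linked by a pairwise distance <= threshold.

    `distances` maps (id_a, id_b) pairs to substitutions/site.
    Assumption: single linkage on distance alone; no tree support is used.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path compression.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        root_a, root_b = find(a), find(b)
        if d <= threshold and root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    # Only groups of two or more sequences count as clusters.
    return sorted(sorted(g) for g in groups.values() if len(g) >= 2)


def percent_concordance(clusters_a, clusters_b, all_ids):
    """Percent of sequences with the same clustered status in both results.

    Hypothetical, simplified measure for illustration only.
    """
    in_a = {s for c in clusters_a for s in c}
    in_b = {s for c in clusters_b for s in c}
    agree = sum((s in in_a) == (s in in_b) for s in all_ids)
    return 100.0 * agree / len(all_ids)


# Toy pairwise distances (substitutions/site); invented values.
toy = {
    ("s1", "s2"): 0.010, ("s1", "s3"): 0.040, ("s2", "s3"): 0.035,
    ("s1", "s4"): 0.090, ("s2", "s4"): 0.095, ("s3", "s4"): 0.100,
}
ids = ["s1", "s2", "s3", "s4"]

strict = distance_clusters(toy, 0.015)   # strict distance threshold
relaxed = distance_clusters(toy, 0.045)  # upper end of the relaxed range
print(strict, relaxed, percent_concordance(strict, relaxed, ids))
```

In this toy data the strict threshold clusters only s1 and s2, while the relaxed threshold also pulls in s3, so the two results disagree on one of four sequences.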

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Cluster Analysis
  • HIV Infections / epidemiology*
  • HIV-1 / genetics*
  • Humans
  • Phylogeny