A novel alignment-free method for HIV-1 subtype classification

Infect Genet Evol. 2020 Jan:77:104080. doi: 10.1016/j.meegid.2019.104080. Epub 2019 Nov 1.

Abstract

HIV-1 is the most common and pathogenic strain of human immunodeficiency virus, and it comprises many subtypes. To study differences among HIV-1 subtypes in infection, diagnosis and drug design, it is important to identify the subtype of clinical HIV-1 samples. In this work, we propose an effective numeric representation called the Subsequence Natural Vector (SNV) to encode HIV-1 sequences. Using this representation, we introduce an improved linear discriminant analysis method that classifies HIV-1 viruses into their correct subtypes. SNV is based on the distribution of nucleotides in HIV-1 viral sequences: it records not only the number of each nucleotide but also the positions of the nucleotides and the variance of those positions. To validate our alignment-free method, we collected 6902 complete genomes and 11,668 pol gene sequences of HIV-1 subtypes from the up-to-date Los Alamos HIV database. SNV outperforms three popular methods, Kameris, Comet and REGA, achieving almost 100% Sensitivity and Specificity while requiring much less time. Our subtyping algorithm performs especially well on circulating recombinant forms (CRFs) represented by only a few sequences. Our approach also separates unique recombinant forms (URFs) from other subtypes with 100% Sensitivity and Specificity. Moreover, phylogenetic trees based on the SNV representation are constructed from full-length HIV-1 genomes and from pol genes, and in both trees viruses from the same subtype are clustered together correctly.
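The abstract does not give the exact SNV formula. As a rough illustration only, the Python sketch below assumes SNV follows the earlier "natural vector" idea (for each nucleotide: its count, the mean of its 1-based positions, and the sum of squared deviations of those positions divided by count times segment length), computed on each of a fixed number of contiguous subsequences and concatenated. The segment count, the choice to simply not count ambiguous bases, the Euclidean distance, and all function names are hypothetical choices for this sketch, not the authors' published method.

```python
import math
from typing import List

NUCLEOTIDES = "ACGT"


def natural_vector(seq: str) -> List[float]:
    """Natural-vector-style encoding of one segment (assumed definition).

    Returns [counts of A,C,G,T] + [mean positions] + [normalized variances].
    Non-ACGT characters (e.g. N, R, Y) occupy positions but are not counted.
    """
    seq = seq.upper()
    n = len(seq)
    counts, means, variances = [], [], []
    for base in NUCLEOTIDES:
        # 1-based positions of this nucleotide within the segment
        positions = [i + 1 for i, ch in enumerate(seq) if ch == base]
        k = len(positions)
        if k == 0:
            # Absent nucleotide: all three features set to zero
            counts.append(0.0)
            means.append(0.0)
            variances.append(0.0)
            continue
        mean = sum(positions) / k
        # Second central moment normalized by count and segment length
        var = sum((p - mean) ** 2 for p in positions) / (k * n)
        counts.append(float(k))
        means.append(mean)
        variances.append(var)
    return counts + means + variances


def split_segments(seq: str, num_segments: int) -> List[str]:
    """Split a sequence into num_segments contiguous, near-equal pieces."""
    if num_segments < 1:
        raise ValueError("num_segments must be at least 1")
    n = len(seq)
    return [seq[i * n // num_segments:(i + 1) * n // num_segments]
            for i in range(num_segments)]


def subsequence_natural_vector(seq: str, num_segments: int = 4) -> List[float]:
    """Hypothetical SNV: concatenate the natural vector of every segment."""
    vec: List[float] = []
    for segment in split_segments(seq, num_segments):
        vec.extend(natural_vector(segment))
    return vec  # length 12 * num_segments


def snv_distance(a: str, b: str, num_segments: int = 4) -> float:
    """Euclidean distance between two sequences' SNV-style vectors."""
    return math.dist(subsequence_natural_vector(a, num_segments),
                     subsequence_natural_vector(b, num_segments))


if __name__ == "__main__":
    print(natural_vector("ACGTAC"))
    print(snv_distance("ACGTACGTAA", "ACGTTCGTAA", num_segments=2))
```

Because every sequence maps to a fixed-length vector (12 values per segment here) regardless of genome length, no alignment is needed: such vectors could be passed to an off-the-shelf classifier such as linear discriminant analysis, or turned into a pairwise distance matrix for tree building. The authors' "improved" LDA variant is not reproduced here.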

Keywords: Alignment-free; Classification; HIV-1; SNV.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Databases, Genetic
  • Discriminant Analysis
  • Evolution, Molecular
  • Genetic Variation
  • HIV Infections / virology*
  • HIV-1 / classification*
  • HIV-1 / genetics
  • HIV-1 / isolation & purification
  • Humans
  • Phylogeny
  • RNA, Viral / genetics
  • Sequence Analysis, RNA / methods*

Substances

  • RNA, Viral