Positional Correlation Natural Vector: A Novel Method for Genome Comparison

Int J Mol Sci. 2020 May 29;21(11):3859. doi: 10.3390/ijms21113859.

Abstract

Advances in sequencing technology have made large amounts of biological data available. Evolutionary analysis of data such as DNA sequences is highly important in biological studies. Because the inherently high cost of alignment methods makes them ineffective for large-scale data, alignment-free methods have recently attracted attention in the field of bioinformatics. In this paper, we introduce a new positional correlation natural vector (PCNV) method that converts a DNA sequence into an 18-dimensional numerical feature vector. The PCNV of a DNA sequence represents its nucleotide distribution using frequency and positional correlation. This new numerical vector design uses six features to characterize the correlation among nucleotide positions in sequences. PCNV is also very easy to compute and can be used for rapid genome comparison. To test our novel method, we used PCNV to perform phylogenetic analysis on several viral and bacterial genome datasets. For comparison, an alignment-based method, Bayesian inference, and two alignment-free methods, feature frequency profile and natural vector, were applied to the same datasets. We found that the PCNV technique is fast and accurate when used for phylogenetic analysis and classification of viruses and bacteria.

Keywords: alignment-free; genome comparison; phylogenetic analysis; positional correlation natural vector.
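
To make the idea of an alignment-free feature vector concrete, the following is a minimal sketch of the classic 12-dimensional natural vector, one of the baseline methods named in the abstract. It is not the PCNV itself: the abstract does not spell out the six positional correlation features, so they are left out here rather than guessed. The function names `natural_vector` and `nv_distance`, the 1-based position convention, the handling of ambiguous bases, and the choice of Euclidean distance are assumptions made for illustration.

```python
import math

NUCLEOTIDES = "ACGT"


def natural_vector(seq):
    """Return a 12-dimensional natural vector for a DNA sequence.

    Layout (an assumption of this sketch): counts of A, C, G, T,
    then mean positions, then normalized second central moments.
    """
    seq = seq.upper()
    n = len(seq)  # total sequence length, used to normalize the moments

    # Collect 1-based positions of each nucleotide; other symbols
    # (e.g. N) are skipped but still occupy a position.
    positions = {k: [] for k in NUCLEOTIDES}
    for i, base in enumerate(seq, start=1):
        if base in positions:
            positions[base].append(i)

    counts, means, moments = [], [], []
    for k in NUCLEOTIDES:
        pos = positions[k]
        n_k = len(pos)
        if n_k == 0:
            # Absent nucleotide: all three features are set to zero.
            counts.append(0)
            means.append(0.0)
            moments.append(0.0)
            continue
        mu_k = sum(pos) / n_k
        # Second central moment, scaled by n_k * n.
        d2_k = sum((p - mu_k) ** 2 for p in pos) / (n_k * n)
        counts.append(n_k)
        means.append(mu_k)
        moments.append(d2_k)

    return counts + means + moments


def nv_distance(seq_a, seq_b):
    """Euclidean distance between the natural vectors of two sequences."""
    return math.dist(natural_vector(seq_a), natural_vector(seq_b))
```

A pairwise distance matrix built with `nv_distance` could then be passed to a standard tree-building method for a phylogenetic analysis of the kind the abstract describes. Because each vector is computed in a single pass over the sequence, no alignment step is needed.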

MeSH terms

  • Algorithms
  • Genome, Bacterial
  • Genome, Viral
  • Phylogeny*
  • Sequence Alignment
  • Sequence Analysis, DNA / methods*
  • Sequence Homology, Nucleic Acid*