Biochemical Property Based Positional Matrix: A New Approach Towards Genome Sequence Comparison

J Mol Evol. 2023 Feb;91(1):93-131. doi: 10.1007/s00239-022-10082-0. Epub 2022 Dec 31.

Abstract

The growth of genome sequence data has become one of the emerging areas of study in bioinformatics. It has created strong demand for advanced methodologies for inferring evolutionary relationships among species. Alignment-free methods have proven more efficient in time and space than existing alignment-based methods for sequence analysis. In this study, a new alignment-free genome sequence comparison technique is proposed based on the biochemical properties of nucleotides. Each genome sequence is distributed over four parameters and represented as a 21-dimensional numerical descriptor using the Positional Matrix. To substantiate the proposed method, phylogenetic trees are constructed on viral and mammalian datasets by applying the UPGMA/NJ clustering method. Further, the results of this method are compared with those of the Feature Frequency Profiles method, the Positional Correlation Natural Vector method, the Graph-theoretic method, the Multiple Encoding Vector method, and the Fuzzy Integral Similarity method. In most cases, the present method produces more accurate results than the prior methods, and its execution time is comparatively small.
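The abstract does not say what the four parameters are or how the 21 dimensions are built, so the sketch below is not the authors' Positional Matrix. It only illustrates the general shape of an alignment-free pipeline of this kind: map each sequence to a fixed-length numerical vector derived from biochemical classes of nucleotides, compute pairwise distances, and cluster with UPGMA. The 6-dimensional descriptor (class frequency plus mean normalized position for the purine, amino, and weak-hydrogen-bond classes), the Euclidean distance, and the names `descriptor`, `euclidean`, and `upgma` are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt

# Standard biochemical classes of nucleotides (the complement of each set
# is the opposite class: pyrimidine, keto, strong hydrogen bond).
GROUPS = {
    "purine": set("AG"),
    "amino": set("AC"),
    "weak": set("AT"),
}


def descriptor(seq):
    """Toy 6-dimensional descriptor (an assumption, NOT the paper's 21-dim one).

    For each class: the fraction of bases in the class, and the mean
    1-based position of those bases divided by the sequence length.
    """
    seq = seq.upper()
    n = len(seq)
    vec = []
    for members in GROUPS.values():
        positions = [i + 1 for i, base in enumerate(seq) if base in members]
        freq = len(positions) / n
        mean_pos = sum(positions) / (len(positions) * n) if positions else 0.0
        vec.extend([freq, mean_pos])
    return vec


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def upgma(names, vectors):
    """Build a UPGMA tree (nested tuples of names) from descriptor vectors."""
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (tree, size)
    dist = {
        frozenset((i, j)): euclidean(vectors[i], vectors[j])
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair of current clusters
        a, b = sorted(pair)
        (tree_a, size_a), (tree_b, size_b) = clusters.pop(a), clusters.pop(b)
        for k in clusters:
            d_ak = dist.pop(frozenset((a, k)))
            d_bk = dist.pop(frozenset((b, k)))
            # UPGMA update: size-weighted average of the merged clusters' distances
            dist[frozenset((next_id, k))] = (size_a * d_ak + size_b * d_bk) / (size_a + size_b)
        del dist[pair]
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    (tree, _size), = clusters.values()
    return tree


if __name__ == "__main__":
    # Made-up toy sequences, for illustration only.
    toy = {"seqA": "ATGCGTACGTTAGC", "seqB": "ATGCGTACGTTAGG", "seqC": "CCCCGGGGCCCCGG"}
    print(upgma(list(toy), [descriptor(s) for s in toy.values()]))
```

Because every sequence is reduced to a short vector, the comparison cost depends on the number of sequences rather than on aligning them, which is the usual source of the time and space advantage claimed for alignment-free methods. A neighbor-joining step could replace `upgma` without changing the descriptor stage.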

Keywords: Alignment-based method; Alignment-free method; Evolutionary relationship; Genome sequence comparison; Phylogenetic tree.

MeSH terms

  • Algorithms*
  • Animals
  • Computational Biology / methods
  • Genome* / genetics
  • Mammals / genetics
  • Nucleotides / genetics
  • Phylogeny
  • Sequence Analysis, DNA / methods

Substances

  • Nucleotides