Applying frequency chaos game representation with perceptual image hashing to gene sequence phylogenetic analyses

J Mol Graph Model. 2021 Sep:107:107942. doi: 10.1016/j.jmgm.2021.107942. Epub 2021 May 23.

Abstract

As a very important research direction in the field of bioinformatics, sequence alignment plays a vital role in the research and development of biology. Converting genome sequence to graph by using frequency chaos game representation (FCGR) is an excellent gene sequence mapping technology, which can store rich genetic information into FCGR graphics. To each FCGR image, we construct its perceptual image hashing (PIH) matrix using the bicubic interpolation zooming. The difference of the perceptual hash matrix of each two images is calculated, and the clustering distance of the corresponding two gene sequences is represented by the differentials of the perceptual hash matrix. In this paper, we aligned and analyzed several typical genome sequence datasets including mammalian mitochondrial genes, human immunodeficiency virus 1 (HIV-1) and hepatitis E virus (HEV) to build their evolutionary trees. Experimental results showed that our PIH combining FCGR method (FCGR-PIH) has similar classification accuracy to the classical Clustal W sequence alignment method. Furthermore, 25 complete mitochondrial DNA sequences of cichlid fishes and 27 Escherichia coli/Shigella full genome sequences were selected from the AFproject test platform for tests. The performance benchmark rankings demonstrate the effectiveness of the FCGR-PIH algorithm and its potential for large-scale genome sequence analysis.

Keywords: Frequency chaos game representation; Genome sequences alignment; Perceptual image hashing; Phylogenetic tree construction.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Animals
  • Cluster Analysis
  • Computational Biology*
  • Humans
  • Phylogeny
  • Sequence Alignment