Block alignment: New representation and comparison method to study evolution of genomes

Genomics. 2019 Dec;111(6):1590-1603. doi: 10.1016/j.ygeno.2018.11.003. Epub 2018 Nov 14.

Abstract

Genomes are not random sequences, because natural selection has injected information into biological sequences for billions of years. Inspired by this idea, we developed a simple method that compares genomes by the nucleotide counts in subsequences (blocks) instead of by their exact sequences. We introduce the Block Alignment method for comparing two genomes and, based on this comparison method, define a similarity score and a distance. The presented model ignores nucleotide order in the sequence. On the other hand, because the block comparison method excludes point mutations and small size variations, there is no need for high-coverage sequencing, which is responsible for the high costs of data production and storage; moreover, sequence comparisons can be performed at higher speed. Phylogenetic trees of two sets of bacterial genomes were constructed, and the results were in full agreement with previously constructed phylogenetic trees of the same genomes. Furthermore, a weighted and directed similarity network of each set of bacterial genomes was inferred ab initio by this model. Remarkably, the communities of these networks agree with the clades of the corresponding phylogenetic trees, which means that these similarity networks also contain phylogenetic information about the genomes. Moreover, the block comparison method was used to distinguish rob(15;21)c-associated iAMP21 and sporadic iAMP21 rearrangements in subgroups of chromosome 21 in acute lymphoblastic leukemia. Our results show a meaningful difference between the numbers of contigs that mapped to chromosomes 15 and 21 in these cases. Furthermore, the presented block alignment model can select candidate blocks for more accurate analysis, and it can find conserved blocks across a set of genomes.
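
The idea of comparing blocks by nucleotide counts rather than exact sequence can be illustrated with a minimal Python sketch. This is not the published Block Alignment algorithm: the abstract does not give the block size, the similarity score, or the alignment procedure, so everything below is an assumption made for illustration. The function names (`block_counts`, `block_distance`, `similarity`), the fixed block size, the L1 distance between count vectors, the `tolerance` parameter, and the position-by-position pairing of blocks are all hypothetical.

```python
from collections import Counter

ALPHABET = "ACGT"


def block_counts(seq, block_size):
    """Cut seq into consecutive blocks and return (A, C, G, T) counts per block.

    The fixed block size is an assumption; the last block may be shorter.
    """
    seq = seq.upper()
    blocks = []
    for start in range(0, len(seq), block_size):
        counts = Counter(seq[start:start + block_size])
        blocks.append(tuple(counts[n] for n in ALPHABET))
    return blocks


def block_distance(a, b):
    """L1 distance between two count vectors (an illustrative choice)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def similarity(seq1, seq2, block_size, tolerance=0):
    """Toy score: fraction of blocks whose counts differ by at most tolerance.

    Blocks are paired by position only, which is an assumption; no alignment
    of blocks is attempted here.
    """
    b1 = block_counts(seq1, block_size)
    b2 = block_counts(seq2, block_size)
    paired = min(len(b1), len(b2))
    if paired == 0:
        return 0.0
    matched = sum(
        1 for i in range(paired) if block_distance(b1[i], b2[i]) <= tolerance
    )
    # Unpaired trailing blocks count as mismatches.
    return matched / max(len(b1), len(b2))


if __name__ == "__main__":
    # Same counts per block, different nucleotide order inside each block.
    print(similarity("ACGTAACC", "GTCACACA", block_size=4))
    # Second block has a different composition.
    print(similarity("ACGTAACC", "ACGTGGGG", block_size=4))
```

In this sketch, two sequences whose blocks contain the same nucleotides in a different order receive the maximum score, which mirrors the statement that the model ignores nucleotide order. Raising `tolerance` makes the toy score insensitive to a few substitutions per block.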

Keywords: Adaptable Block alignment; Alignment confirmation algorithm; Binary tree; Block alignment; Comparison; Phylogenetic network; Similarity network.

MeSH terms

  • Bacteria / genetics*
  • Evolution, Molecular*
  • Genome, Bacterial*
  • Genomics
  • Phylogeny*
  • Sequence Alignment*
  • Sequence Analysis, DNA
  • Software*