A graph-based algorithm for mining multi-level patterns in genomic data

J Bioinform Comput Biol. 2010 Oct;8(5):789-807. doi: 10.1142/s0219720010005002.

Abstract

Comparative genomics is concerned with the study of genome structure and function of different species. It can provide useful information for the derivation of evolutionary and functional relationships between genomes. Previous work on genome comparison focuses mainly on comparing the entire genomes for visualization without further analysis. As many interesting patterns may exist between genomes and may lead to the discovering of functional gene segments (groups of genes), we propose an algorithm called Multi-Level Genome Comparison Algorithm (MGC) that can be used to facilitate the analysis of genomes at multi-levels during the comparison process to discover sequential and regional consistency in gene segments. Different genomes may have common sub-sequences that differ from each other due to mutations, lateral gene transfers, gene rearrangements, etc., and these sub-sequences are usually not easily identified. Not all the genes can have a perfect one-to-one matching with each other. It is quite possible for one-to-many or many-to-many ambiguous relationships to exist between them. To perform the tasks effectively, MGC takes such ambiguity into consideration during genome comparison by representing genomes in a graph and then make use of a graph mining algorithm called the Multi-Level Attributed Graph Mining Algorithm (MAGMA) to build a hierarchical multi-level graph structure to facilitate genome comparison. To determine the effectiveness of these proposed algorithms, experiments were performed using intra- and inter-species of Microbial genomes. The results show that the proposed algorithms are able to discover multiple level matching patterns that show the similarities and dissimilarities among different genomes, in addition to confirming the specific role of the genes in the genomes.

Publication types

  • Comparative Study

MeSH terms

  • Algorithms*
  • Animals
  • Chlamydia muridarum / classification
  • Chlamydia muridarum / genetics
  • Chlamydiales / classification
  • Chlamydiales / genetics
  • Chlamydophila pneumoniae / classification
  • Chlamydophila pneumoniae / genetics
  • Computational Biology
  • Data Mining / methods*
  • Genome, Bacterial
  • Genomics / statistics & numerical data*
  • Humans
  • Models, Genetic
  • Sequence Alignment / statistics & numerical data
  • Species Specificity