MIA: Mutual Information Analyzer, a graphic user interface program that calculates entropy, vertical and horizontal mutual information of molecular sequence sets

BMC Bioinformatics. 2015 Dec 10;16:409. doi: 10.1186/s12859-015-0837-0.

Abstract

Background: Short- and long-range correlations in biological sequences are central to genomic studies of covariation. These correlations can be studied using mutual information, because it measures the amount of information one random variable contains about another. Here we present MIA (Mutual Information Analyzer), a user-friendly graphical interface pipeline that calculates spectra of vertical entropy (VH), vertical mutual information (VMI) and horizontal mutual information (HMI), since there is currently no user-friendly integrated platform that performs all these calculations in a single package. MIA also calculates the Jensen-Shannon Divergence (JSD) between pairs of spectra from different species, herein called informational distances. The resulting distance matrices can be presented as distance histograms and informational dendrograms, supporting the discrimination of closely related species.
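The abstract names three spectra but does not give their formulas. The sketch below assumes the common definitions: VH as the Shannon entropy (in bits) of each column of a multiple alignment, VMI as the mutual information between two alignment columns, and HMI as the mutual information between symbols separated by a distance k along a single sequence, for k = 1 to some maximum. All function names are hypothetical, and MIA may differ in details such as gap handling or small-sample corrections.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def shannon_entropy(symbols: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of the empirical distribution of `symbols`."""
    n = len(symbols)
    if n == 0:
        return 0.0
    counts = Counter(symbols)
    # Sum of p * log2(1/p) over observed symbols.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Empirical I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    if len(xs) != len(ys):
        raise ValueError("paired samples must have equal length")
    mi = shannon_entropy(xs) + shannon_entropy(ys) - shannon_entropy(list(zip(xs, ys)))
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, mi)


def vertical_entropy(alignment: List[str]) -> List[float]:
    """Assumed VH spectrum: entropy of each column of an equal-length alignment."""
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [shannon_entropy(col) for col in zip(*alignment)]


def vertical_mutual_information(alignment: List[str], i: int, j: int) -> float:
    """Assumed VMI: mutual information between alignment columns i and j."""
    col_i = [seq[i] for seq in alignment]
    col_j = [seq[j] for seq in alignment]
    return mutual_information(col_i, col_j)


def horizontal_mi_spectrum(seq: str, max_k: int) -> Dict[int, float]:
    """Assumed HMI spectrum: MI between symbols k positions apart, k = 1..max_k."""
    if not 1 <= max_k < len(seq):
        raise ValueError("max_k must satisfy 1 <= max_k < len(seq)")
    return {k: mutual_information(seq[:-k], seq[k:]) for k in range(1, max_k + 1)}
```

For example, `vertical_entropy(["ACGT", "ACGA", "ACGC", "ACGG"])` returns `[0.0, 0.0, 0.0, 2.0]`: the first three columns are invariant, while the last holds four equally frequent nucleotides.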

Results: To test MIA, we analyzed sequences from the Drosophila Adh locus, because the taxonomy and evolutionary patterns of Drosophila species are well established and the Adh gene has been extensively studied. The search retrieved 959 sequences from 291 species, of which 450 sequences from 17 species were selected. With this dataset, MIA performed all tasks in less than three hours: gathering, storing and aligning FASTA files; calculating VH, VMI and HMI spectra; and calculating the JSD between pairs of spectra from different species. For each task, MIA saved tables and graphics to the local disk, where they are easily accessible for later analysis.
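The last pipeline step compares spectra between species. As a minimal sketch, assuming each spectrum is a list of non-negative values sampled at the same positions (for example, HMI at k = 1 to K) and that each spectrum is rescaled to sum to 1 before comparison (the abstract does not say how MIA handles this), the Jensen-Shannon divergence in bits is JSD(P, Q) = H(M) - (H(P) + H(Q)) / 2 with M = (P + Q) / 2. It is symmetric and lies between 0 and 1. The helper names and species labels here are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _normalize(spectrum: Sequence[float]) -> List[float]:
    """Rescale a non-negative spectrum so it sums to 1 (assumed preprocessing)."""
    if any(v < 0 for v in spectrum):
        raise ValueError("spectrum values must be non-negative")
    total = sum(spectrum)
    if total <= 0:
        raise ValueError("spectrum must have a positive sum")
    return [v / total for v in spectrum]


def _entropy(p: Sequence[float]) -> float:
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return sum(x * math.log2(1.0 / x) for x in p if x > 0)


def jensen_shannon_divergence(s1: Sequence[float], s2: Sequence[float]) -> float:
    """JSD(P, Q) = H(M) - (H(P) + H(Q)) / 2 with M = (P + Q) / 2, in bits."""
    if len(s1) != len(s2):
        raise ValueError("spectra must have the same length")
    p, q = _normalize(s1), _normalize(s2)
    m = [(a + b) / 2 for a, b in zip(p, q)]
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, _entropy(m) - (_entropy(p) + _entropy(q)) / 2)


def jsd_distance_matrix(
    spectra: Dict[str, Sequence[float]]
) -> Tuple[List[str], List[List[float]]]:
    """Symmetric matrix of pairwise JSD values, rows ordered by sorted species name."""
    names = sorted(spectra)
    n = len(names)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = jensen_shannon_divergence(spectra[names[i]], spectra[names[j]])
            dist[i][j] = dist[j][i] = d
    return names, dist
```

Because of the normalization step, two spectra that differ only by a constant factor are at distance 0; whether that is desirable depends on whether absolute spectrum magnitude carries species information.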

Conclusions: Our tests revealed that the "informational model-free" spectra may represent species signatures. Since applying JSD to the horizontal mutual information spectra yielded statistically significant distances between species, we could calculate the corresponding hierarchical clusters, herein called Informational Dendrograms (ID). When compared with phylogenetic trees, all Informational Dendrograms presented similar taxonomy and species clustering.
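The abstract does not state which clustering method builds the Informational Dendrograms. As one possible sketch, assuming average linkage (UPGMA), a common choice for distance-based trees, the function below groups species from a symmetric JSD distance matrix into a nested-tuple tree. The function name and toy distances are hypothetical, and branch lengths are omitted for brevity.

```python
from typing import Dict, List, Tuple, Union

# A tree is either a leaf label or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def upgma(names: List[str], dist: List[List[float]]) -> Tree:
    """Average-linkage (UPGMA) clustering of a symmetric distance matrix."""
    if not names:
        raise ValueError("at least one taxon is required")
    if len(names) != len(dist) or any(len(row) != len(dist) for row in dist):
        raise ValueError("distance matrix must be square and match names")
    # Active clusters: id -> (subtree, number of leaves).
    clusters: Dict[int, Tuple[Tree, int]] = {i: (nm, 1) for i, nm in enumerate(names)}
    # Distances between active clusters, keyed by (smaller id, larger id).
    d: Dict[Tuple[int, int], float] = {
        (i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of active clusters.
        (a, b), _ = min(d.items(), key=lambda item: item[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        new_size = size_a + size_b
        # Size-weighted average distance from the merged cluster to each remaining one.
        for c in clusters:
            da = d[(min(a, c), max(a, c))]
            db = d[(min(b, c), max(b, c))]
            d[(c, next_id)] = (size_a * da + size_b * db) / new_size
        # Discard every distance that involves a merged cluster.
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        clusters[next_id] = ((tree_a, tree_b), new_size)
        next_id += 1
    return next(iter(clusters.values()))[0]
```

With distances A-B = 0.1, A-C = 0.8 and B-C = 0.9, A and B are joined first and C is attached last, giving `("C", ("A", "B"))`. Feeding the output of a pairwise JSD matrix into such a function is one way to obtain a dendrogram that can then be compared with a phylogenetic tree.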

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Animals
  • Arabidopsis / genetics*
  • Arabidopsis Proteins / genetics*
  • Computational Biology / methods*
  • Computer Graphics*
  • Drosophila / genetics*
  • Drosophila Proteins / genetics*
  • Entropy
  • Evolution, Molecular
  • Genome
  • Genomics
  • Molecular Sequence Data
  • Oligonucleotide Array Sequence Analysis
  • Phylogeny
  • Sequence Analysis, DNA / methods

Substances

  • Arabidopsis Proteins
  • Drosophila Proteins