Computing and visually analyzing mutual information in molecular co-evolution

BMC Bioinformatics. 2010 Jun 17;11:330. doi: 10.1186/1471-2105-11-330.

Abstract

Background: Selective pressure in molecular evolution leads to uneven distributions of amino acids and nucleotides. In fact, one observes correlations among such constituents due to a large number of biophysical mechanisms (folding properties, electrostatics, ...). To quantify these correlations, the mutual information, after proper normalization, has proven most effective. The challenge is to navigate the large amount of data, which in a study of a typical protein cannot simply be plotted.
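
For reference, the textbook definition of mutual information between two alignment columns i and j is written out below, where p_i(a) is the frequency of symbol a in column i and p_ij(a,b) is the joint frequency of the pair (a,b) in columns i and j. The abstract does not state which normalization is used; dividing by the joint entropy H_ij, as in the second expression, is one common choice and is shown here only as an assumption.

```latex
I_{ij} = \sum_{a,b} p_{ij}(a,b)\,\log\frac{p_{ij}(a,b)}{p_i(a)\,p_j(b)},
\qquad
\tilde{I}_{ij} = \frac{I_{ij}}{H_{ij}},
\quad
H_{ij} = -\sum_{a,b} p_{ij}(a,b)\,\log p_{ij}(a,b)
```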

Results: To visually analyze mutual information, we developed a matrix visualization tool that offers different views on the mutual information matrix, among them filtering, sorting, and weighting. The user can interactively navigate a huge matrix in real time and search, e.g., for patterns and unusually high or low values. The mutual information matrix can also be computed from a sequence alignment in FASTA format. The respective stand-alone program additionally computes proper normalizations for a null model of neutral evolution and maps the mutual information to Z-scores with respect to the null model.
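
The computation described above can be illustrated with a minimal Python sketch. It is not the authors' Java implementation: the function names (`read_fasta`, `mutual_information`, `mi_matrix`, `z_score`) are hypothetical, and the null model used here, random shuffling of one alignment column, is an assumed stand-in for the neutral-evolution null model mentioned in the abstract, whose details are not given.

```python
import math
import random
from collections import Counter


def read_fasta(text):
    """Parse FASTA text into a list of sequences (header lines are discarded)."""
    seqs, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current:
                seqs.append("".join(current))
                current = []
        else:
            current.append(line)
    if current:
        seqs.append("".join(current))
    return seqs


def mutual_information(col_a, col_b):
    """Unnormalized mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a, count_b = Counter(col_a), Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        # p(a,b) * log2( p(a,b) / (p(a) p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi


def mi_matrix(seqs):
    """Symmetric matrix of pairwise column mutual information for an alignment."""
    cols = list(zip(*seqs))  # transpose: one tuple per alignment column
    size = len(cols)
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i, size):
            matrix[i][j] = matrix[j][i] = mutual_information(cols[i], cols[j])
    return matrix


def z_score(col_a, col_b, n_shuffles=200, seed=0):
    """Z-score of the observed MI against a column-shuffling null (an assumption)."""
    rng = random.Random(seed)
    observed = mutual_information(col_a, col_b)
    shuffled = list(col_b)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # breaks pairing, keeps column composition
        null.append(mutual_information(col_a, shuffled))
    mean = sum(null) / len(null)
    var = sum((x - mean) ** 2 for x in null) / len(null)
    if var == 0:
        return 0.0  # null distribution is degenerate; no meaningful Z-score
    return (observed - mean) / math.sqrt(var)
```

Given FASTA text in a string `text`, `mi_matrix(read_fasta(text))` would yield the kind of matrix the visualization tool is meant to explore. The sketch assumes all sequences have equal length (a proper alignment) and treats gap characters as ordinary symbols.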

Conclusions: The new tool allows users to compute and visually analyze sequence data for possible co-evolutionary signals. It has already been successfully employed in evolutionary studies on HIV-1 protease and acetylcholinesterase. Its functionality was defined by users applying it in real-world research. The software can also be used for visual analysis of other matrix-like data, such as information obtained from DNA microarray experiments. The package is implemented in Java, is platform-independent, and is free for academic use under a GPL license.

MeSH terms

  • Acetylcholinesterase / chemistry
  • Acetylcholinesterase / genetics
  • Animals
  • Base Sequence
  • Computational Biology / methods*
  • Evolution, Molecular*
  • HIV Protease / chemistry
  • HIV Protease / genetics
  • HIV-1 / enzymology
  • HIV-1 / genetics
  • Models, Molecular
  • Oligonucleotide Array Sequence Analysis
  • Programming Languages
  • Proteins / chemistry*
  • Proteins / genetics*
  • Sequence Alignment / methods
  • Software*
  • Torpedo / genetics
  • Torpedo / metabolism

Substances

  • Proteins
  • Acetylcholinesterase
  • HIV Protease
  • p16 protease, Human immunodeficiency virus 1