Using a color-coded ambigraphic nucleic acid notation to visualize conserved palindromic motifs within and across genomes

BMC Genomics. 2014 Jan 22:15:52. doi: 10.1186/1471-2164-15-52.

Abstract

Background: Ambiscript is a graphically designed nucleic acid notation that uses symbol symmetries to support sequence complementation, highlight biologically relevant palindromes, and facilitate the analysis of consensus sequences. Although the original Ambiscript notation was designed to represent consensus sequences for multiple sequence alignments, its black-on-white ambiguity characters cannot reflect the statistical distribution of nucleotides found at each position. We now propose a color-augmented ambigraphic notation to encode the frequency of positional polymorphisms in these consensus sequences.
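To make the notion of a biologically relevant palindrome concrete: a DNA palindrome is a sequence that equals its own reverse complement (for example, the EcoRI site GAATTC). The following is a minimal Python sketch of that check, using plain A/C/G/T letters rather than Ambiscript symbols; the function names `reverse_complement` and `is_palindromic` are illustrative and are not taken from the Ambiscript software.

```python
# Translation table for Watson-Crick complements (unambiguous bases only).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the strand direction."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(seq: str) -> bool:
    """True when the sequence reads the same on both strands (5'->3')."""
    upper = seq.upper()
    return upper == reverse_complement(upper)


if __name__ == "__main__":
    for site in ("GAATTC", "GATTACA"):
        print(site, reverse_complement(site), is_palindromic(site))
```

An ambigraphic notation aims to make this property visible by eye, since complementary bases are drawn as rotated or mirrored versions of one another; the sketch only shows the underlying string operation.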

Results: We have implemented this color-coding approach by creating an Adobe Flash® application (http://www.ambiscript.org) that shades and colors modified Ambiscript characters according to the prevalence of the encoded nucleotide at each position in the alignment. The resulting graphic helps viewers perceive biologically relevant patterns in multiple sequence alignments by uniquely combining color, shading, and character symmetries to highlight palindromes and inverted repeats in conserved DNA motifs.
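The idea of shading a character by nucleotide prevalence can be sketched as follows. This is a hedged illustration in Python, not the Flash application's actual code: it computes per-column nucleotide frequencies for a toy alignment and maps each frequency to a grey level. The function names, the linear grey scale, and the example alignment are all assumptions made for illustration; the abstract does not specify the color mapping used by the application.

```python
from collections import Counter

BASES = "ACGT"


def column_frequencies(alignment):
    """Return one dict per alignment column mapping base -> relative frequency.

    Gaps and other non-ACGT characters are ignored when counting.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for column in zip(*alignment):
        counts = Counter(b.upper() for b in column if b.upper() in BASES)
        total = sum(counts.values())
        result.append({b: (counts[b] / total if total else 0.0) for b in BASES})
    return result


def shade_for(frequency):
    """Hypothetical linear scale: 0 (black) for frequency 1.0, 255 (white) for 0.0."""
    return round(255 * (1.0 - frequency))


if __name__ == "__main__":
    # Toy alignment, invented for this example.
    alignment = ["GAATTC", "GAATTC", "GACTTC", "GAATTT"]
    for position, freqs in enumerate(column_frequencies(alignment), start=1):
        base, freq = max(freqs.items(), key=lambda item: item[1])
        print(position, base, f"{freq:.2f}", shade_for(freq))
```

In this sketch a fully conserved column is drawn at full darkness and polymorphic columns fade in proportion to how often the dominant base appears; a real renderer would also need to choose the glyph and hue for each position.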

Conclusion: Layering an intuitive color scheme over the deliberate character symmetries of an ambigraphic nucleic acid notation yields a highly functional notation that maximizes information content and successfully embodies key principles of graphic excellence put forth by the statistician and graphic design theorist Edward Tufte.

MeSH terms

  • Algorithms
  • Base Sequence
  • Color
  • Genomics / methods*
  • Internet
  • Nucleic Acids / chemistry*
  • Sequence Alignment
  • Software*
  • User-Computer Interface

Substances

  • Nucleic Acids