Mapping the space of genomic signatures

PLoS One. 2015 May 22;10(5):e0119815. doi: 10.1371/journal.pone.0119815. eCollection 2015.

Abstract

We propose a computational method to measure and visualize interrelationships among any number of DNA sequences, allowing, for example, the examination of hundreds or thousands of complete mitochondrial genomes. An "image distance" is computed for each pair of graphical representations of DNA sequences, and the distances are visualized as a Molecular Distance Map: each point on the map represents a DNA sequence, and the spatial proximity between any two points reflects the degree of structural similarity between the corresponding sequences. The graphical representation of DNA sequences utilized, Chaos Game Representation (CGR), is genome- and species-specific and can thus act as a genomic signature. Consequently, Molecular Distance Maps could inform species identification, taxonomic classifications and, to a certain extent, evolutionary history. The image distance employed, Structural Dissimilarity Index (DSSIM), implicitly compares the occurrences of oligomers of length up to k (herein k = 9) in DNA sequences. We computed DSSIM distances for more than 5 million pairs of complete mitochondrial genomes, and used Multi-Dimensional Scaling (MDS) to obtain Molecular Distance Maps that visually display the sequence relatedness in various subsets, at different taxonomic levels. This general-purpose method does not require DNA sequence alignment and can thus be used to compare similar or vastly different DNA sequences, genomic or computer-generated, of the same or different lengths. We illustrate potential uses of this approach by applying it to several taxonomic subsets: phylum Vertebrata, (super)kingdom Protista, classes Amphibia-Insecta-Mammalia, class Amphibia, and order Primates. This analysis of an extensive dataset confirms that the oligomer composition of full mtDNA sequences can be a source of taxonomic information. The method also correctly identifies the mtDNA sequences most closely related to that of the anatomically modern human (the Neanderthal, the Denisovan, and the chimp), and finds that the sequence most different from it in this dataset belongs to a cucumber.
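
To make the link between a CGR image and oligomer occurrences concrete, the following is a minimal Python sketch, not the authors' implementation. It builds a 2^k x 2^k CGR count grid in which each cell holds the number of occurrences of one k-mer. The corner assignment (A, C, G, T at bottom-left, top-left, top-right, bottom-right), the function name `cgr_counts`, and the handling of ambiguous bases are assumptions made for illustration; the abstract does not specify them.

```python
from typing import List

# Assumed corner layout of the unit square as (x, y) bits:
# A = bottom-left, C = top-left, G = top-right, T = bottom-right.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def cgr_counts(seq: str, k: int) -> List[List[int]]:
    """Return a 2^k x 2^k grid of CGR point counts, indexed grid[y][x].

    CGR moves the current point halfway toward the corner of each base.
    In k-bit fixed point, halving is a right shift and the corner supplies
    the new top bit, so the cell reached depends only on the last k bases.
    """
    size = 1 << k
    grid = [[0] * size for _ in range(size)]
    x = y = 0
    run = 0  # number of consecutive unambiguous bases seen so far
    for base in seq.upper():
        corner = CORNERS.get(base)
        if corner is None:
            # Assumption: an ambiguous base (e.g. N) restarts the walk.
            x = y = run = 0
            continue
        cx, cy = corner
        x = (x >> 1) | (cx << (k - 1))
        y = (y >> 1) | (cy << (k - 1))
        run += 1
        if run >= k:
            grid[y][x] += 1  # one more occurrence of the current k-mer
    return grid


if __name__ == "__main__":
    grid = cgr_counts("ACGTACGTTTGACA", k=2)
    for row in reversed(grid):  # print with y increasing upward
        print(row)
```

For a sequence of n unambiguous bases the grid entries sum to n - k + 1, one per k-mer. A pairwise image distance such as DSSIM would then be computed between two such grids, and the resulting distance matrix passed to MDS; neither step is shown here.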

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • DNA, Mitochondrial / genetics*
  • Models, Theoretical*

Substances

  • DNA, Mitochondrial

Associated data

  • figshare/10.6084/M9.FIGSHARE.1243376

Grants and funding

This research was funded by the Natural Sciences and Engineering Research Council of Canada (http://www.nserc-crsng.gc.ca/index_eng.asp) Discovery Grant no. R2824A01 to LK; Natural Sciences and Engineering Research Council of Canada Discovery Grant no. R3511A12 to KAH; Natural Sciences and Engineering Research Council of Canada Undergraduate Student Research Award (http://www.nserc-crsng.gc.ca/students-etudiants/ug-pc/usra-brpc_eng.asp) to NB; Oxford University Press Clarendon Fund and Natural Sciences and Engineering Research Council of Canada Undergraduate Student Research Award to NSD. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.