Visualizing genomic data: The mixing perspective

Biosystems. 2023 Feb:224:104839. doi: 10.1016/j.biosystems.2023.104839. Epub 2023 Jan 20.

Abstract

We report a novel way to visualize genomic data. By treating each genome coding sequence (cds) as a collection of the N=61 non-stop codons, one obtains a partition of the total number of codons in that cds. Each partition has a statistical property known as its mixing character, which describes how mixed the partition is. Mixing characters have been shown mathematically to obey a partial order known as majorization (Ruch, 1975). In previous work (Seitz and Kirwan, 2022) we developed an approach that combines mixing and entropy and is visualized as a scatter plot. Considering all 1,121,505 partitions of 61 codons produces a plot we call the theoretical mixing space, TGMS. Here we develop a normalization procedure and apply it to real genomic data to produce the genome mixing signature, GMS. Example GMSs of 19 species, including Homo sapiens, are shown and discussed.
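The ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (count_partitions, majorizes, shannon_entropy) are hypothetical, and the use of Shannon entropy with the natural logarithm is an assumption, since the abstract does not say which entropy is plotted. The sketch counts the integer partitions of 61, tests the standard majorization order between two partitions of the same total, and computes an entropy for a partition.

```python
import math


def count_partitions(n):
    """Count the integer partitions of n with a standard dynamic program."""
    ways = [1] + [0] * n  # ways[t] = partitions of t using the parts seen so far
    for part in range(1, n + 1):
        for total in range(part, n + 1):
            ways[total] += ways[total - part]
    return ways[n]


def majorizes(a, b):
    """Return True if partition a majorizes partition b.

    Both are lists of non-negative counts with the same total. a majorizes b
    when every prefix sum of a (sorted in descending order) is at least the
    corresponding prefix sum of b.
    """
    if sum(a) != sum(b):
        raise ValueError("partitions must have the same total")
    x = sorted(a, reverse=True)
    y = sorted(b, reverse=True)
    length = max(len(x), len(y))
    x += [0] * (length - len(x))  # pad the shorter partition with zeros
    y += [0] * (length - len(y))
    run_x = run_y = 0
    for xi, yi in zip(x, y):
        run_x += xi
        run_y += yi
        if run_x < run_y:
            return False
    return True


def shannon_entropy(parts):
    """Shannon entropy (in nats) of the frequencies implied by a partition.

    Assumption: the abstract does not specify the entropy used.
    """
    total = sum(parts)
    return -sum((p / total) * math.log(p / total) for p in parts if p > 0)


if __name__ == "__main__":
    print(count_partitions(61))            # the abstract reports 1,121,505
    print(majorizes([3, 1], [2, 2]))       # [2, 2] is the more evenly mixed one
    print(majorizes([4, 1, 1], [3, 3]))    # neither of these two majorizes
    print(majorizes([3, 3], [4, 1, 1]))    # the other, so they are incomparable
    print(shannon_entropy([3, 1]), shannon_entropy([2, 2]))
```

The pair [4, 1, 1] and [3, 3] shows why majorization is only a partial order: some partitions cannot be ranked against each other. That is one reason a second quantity such as entropy is useful when placing partitions on a two-dimensional plot.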

Keywords: Genome models; Majorization; Mixing; Partial order; Randomness.

MeSH terms

  • Codon / genetics
  • Genomics*
  • Humans

Substances

  • Codon