Ancestral Flowering Plant Chromosomes and Gene Orders Based on Generalized Adjacencies and Chromosomal Gene Co-Occurrences

J Comput Biol. 2021 Nov;28(11):1156-1179. doi: 10.1089/cmb.2021.0340.

Abstract

Recurrent whole genome duplication and the ensuing loss of redundant genes (fractionation) complicate efforts to reconstruct the gene orders and chromosomes of the ancestors associated with the nodes of a phylogeny. Loss of genes disrupts the gene adjacencies that are key to current techniques. With our RACCROCHE pipeline, instead of starting with the inference of short ancestral segments, we suggest delaying the choice of gene adjacencies while we accumulate many more syntenically validated generalized (gapped) adjacencies. We obtain longer ancestral contigs using maximum weight matching (MWM). Similarly, we do not construct chromosomes by successively piecing together contigs into larger segments, but rather compile counts of pairwise contig co-occurrences on the set of extant genomes and use these to cluster the contigs. Chromosome-level contig assemblies for a monoploid genome emerge naturally at each node of the phylogeny, and the contigs can then be ordered along the chromosome. Sampling alternative MWM solutions, visualizing heat maps, and applying gap statistics allow us to estimate the number of chromosomes in the reconstruction. We introduce several measures of quality: length of contigs, continuity of contig structure on successive ancestors, coverage of the extant genome by the reconstruction, and rearrangement relations among the inferred chromosomes. The reconstructed ancestors are visualized by painting the ancestral projections on the descendant genomes. We submit genomes drawn from a broad range of monocot orders to our pipeline, confirming the tetraploidization event "tau" in the stem lineage between the alismatids and the lilioids. We show additional applications to the Solanaceae and to four Brassica genomes, producing evidence about the monoploid ancestor in each case.
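
The co-occurrence clustering step described above can be illustrated with a minimal sketch. It is not the RACCROCHE implementation: the function names, the toy contig labels, the use of raw co-occurrence counts as the similarity, and the fixed number of clusters `k` are all assumptions made for illustration. The abstract indicates that the number of chromosomes is instead estimated with heat maps and gap statistics, and the keywords mention co-occurrence correlations rather than raw counts.

```python
from itertools import combinations


def cooccurrence_counts(genomes):
    """Count, for each pair of contigs, the extant chromosomes carrying both.

    genomes: list of genomes; each genome is a list of chromosomes; each
    chromosome is a set of contig ids found on it (hypothetical input format).
    Returns a dict keyed by the sorted pair (a, b).
    """
    counts = {}
    for genome in genomes:
        for chrom in genome:
            for a, b in combinations(sorted(chrom), 2):
                counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts


def complete_link_clusters(contigs, counts, k):
    """Agglomerative complete-link clustering down to k clusters.

    Under complete link, the similarity of two clusters is the weakest
    (minimum) co-occurrence count over all cross-cluster contig pairs.
    k is supplied by the caller here (an assumption of this sketch).
    """
    clusters = [frozenset([c]) for c in sorted(contigs)]

    def similarity(x, y):
        return min(counts.get((min(a, b), max(a, b)), 0) for a in x for b in y)

    while len(clusters) > k:
        # Merge the pair of clusters with the highest complete-link similarity.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: similarity(clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[i] | clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Toy data: three extant genomes, four ancestral contigs (labels are made up).
toy_genomes = [
    [{"c1", "c2"}, {"c3", "c4"}],
    [{"c1", "c2", "c3"}, {"c4"}],
    [{"c1", "c2"}, {"c3", "c4"}],
]
toy_counts = cooccurrence_counts(toy_genomes)
toy_clusters = complete_link_clusters({"c1", "c2", "c3", "c4"}, toy_counts, k=2)
print(toy_clusters)
```

In this toy input, contigs that repeatedly share a chromosome across the extant genomes are grouped into the same putative ancestral chromosome, which is the intuition behind clustering on co-occurrence rather than piecing contigs together end to end.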

Keywords: co-occurrence correlations; complete-link clustering; gap statistics; generalized adjacencies; genome reconstruction; maximum weight matching.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Evolution, Molecular
  • Gene Duplication*
  • Gene Order
  • Genome, Plant
  • Magnoliopsida / classification*
  • Magnoliopsida / genetics
  • Phylogeny