Cotton pan-genome retrieves the lost sequences and genes during domestication and selection

Genome Biol. 2021 Apr 23;22(1):119. doi: 10.1186/s13059-021-02351-w.

Abstract

Background: Millennia of directional human selection have reshaped the genomic architecture of cultivated cotton relative to its wild counterparts, but we have limited understanding of the selective retention and fractionation of genomic components.

Results: We construct a comprehensive genomic variome based on 1961 cottons and identify 456 Mb and 357 Mb of sequence with domestication and improvement selection signals, respectively, as well as 162 loci, 84 of which are novel, including 47 loci associated with 16 agronomic traits. Using pan-genome analyses, we identify 32,569 and 8851 non-reference genes lost from the Gossypium hirsutum and Gossypium barbadense reference genomes, respectively, of which 38.2% (39,278) and 14.2% (11,359) of genes exhibit presence/absence variation (PAV). We document the landscape of PAV selection accompanied by asymmetric gene gain and loss and identify 124 PAVs linked to favorable fiber quality and yield loci.

Conclusions: This variation repertoire points to genomic divergence during cotton domestication and improvement, which informs the characterization of favorable gene alleles for improved breeding practice using a pan-genome-based approach.

Keywords: Copy number variation (CNV); Cotton; Domestication; Gene loss; Improvement; Pan-genome; Presence/absence variation (PAV).

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • DNA Copy Number Variations
  • Domestication*
  • Genes, Plant*
  • Genetic Variation
  • Genetics, Population
  • Genome, Plant*
  • Genome-Wide Association Study
  • Genomics* / methods
  • Gossypium / genetics*
  • INDEL Mutation
  • Phenotype
  • Plant Breeding
  • Polymorphism, Single Nucleotide
  • Selection, Genetic*