The USDA cucumber (Cucumis sativus L.) collection: genetic diversity, population structure, genome-wide association studies, and core collection development

Hortic Res. 2018 Oct 1:5:64. doi: 10.1038/s41438-018-0080-8. eCollection 2018.

Abstract

Germplasm collections are a crucial resource for conserving natural genetic diversity and provide a source of novel traits essential for sustained crop improvement. Optimal collection, preservation, and utilization of these materials depend on knowledge of the genetic variation present within the collection. Here we used high-throughput genotyping-by-sequencing (GBS) to characterize the United States National Plant Germplasm System (NPGS) collection of cucumber (Cucumis sativus L.). The GBS data, derived from 1234 cucumber accessions, provided more than 23,000 high-quality single-nucleotide polymorphisms (SNPs) that are well distributed at high density across the genome (~1 SNP/10.6 kb). The SNP markers were used to characterize genetic diversity, population structure, phylogenetic relationships, linkage disequilibrium, and population differentiation in the NPGS cucumber collection. These results provide a detailed genetic analysis of the U.S. cucumber collection and complement existing NPGS descriptive information on geographic origin and phenotypic characterization. We also identified genome regions significantly associated with 13 horticulturally important traits through genome-wide association studies (GWAS). Finally, we developed a molecularly informed, publicly accessible core collection of 395 accessions that represents at least 96% of the genetic variation present in the NPGS cucumber collection. Collectively, the information obtained from the GBS data gives deep insight into the diversity and genetic relationships among accessions within the collection, and provides a valuable resource for genetic analyses, gene discovery, crop improvement, and germplasm preservation.
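
As a rough consistency check on the figures quoted above, the following sketch derives two quantities that the abstract implies but does not state: the genome span covered at the reported marker density, and the share of genotyped accessions retained in the core collection. The SNP count is taken as the stated lower bound of 23,000 (the exact count is not given in the abstract), and all variable names are illustrative.

```python
# Back-of-the-envelope arithmetic on the numbers quoted in the abstract.
n_accessions = 1234          # accessions genotyped by GBS
n_snps_lower_bound = 23_000  # "more than 23 K"; assumption: exact count not stated
kb_per_snp = 10.6            # reported average spacing (~1 SNP per 10.6 kb)
core_size = 395              # accessions in the core collection

# Span implied by the SNP count and average spacing, in megabases.
implied_span_mb = n_snps_lower_bound * kb_per_snp / 1000

# Fraction of the genotyped accessions that the core collection retains.
core_fraction = core_size / n_accessions

print(f"Implied span at reported density: at least ~{implied_span_mb:.1f} Mb")
print(f"Core collection share of genotyped accessions: {core_fraction:.1%}")
```

Under these assumptions the markers span roughly 244 Mb, and the core collection retains about a third of the genotyped accessions while representing at least 96% of the genetic variation.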