Assembly and annotation of a draft genome sequence for Glycine latifolia, a perennial wild relative of soybean

Plant J. 2018 Jul;95(1):71-85. doi: 10.1111/tpj.13931. Epub 2018 May 23.

Abstract

Glycine latifolia (Benth.) Newell & Hymowitz (2n = 40), one of the 27 wild perennial relatives of soybean, possesses genetic diversity and agronomically favorable traits that are lacking in soybean. Here, we report the 939-Mb draft genome assembly of G. latifolia (PI 559298), built exclusively from linked-reads sequenced from a single Chromium library. We organized scaffolds into 20 chromosome-scale pseudomolecules using two genetic maps and the Glycine max (L.) Merr. genome sequence. High copy numbers of putative 91-bp centromere-specific tandem repeats were observed in consecutive blocks within predicted pericentromeric regions on several pseudomolecules. No 92-bp putative centromeric repeats, which are abundant in G. max, were detected in G. latifolia or Glycine tomentella. Annotation of the assembled genome and subsequent filtering yielded a high-confidence gene set of 54 475 protein-coding loci. In comparative analysis with five legume species, genes related to defense responses were significantly overrepresented in Glycine-specific orthologous gene families. A total of 304 putative nucleotide-binding site (NBS)-leucine-rich-repeat (LRR) genes were identified in this genome assembly. In contrast to other legume species, G. latifolia showed a scarcity of TIR-NBS-LRR genes. The G. latifolia genome was also predicted to contain genes encoding 367 LRR-receptor-like kinases, a family of proteins involved in basal defense responses and responses to abiotic stress. The genome sequence and annotation of G. latifolia provide a valuable source of alternative alleles and novel genes to facilitate soybean improvement. This study also highlights the efficacy and cost-effectiveness of Chromium linked-reads for de novo assembly of diploid plant genomes.

Keywords: Glycine latifolia; 10X Genomics; disease resistance; genome sequence; soybean; wild perennial relative.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Centromere / genetics
  • Chromosome Mapping
  • Chromosomes, Plant / genetics
  • Disease Resistance / genetics
  • Genes, Plant / genetics
  • Genome, Plant / genetics*
  • Glycine / genetics*
  • Sequence Analysis, DNA
  • Tandem Repeat Sequences / genetics

Substances

  • Glycine