The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads

Plant J. 2012 Nov;72(3):461-73. doi: 10.1111/j.1365-313X.2012.05093.x. Epub 2012 Aug 14.

Abstract

Flax (Linum usitatissimum) is an ancient crop that is widely cultivated as a source of fiber, oil and medicinally relevant compounds. To accelerate crop improvement, we performed whole-genome shotgun sequencing of the nuclear genome of flax. Seven paired-end libraries ranging in size from 300 bp to 10 kb were sequenced using an Illumina Genome Analyzer. A de novo assembly, composed exclusively of deep-coverage (approximately 94× raw, approximately 69× filtered) short-sequence reads (44-100 bp), produced a set of scaffolds with N50 = 694 kb, including contigs with N50 = 20.1 kb. The contig assembly contained 302 Mb of non-redundant sequence representing an estimated 81% genome coverage. Up to 96% of published flax ESTs aligned to the whole-genome shotgun scaffolds. However, comparisons with independently sequenced BACs and fosmids showed some mis-assembly of regions at the genome scale. A total of 43,384 protein-coding genes were predicted in the whole-genome shotgun assembly, and up to 93% of published flax ESTs and 86% of A. thaliana genes aligned to these predicted genes, indicating excellent coverage and accuracy at the gene level. Analysis of the synonymous substitution rates (Ks) observed within duplicate gene pairs was consistent with a recent (5-9 MYA) whole-genome duplication in flax. Within the predicted proteome, we observed enrichment of many conserved domains (Pfam-A) that may contribute to the unique properties of this crop, including agglutinin proteins. Together these results show that de novo assembly, based solely on whole-genome shotgun short-sequence reads, is an efficient means of obtaining nearly complete genome sequence information for some plant species.
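
The N50 values quoted in the abstract summarize assembly contiguity: N50 is the length L such that sequences of length L or longer contain at least half of the total assembled bases. The following is a minimal sketch of that standard definition, not the authors' pipeline. The function names (n50, implied_genome_size_mb) and the toy contig lengths are hypothetical and chosen only for illustration. The second function reproduces the arithmetic implied by "302 Mb ... representing an estimated 81% genome coverage".

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together hold at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def implied_genome_size_mb(assembled_mb, fraction_covered):
    """Genome size implied by an assembly covering a given fraction of it."""
    if not 0 < fraction_covered <= 1:
        raise ValueError("fraction_covered must be in (0, 1]")
    return assembled_mb / fraction_covered


# Hypothetical toy contig lengths (arbitrary units), not data from the paper.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 70: 80 + 70 = 150, which is half of the 300 total

# Figures taken from the abstract: 302 Mb at an estimated 81% coverage.
print(round(implied_genome_size_mb(302, 0.81), 1))  # 372.8
```

Applied to the abstract's figures, the second calculation implies a genome size estimate of roughly 373 Mb. The abstract does not state that number directly, so it should be read as a derived approximation.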

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence
  • Chromosome Mapping
  • Chromosomes, Artificial, Bacterial
  • Contig Mapping / methods*
  • DNA, Plant / chemistry
  • DNA, Plant / genetics
  • Expressed Sequence Tags
  • Flax / genetics*
  • Gene Library
  • Genome, Plant / genetics*
  • High-Throughput Nucleotide Sequencing
  • Molecular Sequence Annotation / methods*
  • Molecular Sequence Data
  • Protein Structure, Tertiary
  • Sequence Analysis, DNA

Substances

  • DNA, Plant

Associated data

  • GENBANK/HQ902252
  • GENBANK/JN133299
  • GENBANK/JN133300
  • GENBANK/JN133301
  • GENBANK/JX174444
  • GENBANK/JX174445
  • GENBANK/JX174446
  • GENBANK/JX174447
  • GENBANK/JX174448
  • GENBANK/JX174449