Effect of sequence depth and length in long-read assembly of the maize inbred NC358

Shujun Ou; Jianing Liu; Kapeel M Chougule; Arkarachai Fungtammasan; Arun S Seetharam; Joshua C Stein; Victor Llaca; Nancy Manchanda; Amanda M Gilbert; Sharon Wei; Chen-Shan Chin; David E Hufnagel; Sarah Pedersen; Samantha J Snodgrass; Kevin Fengler; Margaret Woodhouse; Brian P Walenz; Sergey Koren; Adam M Phillippy; Brett T Hannigan; R Kelly Dawe; Candice N Hirsch; Matthew B Hufford; Doreen Ware

doi:10.1038/s41467-020-16037-7

Nat Commun. 2020 May 8;11(1):2288. doi: 10.1038/s41467-020-16037-7.

Authors

Shujun Ou¹, Jianing Liu², Kapeel M Chougule³, Arkarachai Fungtammasan⁴, Arun S Seetharam¹,⁵, Joshua C Stein³, Victor Llaca⁶, Nancy Manchanda¹, Amanda M Gilbert⁷, Sharon Wei³, Chen-Shan Chin⁴, David E Hufnagel¹, Sarah Pedersen¹, Samantha J Snodgrass¹, Kevin Fengler⁶, Margaret Woodhouse⁸, Brian P Walenz⁹, Sergey Koren⁹, Adam M Phillippy⁹, Brett T Hannigan⁴, R Kelly Dawe¹⁰, Candice N Hirsch¹¹, Matthew B Hufford¹², Doreen Ware¹³,¹⁴

Affiliations

¹ Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, Iowa, 50011, USA.
² Department of Genetics, University of Georgia, Athens, Georgia, 30602, USA.
³ Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 11724, USA.
⁴ DNAnexus, Inc., Mountain View, San Francisco, California, 94040, USA.
⁵ Genome Informatics Facility, Iowa State University, Ames, Iowa, 50011, USA.
⁶ Genomics Technologies, Applied Science and Technology, Corteva Agriscience™, Johnston, Iowa, 50131, USA.
⁷ Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, Minnesota, 55108, USA.
⁸ USDA ARS Corn Insects and Crop Genetics Research Unit, Ames, Iowa, 50011, USA.
⁹ Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, 20892, USA.
¹⁰ Department of Genetics, University of Georgia, Athens, Georgia, 30602, USA. kdawe@uga.edu.
¹¹ Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, Minnesota, 55108, USA. cnhirsch@umn.edu.
¹² Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, Iowa, 50011, USA. mhufford@iastate.edu.
¹³ Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 11724, USA. ware@cshl.edu.
¹⁴ USDA ARS Robert W. Holley Center for Agriculture and Health, Agricultural Research Service, Ithaca, New York, 14853, USA. ware@cshl.edu.

Abstract

Improvements in long-read data and scaffolding technologies have enabled rapid generation of reference-quality assemblies for complex genomes. Still, an assessment of critical sequence depth and read length is important for allocating limited resources. To this end, we have generated eight assemblies for the complex genome of the maize inbred line NC358 using PacBio datasets ranging from 20× to 75× genomic depth and with N50 subread lengths of 11–21 kb. Assemblies with ≤30× depth and N50 subread length of 11 kb are highly fragmented, with even low-copy genic regions showing degradation at 20× depth. Distinct sequence-quality thresholds are observed for complete assembly of genes, transposable elements, and highly repetitive genomic features such as telomeres, heterochromatic knobs, and centromeres. In addition, we show high-quality optical maps can dramatically improve contiguity in even our most fragmented base assembly. This study provides a useful resource allocation reference to the community as long-read technologies continue to mature.
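The two quantities the abstract varies, genomic depth and N50 subread length, can both be derived from a list of read lengths. The following is a minimal sketch of the standard definitions, not the authors' pipeline: the function names `n50` and `depth`, the toy read lengths, and the toy genome size are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the length L such that reads of length >= L
    together contain at least half of all sequenced bases."""
    if not lengths:
        raise ValueError("no read lengths given")
    half = sum(lengths) / 2
    running = 0
    # Walk from the longest read downward, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable for non-empty input")


def depth(lengths: Sequence[int], genome_size: int) -> float:
    """Return average genomic depth: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(lengths) / genome_size


# Toy values (assumptions, not data from the study).
toy_reads = [2_000, 4_000, 6_000, 8_000, 10_000]
toy_genome_size = 10_000

print(n50(toy_reads))                     # N50 subread length in bp
print(depth(toy_reads, toy_genome_size))  # depth as a multiple of genome size
```

In practice the read lengths would come from the sequencing output and the genome size from an independent estimate; downsampling the read list and recomputing both values is one way to characterize a dataset at a target depth.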

Publication types

Research Support, N.I.H., Intramural
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Base Sequence
DNA Transposable Elements / genetics
Genome, Plant
High-Throughput Nucleotide Sequencing / methods*
Inbreeding*
Repetitive Sequences, Nucleic Acid / genetics
Zea mays / genetics*

Substances

DNA Transposable Elements