Are We There Yet? Reliably Estimating the Completeness of Plant Genome Sequences

Elisabeth Veeckman; Tom Ruttink; Klaas Vandepoele

doi:10.1105/tpc.16.00349

Plant Cell. 2016 Aug;28(8):1759-68. doi: 10.1105/tpc.16.00349. Epub 2016 Aug 10.

Authors

Elisabeth Veeckman¹, Tom Ruttink¹, Klaas Vandepoele²

Affiliations

¹ Institute for Agricultural and Fisheries Research, Plant Sciences Unit, Growth and Development, B-9090 Melle, Belgium; Bioinformatics Institute Ghent, Ghent University, B-9052 Ghent, Belgium.
² Bioinformatics Institute Ghent, Ghent University, B-9052 Ghent, Belgium; Department of Plant Systems Biology, VIB, Technologiepark 927, B-9052 Ghent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, B-9052 Ghent, Belgium; klaas.vandepoele@psb.vib-ugent.be.

Abstract

Genome sequencing is becoming cheaper and faster thanks to the introduction of next-generation sequencing techniques. Dozens of new plant genome sequences have been released in recent years, ranging from small to gigantic repeat-rich or polyploid genomes. Most genome projects have a dual purpose: delivering a contiguous, complete genome assembly and creating a full catalog of correctly predicted genes. Frequently, the completeness of a species' gene catalog is measured using a set of marker genes that are expected to be present. This expectation can be defined along an evolutionary gradient, ranging from highly conserved genes to species-specific genes. Large-scale population resequencing studies have revealed that gene space is fairly variable even between closely related individuals, which limits the definition of the expected gene space, and, consequently, the accuracy of estimates used to assess genome and gene space completeness. We argue that, based on the desired applications of a genome sequencing project, different completeness scores for the genome assembly and/or gene space should be determined. Using examples from several dicot and monocot genomes, we outline some pitfalls and recommendations regarding methods to estimate completeness during different steps of genome assembly and annotation.
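
The marker-gene idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the function name `completeness` and the gene identifiers are hypothetical, and the example assumes that marker detection has already been reduced to a list of identifiers found in the annotation. Real tools detect markers by sequence similarity rather than by matching names.

```python
def completeness(expected_markers, annotated_genes):
    """Return (fraction of expected markers found, sorted list of missing markers).

    Both arguments are iterables of gene identifiers (hypothetical names here).
    """
    expected = set(expected_markers)
    if not expected:
        raise ValueError("expected marker set is empty")
    found = expected & set(annotated_genes)
    return len(found) / len(expected), sorted(expected - found)


# Toy data: four expected marker genes, three of them present in the annotation.
score, missing = completeness(
    ["markerA", "markerB", "markerC", "markerD"],
    ["markerA", "markerC", "markerD", "speciesSpecific1"],
)
print(f"completeness: {score:.2f}; missing: {missing}")
```

The score depends entirely on how the expected set is chosen. A set restricted to highly conserved genes and a set that includes lineage-specific genes can give different values for the same annotation, which is the limitation the abstract raises.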

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Chromosome Mapping
Genome, Plant / genetics*
High-Throughput Nucleotide Sequencing / methods*
Sequence Analysis, DNA / methods*