Toward a more holistic method of genome assembly assessment

BMC Bioinformatics. 2020 Jul 6;21(Suppl 4):249. doi: 10.1186/s12859-020-3382-4.

Abstract

Background: A key use of high-throughput sequencing technology is the sequencing and assembly of full genome sequences. These genome assemblies are commonly assessed using statistics that describe the contiguity of the assembly. Measures of contiguity are not strongly correlated with the biological completeness or correctness of the assembly, and a commonly reported metric, N50, can be misleading. Over the years, multiple research groups have rejected the overuse of N50 and sought to develop more informative metrics.
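To make the concern about N50 concrete, the following is a minimal sketch of the standard N50 definition: the contig length at which the running sum of contigs, sorted from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the contig lengths are illustrative assumptions and are not taken from the paper. The second assembly models a single incorrect join of the two longest contigs, which raises N50 even though the assembly has become less correct.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths.

    N50 is the length of the contig at which the cumulative sum of
    contig lengths, taken from longest to shortest, first reaches
    at least half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("contig_lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths, chosen only for illustration.
correct_assembly = [100, 80, 60, 40, 20]

# Same sequence content, but the two longest contigs were joined
# incorrectly (a chimeric misassembly) into one 180-unit contig.
misjoined_assembly = [180, 60, 40, 20]

n50_correct = n50(correct_assembly)      # 80
n50_misjoined = n50(misjoined_assembly)  # 180
```

Both assemblies contain the same 300 units of sequence, yet the misjoined one reports more than double the N50. Contiguity alone therefore cannot distinguish a better assembly from a more aggressively joined one.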

Results: This paper reviews the problems that arise from relying solely on contiguity as a measure of genome assembly quality, as well as current alternative methods. Alternative methods are compared on the basis of how informative they are about the biological quality of the assembly and how easy they are to use. A comprehensive method that combines multiple metrics of assembly quality is presented.

Conclusions: This study aims to report on the status of assembly assessment methods and compare them, as well as to offer a comprehensive method that incorporates multiple facets of quality assessment. Weaknesses and strengths of the various methods are presented and explained, with recommendations based on speed of analysis and user friendliness.

Keywords: Completeness; Contiguity; Correctness; Genome assembly; N50.

Publication types

  • Review

MeSH terms

  • Genomics / methods*
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans