Evaluating Genome Assemblies and Gene Models Using gVolante

Methods Mol Biol. 2019:1962:247-256. doi: 10.1007/978-1-4939-9173-0_15.

Abstract

In the daily practice of de novo genome assembly and gene prediction, researchers naturally want to evaluate the resulting products. Different programs and parameter settings yield variable outputs, leaving the researcher to decide which output to adopt for downstream analyses that address biological questions. Instead of superficial assessment based on length-based statistics of output sequences (e.g., N50 scaffold length), completeness assessment, which scores the coverage of reference orthologs, has been increasingly utilized. We previously launched a web service, gVolante (https://gvolante.riken.jp/), to provide a user-friendly interface and a uniform environment for completeness assessment with the pipelines CEGMA and BUSCO. Completeness assessments performed on gVolante report scores based not only on the coverage of reference genes but also on sequence lengths, allowing quality control from multiple aspects. This chapter focuses on the procedure for such assessment and provides technical tips for higher accuracy.
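
As a point of reference for the length-based statistics mentioned above, the following is a minimal Python sketch of the N50 calculation under its usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name `n50` and the example lengths are illustrative assumptions and are not taken from gVolante, CEGMA, or BUSCO.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    Hypothetical helper for illustration only; it is not part of gVolante.
    """
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    if not ordered:
        raise ValueError("at least one sequence length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop once the accumulated length reaches half of the assembly.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Illustrative scaffold lengths (made-up values, not real data).
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
example_n50 = n50(example_lengths)
```

A statistic of this kind describes only sequence contiguity and says nothing about whether expected genes are present, which is why it is paired here with ortholog-based completeness scores.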

Keywords: BUSCO; CEGMA; CVG; Completeness assessment; Ortholog.

MeSH terms

  • Animals
  • Elephants / genetics
  • Genome*
  • Genomics / methods*
  • Models, Genetic*
  • Rodentia / genetics
  • Software*
  • User-Computer Interface