miniBUSCO: a faster and more accurate reimplementation of BUSCO

bioRxiv [Preprint]. 2023 Jun 6:2023.06.03.543588. doi: 10.1101/2023.06.03.543588.

Abstract

Motivation: Evaluating the completeness of a genome assembly is a critical step in assessing the accuracy and reliability of genomic data. An incomplete assembly can lead to errors in gene prediction, annotation, and other downstream analyses. BUSCO is one of the most widely used tools for assessing genome assembly completeness. It checks an assembly for the presence of a set of single-copy orthologs conserved across a wide range of taxa. However, BUSCO can take a long time to run, particularly on large genome assemblies. This makes it difficult for researchers to iterate quickly on genome assemblies or to analyze a large number of them.

Results: Here, we present miniBUSCO, an efficient tool for assessing the completeness of genome assemblies. miniBUSCO combines the protein-to-genome aligner miniprot with BUSCO's datasets of conserved orthologous genes. In our evaluation on a real human assembly, miniBUSCO achieves a 14-fold speedup over BUSCO. miniBUSCO is also more accurate: it reports a completeness of 99.6%, compared with 95.7% from BUSCO, and its estimate closely agrees with the annotation completeness of 99.5% for T2T-CHM13.
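BUSCO-style tools summarise completeness as the percentage of conserved orthologs found complete (single-copy or duplicated), fragmented, or missing. Completeness is the sum of the single-copy and duplicated fractions. The minimal sketch below assumes each ortholog has already been classified by an upstream alignment step, which for miniBUSCO is based on miniprot alignments. The status labels and the function name `completeness_summary` are hypothetical illustrations, not miniBUSCO's or BUSCO's actual code or API.

```python
from collections import Counter

# Hypothetical per-ortholog status labels, modelled on BUSCO-style categories:
# "single" (complete, single-copy), "duplicated" (complete, multi-copy),
# "fragmented", and "missing".
CATEGORIES = ("single", "duplicated", "fragmented", "missing")


def completeness_summary(statuses):
    """Return BUSCO-style percentages from a list of per-ortholog statuses."""
    total = len(statuses)
    if total == 0:
        raise ValueError("no orthologs to summarise")
    counts = Counter(statuses)
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown status labels: {sorted(unknown)}")
    # Percentage of orthologs in each category.
    pct = {cat: 100.0 * counts[cat] / total for cat in CATEGORIES}
    # Completeness counts both single-copy and duplicated complete genes.
    pct["complete"] = pct["single"] + pct["duplicated"]
    return pct


if __name__ == "__main__":
    # Toy input of 1,000 orthologs; these are NOT results from the paper.
    statuses = (["single"] * 900 + ["duplicated"] * 50
                + ["fragmented"] * 30 + ["missing"] * 20)
    s = completeness_summary(statuses)
    print(f"C:{s['complete']:.1f}% [S:{s['single']:.1f}%, D:{s['duplicated']:.1f}%], "
          f"F:{s['fragmented']:.1f}%, M:{s['missing']:.1f}%")
```

On the toy input this prints `C:95.0% [S:90.0%, D:5.0%], F:3.0%, M:2.0%`. The speed and accuracy differences reported above come from how each ortholog is classified, not from this summary arithmetic.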

Availability: https://github.com/huangnengCSU/minibusco .

Contact: hli@ds.dfci.harvard.edu.

Supplementary information: Supplementary data are available at Bioinformatics online.
