Measuring, visualizing and diagnosing reference bias with biastools

bioRxiv [Preprint]. 2024 Feb 15:2023.09.13.557552. doi: 10.1101/2023.09.13.557552.

Abstract

Many bioinformatics methods seek to reduce reference bias, but no methods exist to comprehensively measure it. Biastools analyzes and categorizes instances of reference bias. It works in three scenarios: (a) when the donor's variants are known and reads are simulated, (b) when the donor's variants are known and reads are real, and (c) when the variants are unknown and reads are real. Using biastools, we observe that more inclusive graph genomes result in fewer biased sites. We find that end-to-end alignment reduces bias at indels relative to local aligners. Finally, we use biastools to characterize how T2T references reduce large-scale bias.
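Reference bias is commonly quantified at heterozygous sites, where an unbiased alignment should place roughly half of the reads on each allele. The sketch below computes the reference-allele fraction per site and flags sites whose counts deviate from 0.5 under an exact two-sided binomial test. It is a generic illustration, not biastools' actual implementation. The function names, the `(site_id, ref_reads, alt_reads)` input tuples (assumed to be tallied from alignments beforehand), and the `alpha = 0.01` threshold are all hypothetical.

```python
from math import comb


def ref_allele_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads supporting the reference allele at one heterozygous site."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("site has no reads overlapping either allele")
    return ref_reads / total


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: total probability of outcomes no likelier than k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so floating-point ties count as "as extreme".
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


def flag_biased_sites(sites, alpha: float = 0.01):
    """Return (site_id, ref_fraction, p_value) for sites deviating from a 0.5 balance.

    Hypothetical helper: `sites` is an iterable of (site_id, ref_reads, alt_reads).
    Sites with no covering reads are skipped.
    """
    flagged = []
    for site_id, ref_reads, alt_reads in sites:
        n = ref_reads + alt_reads
        if n == 0:
            continue
        frac = ref_allele_fraction(ref_reads, alt_reads)
        p_value = binomial_two_sided_p(ref_reads, n)
        if p_value < alpha:
            flagged.append((site_id, frac, p_value))
    return flagged


if __name__ == "__main__":
    example = [("chr1:1000", 30, 10), ("chr1:2000", 20, 20), ("chr1:3000", 25, 0)]
    for site_id, frac, p_value in flag_biased_sites(example):
        print(f"{site_id}\tref_fraction={frac:.2f}\tp={p_value:.2e}")
```

With these made-up counts, the balanced site (20 vs. 20) is not flagged. The 30-vs-10 site and the site with no alternate-allele reads are both flagged as skewed toward the reference. A binomial test is only one simple choice. It ignores mapping quality and other sources of allelic imbalance, so it should be read as a starting point rather than a complete bias diagnosis.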

Keywords: pangenomics; reference bias; sequence alignment.
