HaplotagLR: An efficient and configurable utility for haplotagging long reads

Monica J Holmes; Babak Mahjour; Christopher P Castro; Gregory A Farnum; Adam G Diehl; Alan P Boyle

doi:10.1371/journal.pone.0298688

PLoS One. 2024 Mar 13;19(3):e0298688. doi: 10.1371/journal.pone.0298688. eCollection 2024.

Authors

Monica J Holmes¹, Babak Mahjour², Christopher P Castro¹, Gregory A Farnum¹, Adam G Diehl¹, Alan P Boyle¹,³

Affiliations

¹ Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America.
² Department of Medicinal Chemistry, University of Michigan, Ann Arbor, Michigan, United States of America.
³ Department of Human Genetics, University of Michigan, Ann Arbor, Michigan, United States of America.

Abstract

Understanding the functional effects of sequence variation is crucial in genomics. Individual human genomes contain millions of variants that contribute to phenotypic variability and disease risks at the population level. Because variants rarely act in isolation, we must consider potential interactions of neighboring variants to accurately predict functional effects. We can accomplish this using haplotagging, which matches sequencing reads to their parental haplotypes using alleles observed at known heterozygous variants. However, few published tools for haplotagging exist, and these share several technical and usability-related shortcomings that limit their applicability, in particular a lack of insight into, or control over, error rates and a lack of key metrics on the underlying sources of haplotagging error. Here we present HaplotagLR: a user-friendly tool that haplotags long sequencing reads based on a multinomial model and existing phased variant lists. HaplotagLR is user-configurable and includes a basic error model to control the empirical FDR in its output. We show that HaplotagLR outperforms the leading haplotagging method in simulated datasets, especially at high levels of specificity, and displays 7% greater sensitivity in haplotagging real data. HaplotagLR both advances the immediate utility of haplotagging and paves the way for further improvements to this important method.
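
To make the idea of haplotagging concrete, the following is a minimal Python sketch of how a read might be matched to one of two parental haplotypes from the alleles it carries at phased heterozygous sites. It is an illustrative simplification and not HaplotagLR's actual model or interface: the function name `haplotag`, the uniform per-base `error_rate`, and the `min_log_ratio` cutoff are all assumptions introduced here, and the cutoff is a plain log-likelihood-ratio threshold rather than the empirical FDR control described above.

```python
import math


def haplotag(read_alleles, phased_variants, error_rate=0.01, min_log_ratio=2.0):
    """Assign a read to haplotype 1 or 2, or to None when the evidence is weak.

    Hypothetical helper for illustration only (not part of HaplotagLR).

    read_alleles:    dict mapping position -> base observed in the read
    phased_variants: dict mapping position -> (haplotype-1 allele, haplotype-2 allele)
    error_rate:      assumed probability that an observed base is wrong
    min_log_ratio:   assumed minimum |log-likelihood ratio| needed to emit a tag
    """
    log_lik = [0.0, 0.0]  # log-likelihood of the read under each haplotype
    informative = 0

    for pos, base in read_alleles.items():
        alleles = phased_variants.get(pos)
        # Only heterozygous phased sites can distinguish the haplotypes.
        if alleles is None or alleles[0] == alleles[1]:
            continue
        informative += 1
        for h in (0, 1):
            # A matching base is seen with probability 1 - error_rate; an error
            # is assumed to be spread evenly over the three other bases.
            p = 1.0 - error_rate if base == alleles[h] else error_rate / 3.0
            log_lik[h] += math.log(p)

    if informative == 0:
        return None, 0.0

    log_ratio = log_lik[0] - log_lik[1]
    if abs(log_ratio) < min_log_ratio:
        return None, log_ratio  # too ambiguous to tag
    return (1 if log_ratio > 0 else 2), log_ratio


if __name__ == "__main__":
    # Toy phased variant list: position -> (haplotype-1 allele, haplotype-2 allele).
    phased = {100: ("A", "G"), 250: ("C", "T"), 400: ("G", "A")}
    read = {100: "A", 250: "C", 400: "G", 999: "T"}  # position 999 is not phased
    print(haplotag(read, phased))
```

In this sketch, a read whose informative alleles split evenly between the two haplotypes yields a log-likelihood ratio of zero and is left untagged, which illustrates why long reads spanning several heterozygous variants are easier to haplotag than reads covering only one.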

Copyright: © 2024 Holmes et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

MeSH terms

Algorithms
Genome, Human*
Genomics* / methods
Haplotypes / genetics
High-Throughput Nucleotide Sequencing / methods
Humans
Sequence Analysis, DNA / methods
