UMIErrorCorrect and UMIAnalyzer: Software for Consensus Read Generation, Error Correction, and Visualization Using Unique Molecular Identifiers

Tobias Österlund; Stefan Filges; Gustav Johansson; Anders Ståhlberg

doi:10.1093/clinchem/hvac136


Clin Chem. 2022 Nov 3;68(11):1425-1435. doi: 10.1093/clinchem/hvac136.

Authors

Tobias Österlund¹,²,³; Stefan Filges³; Gustav Johansson²,³,⁴; Anders Ståhlberg¹,²,³

Affiliations

¹ Department of Clinical Genetics and Genomics, Sahlgrenska University Hospital, Region Västra Götaland, Gothenburg, Sweden.
² Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg, Sweden.
³ Sahlgrenska Center for Cancer Research, Department of Laboratory Medicine, Institute of Biomedicine, University of Gothenburg, Gothenburg, Sweden.
⁴ SiMSen Diagnostics AB, Gothenburg, Sweden.

PMID: 36031761
DOI: 10.1093/clinchem/hvac136

Abstract

Background: Targeted sequencing using unique molecular identifiers (UMIs) enables detection of rare variant alleles in challenging applications, such as cell-free DNA analysis from liquid biopsies. Standard bioinformatics pipelines for data processing and variant calling are not adapted for deep-sequencing data containing UMIs, are inflexible, and require multistep workflows or dedicated computing resources.

Methods: We developed a bioinformatics pipeline in Python and an R package for data analysis and visualization. To validate the pipeline, we analyzed cell-free DNA reference material with known mutant allele frequencies (0%, 0.125%, 0.25%, and 1%) as well as public data sets.
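Whether a variant at these frequencies can be detected depends on how many unique molecules are sampled at the locus. As an illustrative sketch only, assuming error-free consensus reads and simple binomial sampling of molecules (neither is stated in the study), the probability of observing at least a minimum number of mutant molecules can be computed as below. The molecule count of 2,000 and the threshold of 3 mutant consensus reads are hypothetical values, not UMIErrorCorrect parameters.

```python
from math import comb


def detection_probability(n_molecules: int, maf: float, min_alt: int) -> float:
    """P(at least min_alt mutant molecules among n_molecules) under binomial sampling.

    Assumes every sampled molecule yields an error-free consensus read.
    """
    # Probability of seeing fewer than min_alt mutant molecules ...
    p_below = sum(
        comb(n_molecules, k) * maf**k * (1 - maf) ** (n_molecules - k)
        for k in range(min_alt)
    )
    # ... so detection is the complement.
    return 1.0 - p_below


# Hypothetical: 2,000 unique molecules per locus, call requires >= 3 mutant reads.
for maf in (0.0, 0.00125, 0.0025, 0.01):
    print(f"MAF {maf:.3%}: P(detect) = {detection_probability(2000, maf, 3):.3f}")
```

Because the model assumes error-free consensus reads, the 0% sample can never produce a call here; residual errors that survive consensus formation are not modeled.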

Results: We developed UMIErrorCorrect, a bioinformatics pipeline for analyzing sequencing data containing UMIs. UMIErrorCorrect requires only FASTQ files as input and performs alignment, UMI clustering, error correction, and variant calling. We also provide UMIAnalyzer, a graphical user interface for data mining, visualization, variant interpretation, and report generation. UMIAnalyzer allows the user to adjust analysis parameters and study their effect on variant calling. We demonstrated the flexibility of UMIErrorCorrect by analyzing data from 4 different targeted sequencing protocols, and we showed its ability to detect different mutant allele frequencies in standardized cell-free DNA reference material. UMIErrorCorrect outperformed existing pipelines for targeted UMI sequencing data in variant detection sensitivity.
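The UMI clustering and consensus-based error correction steps named above can be illustrated with a minimal sketch. It is not UMIErrorCorrect's implementation: the greedy Hamming-distance clustering, the per-position majority vote, and the hypothetical parameters `max_dist`, `min_family_size`, and `min_fraction` are assumptions chosen for illustration. Reads are assumed to be already aligned, keyed by start position, and of equal length.

```python
from collections import Counter, defaultdict


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def cluster_umis(umis, max_dist=1):
    """Greedy clustering (assumed scheme): the most abundant UMIs become centers,
    and UMIs within max_dist mismatches of an existing center are merged into it."""
    centers, assignment = [], {}
    for umi, _ in Counter(umis).most_common():
        for center in centers:
            if len(umi) == len(center) and hamming(umi, center) <= max_dist:
                assignment[umi] = center
                break
        else:
            centers.append(umi)
            assignment[umi] = umi
    return assignment


def consensus(seqs, min_fraction=0.6):
    """Per-position majority vote; emits 'N' where no base reaches min_fraction."""
    out = []
    for column in zip(*seqs):
        base, n = Counter(column).most_common(1)[0]
        out.append(base if n / len(column) >= min_fraction else "N")
    return "".join(out)


def consensus_reads(reads, min_family_size=3, max_dist=1):
    """reads: iterable of (position, umi, sequence); returns
    (position, umi_center, family_size, consensus_sequence) tuples."""
    by_pos = defaultdict(list)
    for pos, umi, seq in reads:
        by_pos[pos].append((umi, seq))
    result = []
    for pos, group in sorted(by_pos.items()):
        assignment = cluster_umis([umi for umi, _ in group], max_dist)
        families = defaultdict(list)
        for umi, seq in group:
            families[assignment[umi]].append(seq)
        for center, seqs in families.items():
            # Families below the minimum size are discarded (assumed filter).
            if len(seqs) >= min_family_size:
                result.append((pos, center, len(seqs), consensus(seqs)))
    return result


reads = [
    (100, "ACGTAC", "TTGCA"),
    (100, "ACGTAC", "TTGCA"),
    (100, "ACGTAA", "TTGCA"),  # UMI with one mismatch relative to ACGTAC
    (100, "ACGTAC", "TTACA"),  # read with an error at the third base
    (100, "GGCCTT", "TTGCA"),  # singleton family
]
print(consensus_reads(reads))
```

In this toy input, the UMI `ACGTAA` is merged into the `ACGTAC` family, the mismatching base in the fourth read is outvoted, and the singleton `GGCCTT` family is discarded because it falls below the minimum family size, leaving a single consensus read `TTGCA` supported by 4 reads.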

Conclusions: UMIErrorCorrect and UMIAnalyzer are comprehensive and customizable bioinformatics tools that can be applied to any type of library preparation protocol and enrichment chemistry using UMIs. Access to simple, generic, and open-source bioinformatics tools will facilitate the implementation of UMI-based sequencing approaches in basic research and clinical applications.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Cell-Free Nucleic Acids*
Consensus
High-Throughput Nucleotide Sequencing* / methods
Humans
Sequence Analysis, DNA / methods
Software

Substances

Cell-Free Nucleic Acids