VarSCAT: A computational tool for sequence context annotations of genomic variants

PLoS Comput Biol. 2023 Aug 11;19(8):e1010727. doi: 10.1371/journal.pcbi.1010727. eCollection 2023 Aug.

Abstract

The sequence contexts of genomic variants play important roles in understanding the biological significance of variants and potential sequencing-related variant calling issues. However, methods for assessing the diverse sequence contexts of genomic variants, such as tandem repeats, and for producing unambiguous annotations have been limited. Herein, we describe the Variant Sequence Context Annotation Tool (VarSCAT) for annotating the sequence contexts of genomic variants, including breakpoint ambiguities, flanking bases of variants, wildtype/mutated DNA sequences, variant nomenclatures, distances between adjacent variants, tandem repeat regions, and custom annotations, with user-customizable options. Our analyses demonstrate that VarSCAT is more versatile and customizable than currently available methods or strategies for annotating variants in short tandem repeat (STR) regions or insertions and deletions (indels) with breakpoint ambiguity. Annotating the sequence contexts of high-confidence human variant sets with VarSCAT revealed that more than 75% of all human individual germline and clinically relevant indels have breakpoint ambiguities. Moreover, we show that more than 80% of human individual germline small variants in STR regions are indels and that the sizes of these indels correlate with STR motif sizes. VarSCAT is available from https://github.com/elolab/VarSCAT.
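Breakpoint ambiguity arises when an indel lies in a repetitive context: the same alternative sequence can be produced by placing the indel at several reference positions. The Python sketch below illustrates the idea under our own simplifying assumptions (a plain reference string, 0-based coordinates, a single simple indel). It is not VarSCAT's implementation, and the function names `deletion_breakpoint_range` and `insertion_breakpoint_range` are hypothetical. Each function shifts the indel left and right for as long as the resulting sequence stays identical, returning the leftmost and rightmost equivalent positions.

```python
def deletion_breakpoint_range(ref: str, start: int, length: int) -> tuple[int, int]:
    """Return (leftmost, rightmost) 0-based start positions at which deleting
    `length` bases from `ref` yields the same sequence as deleting ref[start:start+length].
    Hypothetical helper for illustration only."""
    if length <= 0 or start < 0 or start + length > len(ref):
        raise ValueError("deletion must lie within the reference")
    left = start
    # Shifting one base left is equivalent when the base entering the deleted
    # window equals the base leaving it at the right edge.
    while left > 0 and ref[left - 1] == ref[left + length - 1]:
        left -= 1
    right = start
    # Shifting one base right is equivalent when the first deleted base equals
    # the first base after the deleted window.
    while right + length < len(ref) and ref[right] == ref[right + length]:
        right += 1
    return left, right


def insertion_breakpoint_range(ref: str, pos: int, inserted: str) -> tuple[int, int]:
    """Return (leftmost, rightmost) insertion points (0..len(ref)) at which inserting
    a rotation of `inserted` yields the same sequence as inserting `inserted`
    before ref[pos]. Hypothetical helper for illustration only."""
    if not inserted or pos < 0 or pos > len(ref):
        raise ValueError("insertion point must lie within the reference")
    left, seq = pos, inserted
    # Move left while the preceding reference base matches the last inserted base,
    # rotating the inserted sequence so the alternative allele is unchanged.
    while left > 0 and ref[left - 1] == seq[-1]:
        seq = seq[-1] + seq[:-1]
        left -= 1
    right, seq = pos, inserted
    # Move right while the following reference base matches the first inserted base.
    while right < len(ref) and ref[right] == seq[0]:
        seq = seq[1:] + seq[0]
        right += 1
    return left, right


if __name__ == "__main__":
    ref = "GATTTTC"
    # Deleting any single T from the T-homopolymer gives the same sequence.
    print(deletion_breakpoint_range(ref, 2, 1))
    # Inserting an extra T anywhere within or next to the homopolymer is equivalent.
    print(insertion_breakpoint_range(ref, 3, "T"))
```

For `GATTTTC`, a one-base T deletion can be placed at any start from 2 to 5, which is exactly the kind of ambiguous breakpoint the abstract refers to. Which end of such a range is reported depends on convention: VCF normalization tools commonly left-align indels, whereas HGVS nomenclature follows the 3' rule and reports the rightmost position, so a single indel can receive different coordinates in different annotations.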

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Genomics* / methods
  • High-Throughput Nucleotide Sequencing
  • Humans
  • INDEL Mutation* / genetics
  • Software

Grants and funding

This work is supported by Turku University Foundation (080559 to NW), University of Turku Graduate School (UTUGS) (to NW), the European Research Council ERC (677943 to LLE), European Union's Horizon 2020 research and innovation programme (955321 to LLE), Academy of Finland (310561, 314443, 329278, 335434, 335611 and 341342 to LLE), and Sigrid Juselius Foundation (to LLE). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.