Low-level variant calling for non-matched samples using a position-based and nucleotide-specific approach

BMC Bioinformatics. 2021 Apr 8;22(1):181. doi: 10.1186/s12859-021-04090-y.

Abstract

Background: The widespread use of next-generation sequencing has identified an important role for somatic mosaicism in many diseases. However, detecting low-level mosaic variants from next-generation sequencing data remains challenging.

Results: Here, we present a method for Position-Based Variant Identification (PBVI) that uses empirically derived distributions of alternate nucleotides from a control dataset. We modeled this approach on 11 segmental overgrowth genes. We show that this method improves detection of single nucleotide mosaic variants with a variant allele fraction of 0.01–0.05 compared with other low-level variant callers. At depths of 600× and 1200×, we observed > 85% and > 95% sensitivity, respectively. In a cohort of 26 individuals with somatic overgrowth disorders, PBVI showed an improved signal-to-noise ratio and identified pathogenic variants in 17 individuals.
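The abstract describes PBVI only at a high level: each genomic position and each alternate nucleotide gets its own empirical background distribution built from control samples, and a test sample is compared against that background. The Python sketch below illustrates the idea under stated assumptions. The input format (per-sample dicts keyed by `(position, alt_nucleotide)` holding `(alt_reads, depth)`), the helper names `build_background` and `call_variants`, and the z-score cutoff are all hypothetical choices made for illustration; they are not the statistic or the interface used in the paper.

```python
import statistics
from collections import defaultdict


def build_background(control_samples):
    """Build a per-(position, alt nucleotide) background from control samples.

    Hypothetical input: a list of dicts, one per control sample, mapping
    (position, alt_nucleotide) -> (alt_reads, depth).
    Returns (position, alt_nucleotide) -> (mean alt fraction, population SD).
    """
    fractions = defaultdict(list)
    for sample in control_samples:
        for key, (alt_reads, depth) in sample.items():
            if depth > 0:
                fractions[key].append(alt_reads / depth)
    # Require at least two controls so a spread can be estimated.
    return {
        key: (statistics.mean(vals), statistics.pstdev(vals))
        for key, vals in fractions.items()
        if len(vals) >= 2
    }


def call_variants(sample, background, z_threshold=3.0, min_sd=1e-4):
    """Flag sites whose alt fraction sits far above that site's own background.

    Assumption: a simple z-score against the control distribution; min_sd
    keeps near-zero control variance from producing infinite scores.
    """
    calls = []
    for key, (alt_reads, depth) in sample.items():
        if depth == 0 or key not in background:
            continue  # no coverage or no background model for this site
        vaf = alt_reads / depth
        mean, sd = background[key]
        z = (vaf - mean) / max(sd, min_sd)
        if z >= z_threshold:
            position, alt_nt = key
            calls.append((position, alt_nt, round(vaf, 4), round(z, 2)))
    return sorted(calls)


# Toy data (invented for illustration): site 100/T is quiet in controls,
# site 200/G already shows about 1% alternate reads in controls.
controls = [
    {(100, "T"): (2, 1000), (200, "G"): (10, 1000)},
    {(100, "T"): (3, 1000), (200, "G"): (12, 1000)},
    {(100, "T"): (4, 1000), (200, "G"): (8, 1000)},
]
test_sample = {(100, "T"): (12, 1200), (200, "G"): (12, 1000)}

background = build_background(controls)
calls = call_variants(test_sample, background)
print(calls)
```

Because the background is keyed by both position and nucleotide, a 1% alternate fraction is called at a site whose controls are quiet, while a slightly higher 1.2% fraction is not called at a site where the controls already show comparable noise. A single genome-wide error rate could not make that distinction.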

Conclusion: PBVI can facilitate identification of low-level mosaic variants, thereby increasing the utility of next-generation sequencing data for research and diagnostic purposes.

Keywords: Mosaic variants; Prediction of mosaic variants; Somatic overgrowth disorder.

MeSH terms

  • Alleles
  • Cohort Studies
  • High-Throughput Nucleotide Sequencing*
  • Humans
  • Nucleotides* / genetics
  • Software

Substances

  • Nucleotides