Comparative evaluation of the heterozygous variant standard deviation as a quality measure for next-generation sequencing

J Biomed Inform. 2022 Nov:135:104234. doi: 10.1016/j.jbi.2022.104234. Epub 2022 Oct 22.

Abstract

Next-generation sequencing offers an unprecedented ratio of informational content to cost. The technology has entered laboratory diagnostics and offers flexible workflows in biomedical research. However, the rapid acquisition of genomic data also gives rise to a substantial fraction of sequencing artifacts, causing the detection of false-positive germline variants or erroneous somatic mutations. Consequently, there is a pressing need for efficient and practical quality assessment in sequencing projects. In this study, we investigate the standard deviation (σ) of the heterozygous variant allele frequency (VAF) as a supplementary quality control measure. Whereas several proposed quality metrics are based on empirical assessments, the dispersion of the allele frequencies directly reflects the inherent and discrete features of a diploid genome: because each autosomal locus is present on two homologous chromosomes, heterozygous variants are expected at a VAF of approximately 1/2. Based on the meta-analysis of 152 whole-exome sequencing data sets, we found that σ reflects both sequencing coverage and noise and can be effectively modeled. We conclude that the relative comparison of heterozygous VAF σ provides a practical handle for quality assessment, even for samples afflicted with copy-number alterations. The approach can be implemented when performing whole-exome, whole-genome, or targeted panel sequencing and helps identify problematic samples, such as those retrieved from archived formalin-fixed paraffin-embedded tissue.
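
As a rough illustration of the quantity discussed in the abstract, the following Python sketch computes the standard deviation of heterozygous VAFs from per-variant read counts and compares it with the spread expected from binomial sampling alone at a given depth. The function names, the VAF window of 0.2 to 0.8 used to call a variant heterozygous, and the minimum depth of 20 are assumptions made for this example and are not taken from the study. The binomial term captures only coverage-driven sampling noise, so it is a simplified baseline and not the model fitted by the authors.

```python
import math
import statistics


def heterozygous_vaf_sigma(variants, lower=0.2, upper=0.8, min_depth=20):
    """Sample standard deviation of heterozygous VAFs.

    variants: iterable of (ref_count, alt_count) read-count pairs.
    lower/upper/min_depth are illustrative thresholds (assumptions).
    """
    vafs = []
    for ref_count, alt_count in variants:
        depth = ref_count + alt_count
        if depth < min_depth:
            continue  # skip poorly covered sites
        vaf = alt_count / depth
        if lower <= vaf <= upper:
            vafs.append(vaf)  # keep sites that look heterozygous
    if len(vafs) < 2:
        raise ValueError("need at least two heterozygous variants")
    return statistics.stdev(vafs)


def binomial_sigma(depth, p=0.5):
    """Expected VAF standard deviation from binomial sampling at a fixed depth."""
    return math.sqrt(p * (1 - p) / depth)


# Hypothetical read counts: three heterozygous sites, one homozygous site,
# and one site below the depth threshold.
example = [(50, 50), (40, 60), (60, 40), (0, 100), (5, 5)]
observed = heterozygous_vaf_sigma(example)
expected = binomial_sigma(100)
print(f"observed sigma = {observed:.3f}, binomial baseline = {expected:.3f}")
```

An observed σ well above the binomial baseline for the sample's typical coverage would hint at noise beyond sampling variation, which is the kind of relative comparison the abstract describes.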

Publication types

  • Meta-Analysis
  • Research Support, Non-U.S. Gov't

MeSH terms

  • DNA Copy Number Variations
  • Exome
  • Genomics
  • High-Throughput Nucleotide Sequencing*
  • Mutation
  • Quality Indicators, Health Care*