RDscan: A New Method for Improving Germline and Somatic Variant Calling Based on Read Depth Distribution

J Comput Biol. 2022 Sep;29(9):987-1000. doi: 10.1089/cmb.2021.0269. Epub 2022 Jun 24.

Abstract

Several tools have been developed for calling variants from next-generation sequencing (NGS) data. Although they are generally accurate and reliable, most of them have room for improvement, especially when calling variants in datasets with low read depth. In addition, the variants predicted by different somatic variant callers tend to show very low concordance. In this study, we developed a new method (RDscan) for improving germline and somatic variant calling in NGS data. RDscan removes misaligned reads, repositions reads, and calculates a score (RDscore) based on the read depth distribution. Using RDscore, RDscan improves the precision of variant callers by removing false-positive variant calls. When we tested our new tool using the latest variant calling algorithms and data from the 1000 Genomes Project and Illumina's public datasets, accuracy improved for most of the algorithms. After screening variants with RDscan, calling accuracy increased for germline variants in 11 of 12 cases and for somatic variants in 21 of 24 cases. RDscan is simple to use and can effectively remove false-positive variants while maintaining a low computational load. Therefore, RDscan, used alongside existing variant callers, should contribute to improvements in genome analysis.
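
As a rough illustration of the score-based screening idea described in the abstract, the following minimal Python sketch filters a caller's output by a per-call score and compares precision, recall, and F1 before and after. It is not the RDscan implementation: the abstract does not give the RDscore formula, so the `score` field, the assumption that higher scores mean more trustworthy calls, the `min_score` threshold, and the toy truth set are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    site: str     # variant key, e.g. "chr1:100:A>G" (toy data)
    score: float  # hypothetical per-call score; stand-in for a read-depth-based score


def filter_calls(calls, min_score):
    """Keep calls whose score is at least min_score (assumed direction: higher is better)."""
    return [c for c in calls if c.score >= min_score]


def precision_recall_f1(calls, truth):
    """Compare called sites against a truth set of sites."""
    called = {c.site for c in calls}
    tp = len(called & truth)   # true positives
    fp = len(called - truth)   # false positives
    fn = len(truth - called)   # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy truth set and caller output (invented values, for illustration only).
truth = {"chr1:100:A>G", "chr1:200:C>T", "chr2:300:G>A"}
calls = [
    Call("chr1:100:A>G", 0.9),
    Call("chr1:200:C>T", 0.8),
    Call("chr2:300:G>A", 0.7),
    Call("chr2:400:T>C", 0.2),  # false positive with a low score
    Call("chr3:500:G>C", 0.1),  # false positive with a low score
]

before = precision_recall_f1(calls, truth)
after = precision_recall_f1(filter_calls(calls, min_score=0.5), truth)
print("before:", before)
print("after: ", after)
```

In this toy case both false positives fall below the threshold, so precision rises while recall is unchanged. On real data a threshold can also remove some true variants, so recall should be checked alongside precision.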

Keywords: germline variant; next-generation sequencing; read depth distribution; somatic variant; variant calling; variant filtering.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Germ Cells
  • High-Throughput Nucleotide Sequencing* / methods
  • Polymorphism, Single Nucleotide
  • Software