Detection of False-Positive Deletions from the Database of Genomic Variants

Biomed Res Int. 2019 Apr 4:2019:8420547. doi: 10.1155/2019/8420547. eCollection 2019.

Abstract

Next-generation sequencing is an emerging technology that is widely used to detect genomic variants. However, depth of coverage, one of the main signatures used for variant calling, is strongly affected by biases such as GC content and mappability, so some calls are false positives. In this study, we used paired-end read mapping, another signature that is not affected by these biases, to detect false-positive deletions in the Database of Genomic Variants. We first identified 1923 suspicious variants that may be false positives and then conducted a validation study on each suspicious variant, which detected 583 false-positive deletions. Finally, we analysed the distribution of these false positives by chromosome, sample, and size. Correcting these false positives in public repositories should help avoid incorrect documentation and annotation in downstream studies.
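The idea behind the paired-end read mapping signature can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not state their criteria, so the `ReadPair` and `Deletion` types, the mean-plus-three-standard-deviations cutoff, and the `min_support` threshold are all assumptions made for illustration. The sketch relies only on the general observation that read pairs whose mates flank a true deletion map farther apart on the reference than the library's insert size predicts, so a reported deletion with few or no such pairs is a candidate false positive.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReadPair:
    """Hypothetical, simplified read pair: outer coordinates on the reference."""
    chrom: str
    left_start: int  # leftmost mapped position of the upstream mate
    right_end: int   # rightmost mapped position of the downstream mate

    @property
    def span(self) -> int:
        # Apparent insert size when both mates are placed on the reference.
        return self.right_end - self.left_start


@dataclass
class Deletion:
    """Hypothetical record for a reported deletion."""
    chrom: str
    start: int
    end: int


def count_supporting_pairs(deletion: Deletion, pairs: List[ReadPair],
                           mean_insert: float, sd_insert: float,
                           n_sd: float = 3.0) -> int:
    """Count pairs that bracket the deletion with an abnormally large span.

    The cutoff (mean + n_sd * sd) is an assumed, commonly used heuristic,
    not a value taken from the paper.
    """
    threshold = mean_insert + n_sd * sd_insert
    count = 0
    for pair in pairs:
        if pair.chrom != deletion.chrom:
            continue
        brackets = (pair.left_start < deletion.start
                    and pair.right_end > deletion.end)
        if brackets and pair.span > threshold:
            count += 1
    return count


def is_suspicious(deletion: Deletion, pairs: List[ReadPair],
                  mean_insert: float, sd_insert: float,
                  min_support: int = 2) -> bool:
    """Flag a deletion that lacks enough supporting discordant pairs."""
    support = count_supporting_pairs(deletion, pairs, mean_insert, sd_insert)
    return support < min_support
```

In practice the read pairs would come from an alignment file and the insert-size statistics would be estimated per library; a flagged deletion would then still need the kind of follow-up validation the abstract describes before being treated as a false positive.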

MeSH terms

  • Base Composition
  • Chromosome Mapping
  • Computational Biology / methods*
  • Databases, Genetic*
  • False Positive Reactions
  • Genetic Variation
  • Genome, Human / genetics*
  • Genomics*
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Sequence Analysis, DNA / methods
  • Sequence Deletion*