A genome-wide approach for detecting novel insertion-deletion variants of mid-range size

Nucleic Acids Res. 2016 Sep 6;44(15):e126. doi: 10.1093/nar/gkw481. Epub 2016 Jun 20.

Abstract

We present SWAN, a statistical framework for robust detection of genomic structural variants in next-generation sequencing data, together with an analysis of mid-range size insertions and deletions (<10 Kb) for whole-genome analysis and DNA mixtures. To identify these mid-range size events, SWAN jointly uses information from read pairs, read depth and one-end-mapped reads through statistical likelihoods based on Poisson field models. SWAN also uses soft-clip/split-read remapping to supplement the likelihood analysis and determine variant boundaries. The accuracy of SWAN is demonstrated by in silico spike-ins and by identification of known variants in the NA12878 genome. We used SWAN to identify a set of novel mid-range insertions and deletions that were confirmed by targeted deep re-sequencing. An R package implementation of SWAN is open source and freely available.
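
As a rough illustration of how a Poisson likelihood can flag a read-depth signal, the sketch below scans fixed-width windows and scores each one with a log-likelihood ratio of a reduced-coverage Poisson model against a baseline Poisson model. This is not SWAN's implementation: the function names, the per-position read-start counts, the baseline rate `lam0`, the coverage ratio `ratio` and the window width are all assumptions chosen for the example, and only the read-depth signal is modeled.

```python
import math


def poisson_llr(counts, lam0, ratio):
    """Summed log-likelihood ratio of Poisson(ratio * lam0) vs Poisson(lam0).

    counts: per-position read-start counts in one window (hypothetical input).
    lam0:   assumed baseline rate per position.
    ratio:  assumed relative coverage under the alternative (e.g. 0.5).
    """
    if lam0 <= 0 or ratio <= 0:
        raise ValueError("lam0 and ratio must be positive")
    # Per position: x*log(ratio*lam0) - ratio*lam0 - (x*log(lam0) - lam0)
    #             = x*log(ratio) + (1 - ratio)*lam0
    return sum(counts) * math.log(ratio) + len(counts) * (1.0 - ratio) * lam0


def scan_windows(counts, lam0, ratio, width):
    """Return a list of (start, score) for every sliding window of `width`."""
    if lam0 <= 0 or ratio <= 0:
        raise ValueError("lam0 and ratio must be positive")
    scores = []
    window_sum = sum(counts[:width])
    for start in range(len(counts) - width + 1):
        if start > 0:
            # Slide the window: add the new right edge, drop the old left edge.
            window_sum += counts[start + width - 1] - counts[start - 1]
        score = window_sum * math.log(ratio) + width * (1.0 - ratio) * lam0
        scores.append((start, score))
    return scores


if __name__ == "__main__":
    # Toy data: baseline of 2 reads per position with a dip in the middle.
    depth = [2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2]
    best_start, best_score = max(scan_windows(depth, 2.0, 0.5, 4),
                                 key=lambda t: t[1])
    print(best_start, round(best_score, 3))
```

In this toy input the highest-scoring window is the one covering the dip in coverage. A practical caller would combine several signals and calibrate a threshold, which this sketch does not attempt.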

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, N.I.H., Extramural

MeSH terms

  • Adenoviridae / genetics
  • Algorithms
  • Animals
  • Benchmarking
  • Computer Simulation
  • DNA Mutational Analysis / methods*
  • Datasets as Topic
  • Genome / genetics*
  • Genomics / methods*
  • INDEL Mutation / genetics*
  • Pan troglodytes / virology
  • Poisson Distribution
  • Reproducibility of Results