s-dePooler: determination of polymorphism carriers from overlapping DNA pools

BMC Bioinformatics. 2019 Jan 22;20(1):45. doi: 10.1186/s12859-019-2616-9.

Abstract

Background: Sample pooling is widely used to reduce costs and labour. DNA sample pooling combined with massively parallel sequencing is a powerful tool for discovering DNA variants (polymorphisms) in large populations, and it underpins research fields such as genome-wide association studies and evolutionary and population studies. Using overlapping pools, in which each sample is present in multiple pools, can improve the accuracy of polymorphism detection and allows carriers of rare variants to be identified. Surprisingly, there is a lack of tools for interpreting the results and identifying carriers, i.e. for "depooling".

Results: Here we present s-dePooler, an application for analysing data from pooling experiments. s-dePooler uses variant information (a VCF file) and the pooling scheme to produce a list of candidate carriers for each polymorphism. We incorporated s-dePooler into a pipeline (dePoP) that automates pooling analysis. The pipeline was tested on a synthetic dataset built from 1000 Genomes Project data, and it successfully identified 97% of the carriers of polymorphisms present in fewer than ~10% of carriers.
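To illustrate the general idea of depooling, here is a minimal sketch in Python. It is not s-dePooler's actual algorithm or interface: the function name `candidate_carriers`, the 3x3 grid pooling scheme, and the variant labels are all hypothetical, and the sketch assumes that variant calls are error-free (a variant is reported in a pool exactly when at least one sample in that pool carries it). Under that assumption, a sample is a candidate carrier only if every pool containing it tested positive for the variant.

```python
from typing import Dict, Iterable, List, Set


def candidate_carriers(scheme: Dict[str, Set[str]],
                       positive_pools: Iterable[str]) -> List[str]:
    """Return samples whose pools are all positive for a variant.

    scheme maps a sample ID to the set of pool IDs that contain it
    (hypothetical representation of a pooling scheme).
    """
    positive = set(positive_pools)
    # A sample stays a candidate only if none of its pools is negative.
    return sorted(
        sample for sample, pools in scheme.items()
        if pools and pools <= positive
    )


# Hypothetical 3x3 grid design: 9 samples, each placed in one "row" pool
# and one "column" pool, so every sample is present in exactly two pools.
scheme = {
    f"S{r * 3 + c + 1}": {f"R{r + 1}", f"C{c + 1}"}
    for r in range(3)
    for c in range(3)
}

# Hypothetical variant calls: the pools in which each variant was detected.
variant_calls = {
    "chr1:1000A>G": {"R1", "C2"},              # one row and one column
    "chr2:5000C>T": {"R1", "R3", "C1", "C3"},  # two rows and two columns
}

result = {
    variant: candidate_carriers(scheme, pools)
    for variant, pools in variant_calls.items()
}

for variant, carriers in result.items():
    print(variant, carriers)
```

In this toy design the first variant resolves to a single candidate, while the second yields four candidates because two carriers on a diagonal of the grid light up the same pools as the other diagonal. This is why the output of depooling is best read as a list of candidates rather than a definitive assignment.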

Conclusions: s-dePooler, together with dePoP, can be used to identify carriers of polymorphisms in overlapping pools and is compatible with any pooling scheme in which samples are pooled in equivalent molar ratios. s-dePooler and dePoP, with usage instructions and test data, are freely available at https://github.com/lab9arriam/depop.
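The equivalent-molar-ratio requirement matters because it fixes the allele fraction a carrier is expected to contribute to a pool. The sketch below shows that arithmetic only; the function name `expected_alt_fraction` is hypothetical, and the calculation assumes diploid samples and ignores sequencing error and sampling noise.

```python
def expected_alt_fraction(pool_size: int,
                          alt_alleles: int = 1,
                          ploidy: int = 2) -> float:
    """Expected fraction of reads carrying the alternate allele in a pool.

    Assumes every sample contributes an equal molar amount of DNA, so each
    of the pool_size * ploidy chromosome copies is equally represented.
    """
    if pool_size <= 0 or ploidy <= 0:
        raise ValueError("pool_size and ploidy must be positive")
    total_alleles = pool_size * ploidy
    if not 0 <= alt_alleles <= total_alleles:
        raise ValueError("alt_alleles must be between 0 and pool_size * ploidy")
    return alt_alleles / total_alleles


# One heterozygous carrier (a single alternate allele) in pools of
# increasing size: the expected signal halves each time the pool doubles.
fractions = {n: expected_alt_fraction(n) for n in (8, 16, 32)}

for n, fraction in fractions.items():
    print(f"pool of {n}: expected alternate allele fraction {fraction:.4f}")
```

If samples were pooled in unequal amounts, the same carrier could produce a noticeably higher or lower fraction, which would make it harder to tell a real rare variant from noise.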

Keywords: DNA pools; Depooling; Overlapping pools; Polymorphism discovery; Sample pooling.

MeSH terms

  • DNA / genetics*
  • Genome-Wide Association Study / methods*
  • Humans
  • Polymorphism, Genetic / genetics*
  • Sequence Analysis, DNA / methods*

Substances

  • DNA