A sample selection strategy for next-generation sequencing

Chul Joo Kang; Paul Marjoram

doi:10.1002/gepi.21664

Genet Epidemiol. 2012 Nov;36(7):696-709. doi: 10.1002/gepi.21664. Epub 2012 Aug 3.

Authors

Chul Joo Kang¹, Paul Marjoram

Affiliation

¹ Department of Preventive Medicine, Keck School of Medicine, USC, Los Angeles, California, USA.

Abstract

Next-generation sequencing technology provides us with vast amounts of sequence data. It is more efficient and cheaper than previous sequencing technologies, but deep resequencing of entire samples is still expensive. Therefore, sensible strategies for choosing subsets of samples to sequence are required. Here we describe an algorithm for selecting a sub-sample of an existing sample with either of two goals in mind: maximizing the number of new polymorphic sites that are detected, or improving the efficiency with which the remaining unsequenced individuals can have their types imputed at newly discovered polymorphisms. We then describe a variation on our algorithm that is more focused on detecting rarer variants. We demonstrate the performance of our algorithm using simulated data and data from the 1000 Genomes Project.

Publication types

Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms*
Computer Simulation
Diploidy
Genetic Variation
Genome, Human*
Genome-Wide Association Study
Haplotypes
High-Throughput Nucleotide Sequencing
Human Genome Project
Humans
Models, Genetic
Polymorphism, Single Nucleotide*
Sequence Analysis, DNA / methods*
