Confounded by sequencing depth in association studies of rare alleles

Chad Garner

doi:10.1002/gepi.20574

Genet Epidemiol. 2011 May;35(4):261-8. doi: 10.1002/gepi.20574.

Author

Chad Garner¹

Affiliation

¹ Department of Epidemiology, University of California, Irvine, CA 92697-3905, USA. cgarner@uci.edu

Abstract

Next-generation DNA sequencing technologies are facilitating large-scale association studies of rare genetic variants. The depth of sequence read coverage is an important experimental variable in next-generation technologies and a major determinant of the quality of genotype calls generated from sequence data. When case and control samples are sequenced separately or in different proportions across batches, they are unlikely to be matched on sequencing read depth, and differential misclassification of genotypes can result, causing confounding and an increased false-positive rate. Data from Pilot Study 3 of the 1000 Genomes Project were used to demonstrate that a difference between the mean sequencing read depth of case and control samples can result in false-positive association for rare and uncommon variants, even when the mean coverage depth exceeds 30× in both groups. The degree of confounding and the inflation of the false-positive rate depended on how much the mean depth differed between the case and control groups. A logistic regression model was used to test for association between case-control status and the cumulative number of alleles in a collapsed set of rare and uncommon variants. Including each individual's mean sequence read depth across the variant sites as a covariate in the logistic regression model nearly eliminated the confounding effect and the inflated false-positive rate. Furthermore, accounting for potential genotyping error by modeling the probability of the heterozygote genotype calls in the regression analysis had a relatively minor but beneficial effect on the statistical results.
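The abstract gives no implementation details, so the following is only a minimal sketch of the depth adjustment it describes, not the author's code. It fits a logistic regression of case-control status on a collapsed allele count, with and without mean read depth as a covariate. The cell counts, the depth values of 20 and 40, the assumption that higher depth yields more rare-allele calls, and the helper names `solve` and `fit_logistic` are all hypothetical. The toy data are built so that the allele count is unrelated to status within each depth group, while cases are mostly in the low-depth group.

```python
import math

def solve(A, b):
    """Solve a small linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_logistic(rows, y, iters=25):
    """Newton-Raphson maximum likelihood fit; an intercept column is prepended."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += w * xi[j] * xi[k]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    return beta

# Hypothetical cells: (status, collapsed allele count, mean depth, n individuals).
# Within each depth group the allele count is independent of status.
cells = [
    (1, 0, 20.0, 60), (1, 1, 20.0, 20),
    (0, 0, 20.0, 15), (0, 1, 20.0, 5),
    (1, 0, 40.0, 10), (1, 1, 40.0, 10),
    (0, 0, 40.0, 40), (0, 1, 40.0, 40),
]
status, burden, depth = [], [], []
for s, b, d, n in cells:
    status += [s] * n
    burden += [b] * n
    depth += [d] * n

# Model 1: status ~ burden (depth ignored).
unadjusted = fit_logistic([[b] for b in burden], status)
# Model 2: status ~ burden + mean depth.
adjusted = fit_logistic([[b, d] for b, d in zip(burden, depth)], status)

print("burden coefficient, unadjusted:", round(unadjusted[1], 4))
print("burden coefficient, depth-adjusted:", round(adjusted[1], 4))
```

In this constructed example the unadjusted burden coefficient equals the marginal log odds ratio (about -0.65), a spurious association driven entirely by the depth imbalance, and it falls to essentially zero once depth enters the model. For real data a standard statistics package would be the usual choice; the hand-written fitter here only keeps the sketch dependency-free.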

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Case-Control Studies
Confounding Factors, Epidemiologic
Gene Frequency
Genetic Variation*
Genome, Human
Genome-Wide Association Study*
Genotype
Heterozygote
High-Throughput Nucleotide Sequencing*
Humans
Models, Genetic
Pilot Projects
Polymorphism, Single Nucleotide
Probability
Regression Analysis
Sequence Analysis, DNA*
