The accuracy of LD Score regression as an estimator of confounding and genetic correlations in genome-wide association studies

Genet Epidemiol. 2018 Dec;42(8):783-795. doi: 10.1002/gepi.22161. Epub 2018 Sep 24.

Abstract

To infer that a single-nucleotide polymorphism (SNP) either affects a phenotype or is in linkage disequilibrium with a causal site, we must have some assurance that any SNP-phenotype correlation is not the result of confounding with environmental variables that also affect the trait. Here we examine the properties of linkage disequilibrium (LD) Score regression, a recently developed method for using summary statistics from genome-wide association studies to ensure that confounding does not inflate the number of false positives. We do not treat the effects of genetic variation as a random variable and thus are able to obtain results about the unbiasedness of this method. We demonstrate that LD Score regression can produce estimates of confounding at null SNPs that are unbiased or conservative under fairly general conditions. This robustness holds in the case of the parent genotype affecting the offspring phenotype through some environmental mechanism, despite the resulting correlation over SNPs between LD Scores and the degree of confounding. Additionally, we demonstrate that LD Score regression can produce reasonably robust estimates of the genetic correlation, even when its estimates of the genetic covariance and the two univariate heritabilities are substantially biased.
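As a rough illustration of the method discussed above, the following Python sketch regresses simulated association chi-square statistics on LD Scores and reads the confounding estimate off the intercept. It is not taken from the paper. It assumes the standard LD Score regression relationship, in which the expected chi-square statistic of a SNP is linear in its LD Score, with slope N·h²/M and an intercept of 1 plus a confounding term. The function name `ldsc_fit`, the simulated LD Score distribution, and all parameter values are hypothetical, and plain unweighted least squares is used here in place of the weighted regression and block-jackknife standard errors used in practice.

```python
import numpy as np


def ldsc_fit(chi2, ld_scores, n_samples, n_snps):
    """Unweighted least-squares fit of chi-square statistics on LD Scores.

    Returns (intercept, h2_estimate). This is a simplified sketch: it
    omits regression weights and standard errors.
    """
    # Design matrix: a column of ones for the intercept, then the LD Scores.
    design = np.column_stack([np.ones_like(ld_scores), ld_scores])
    coef, *_ = np.linalg.lstsq(design, chi2, rcond=None)
    intercept, slope = coef
    # Under the assumed model, slope = N * h2 / M, so h2 = slope * M / N.
    h2_estimate = slope * n_snps / n_samples
    return intercept, h2_estimate


# Simulated summary statistics; every value below is an illustrative assumption.
rng = np.random.default_rng(0)
n_snps, n_samples = 200_000, 10_000
h2_true = 0.4
confounding = 0.2  # inflation shared by all SNPs, so the true intercept is 1.2

# Hypothetical LD Scores with a mean of about 100.
ld_scores = rng.gamma(shape=2.0, scale=50.0, size=n_snps)

# Expected chi-square per SNP under the assumed linear model.
expected_chi2 = 1.0 + confounding + n_samples * h2_true * ld_scores / n_snps

# Scale 1-df chi-square draws so that each SNP has the intended mean.
chi2 = expected_chi2 * rng.chisquare(df=1, size=n_snps)

intercept, h2_hat = ldsc_fit(chi2, ld_scores, n_samples, n_snps)
print(f"intercept = {intercept:.3f}, h2 estimate = {h2_hat:.3f}")
```

In this simulation the confounding term is constant across SNPs, so the intercept recovers it. The abstract's more delicate case, in which the degree of confounding is correlated with LD Scores, is not modeled here.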

Keywords: causal inference; genetic correlation; heritability; population stratification; quantitative genetics.

Publication types

  • Research Support, N.I.H., Intramural

MeSH terms

  • Computer Simulation
  • Confounding Factors, Epidemiologic*
  • Genome-Wide Association Study*
  • Genotype
  • Humans
  • Inheritance Patterns / genetics
  • Linkage Disequilibrium / genetics*
  • Models, Genetic
  • Phenotype
  • Polymorphism, Single Nucleotide / genetics
  • Regression Analysis
  • Twins / genetics