Limitations of GCTA as a solution to the missing heritability problem

Proc Natl Acad Sci U S A. 2016 Jan 5;113(1):E61-70. doi: 10.1073/pnas.1520109113. Epub 2015 Dec 22.

Abstract

Genome-wide association studies (GWASs) seek to understand the relationship between complex phenotype(s) (e.g., height) and up to millions of single-nucleotide polymorphisms (SNPs). Early analyses of GWASs are commonly believed to have "missed" much of the additive genetic variance estimated from correlations between relatives. A more recent method, genome-wide complex trait analysis (GCTA), obtains much higher estimates of heritability using a model of random SNP effects correlated between genotypically similar individuals. GCTA has now been applied to many phenotypes from schizophrenia to scholastic achievement. However, recent studies question GCTA's estimates of heritability. Here, we show that GCTA applied to current SNP data cannot produce reliable or stable estimates of heritability. We show first that GCTA depends sensitively on all singular values of a high-dimensional genetic relatedness matrix (GRM). When the assumptions in GCTA are satisfied exactly, we show that the heritability estimates produced by GCTA will be biased and the standard errors will likely be inaccurate. When the population is stratified, we find that GRMs typically have highly skewed singular values, and we prove that the many small singular values cannot be estimated reliably. Hence, GWAS data are necessarily overfit by GCTA, which, as a result, produces high estimates of heritability. We also show that GCTA's heritability estimates are sensitive to the chosen sample and to measurement errors in the phenotype. We illustrate our results using the Framingham dataset. Our analysis suggests that results obtained using GCTA, and the results' qualitative interpretations, should be treated with great caution.
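To see why every singular value of the GRM matters, it helps to write the variance-components model out. The model is y = g + e with g ~ N(0, sigma_g^2 A) and e ~ N(0, sigma_e^2 I), where A is the GRM and h^2 = sigma_g^2 / (sigma_g^2 + sigma_e^2). Because A is symmetric and positive semidefinite, its singular values equal its eigenvalues. Diagonalizing A = U diag(lambda) U^T turns the likelihood into a product over independent components, each with variance proportional to h^2 lambda_i + (1 - h^2). The following is a minimal sketch under stated assumptions, not the GCTA software or the authors' code. It uses the simple standardized GRM A = Z Z^T / M and maximum likelihood on a grid of h^2 values instead of GCTA's REML fitting. The function names (`standardized_grm`, `profile_loglik`, `estimate_h2`) and the simulated genotypes and phenotype are hypothetical, chosen only for illustration.

```python
import numpy as np


def standardized_grm(genotypes):
    """Hypothetical helper: GRM A = Z Z^T / M from an (N x M) matrix of 0/1/2 counts.

    Each SNP column is standardized with its sample allele frequency p_j:
    z_ij = (x_ij - 2 p_j) / sqrt(2 p_j (1 - p_j)). Monomorphic SNPs are dropped.
    """
    x = np.asarray(genotypes, dtype=float)
    p = x.mean(axis=0) / 2.0                       # sample allele frequencies
    keep = (p > 0) & (p < 1)                       # avoid division by zero
    z = (x[:, keep] - 2 * p[keep]) / np.sqrt(2 * p[keep] * (1 - p[keep]))
    return z @ z.T / z.shape[1]


def profile_loglik(h2, eigvals, y_rot):
    """ML log-likelihood of h2 for phenotype y rotated into the GRM eigenbasis.

    With A = U diag(lam) U^T and y* = U^T y, the entries of y* are independent
    with variance sigma2 * (h2 * lam_i + 1 - h2), so every eigenvalue lam_i
    enters the likelihood. The total variance sigma2 is profiled out.
    """
    d = h2 * eigvals + (1.0 - h2)                  # relative variance per component
    n = y_rot.size
    sigma2_hat = np.mean(y_rot**2 / d)             # ML estimate of total variance
    return -0.5 * (n * np.log(2 * np.pi * sigma2_hat) + np.sum(np.log(d)) + n)


def estimate_h2(genotypes, y, grid=np.linspace(0.0, 0.99, 100)):
    """Grid-search ML estimate of h2 (a simplification of GCTA's REML fit)."""
    A = standardized_grm(genotypes)
    eigvals, U = np.linalg.eigh(A)                 # A is symmetric PSD
    eigvals = np.clip(eigvals, 0.0, None)          # drop tiny negative round-off
    y_rot = U.T @ (y - y.mean())                   # centre y, then rotate
    ll = [profile_loglik(h, eigvals, y_rot) for h in grid]
    return grid[int(np.argmax(ll))], eigvals


# Hypothetical simulation: N individuals, M SNPs, true heritability 0.5.
rng = np.random.default_rng(0)
N, M, h2_true = 300, 2000, 0.5
freqs = rng.uniform(0.05, 0.5, size=M)
G = rng.binomial(2, freqs, size=(N, M))
g = (G - G.mean(axis=0)) @ rng.normal(size=M)      # additive genetic values
y = g / g.std() * np.sqrt(h2_true) + rng.normal(0.0, np.sqrt(1 - h2_true), size=N)

h2_hat, lam = estimate_h2(G, y)
print(f"estimated h2 = {h2_hat:.2f}")
print(f"largest / median eigenvalue of the GRM: {lam.max():.2f} / {np.median(lam):.2f}")
```

The rotation makes explicit the dependence described above: a misestimated small eigenvalue changes the variance assigned to its component of y*, and with it the likelihood surface for h^2. Rerunning the sketch with a different seed, or with a structured (stratified) genotype simulation, is one informal way to see how much the estimate moves.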

Keywords: GCTA; GWAS; SNP; heritability; singular value decomposition.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Data Interpretation, Statistical
  • Datasets as Topic / statistics & numerical data
  • Genome-Wide Association Study / methods*
  • Genome-Wide Association Study / statistics & numerical data*
  • Genotype
  • Humans
  • Phenotype
  • Polymorphism, Single Nucleotide*
  • Quantitative Trait, Heritable*