Exploring the Link Between Additive Heritability and Prediction Accuracy From a Ridge Regression Perspective

Front Genet. 2020 Nov 4:11:581594. doi: 10.3389/fgene.2020.581594. eCollection 2020.

Abstract

Genome-Wide Association Studies (GWAS) explain only a small fraction of heritability for most complex human phenotypes. Genomic heritability estimates the variance explained by the SNPs on the whole genome using mixed models and accounts for the many small contributions of SNPs in the explanation of a phenotype. This paper approaches heritability from a machine learning perspective, and examines the close link between mixed models and ridge regression. Our contribution is two-fold. First, we propose estimating genomic heritability using a predictive approach via ridge regression and Generalized Cross Validation (GCV). We show that this is consistent with classical mixed model based estimation. Second, we derive simple formulae that express prediction accuracy as a function of the ratio n p , where n is the population size and p the total number of SNPs. These formulae clearly show that a high heritability does not imply an accurate prediction when p > n. Both the estimation of heritability via GCV and the prediction accuracy formulae are validated using simulated data and real data from UK Biobank.

Keywords: best linear unbiased predictor; generalized cross validation; heritability; mixed model; prediction accuracy; ridge regression.