Subsampling Technique to Estimate Variance Component for UK-Biobank Traits

Front Genet. 2021 Mar 5:12:612045. doi: 10.3389/fgene.2021.612045. eCollection 2021.

Abstract

Estimating heritability is a central question in statistical genetics. Owing to its clear mathematical properties, the modified Haseman-Elston regression has served as a bridge that connects, and helps develop, various parallel heritability estimation methods. As sample sizes grow, estimating heritability for biobank-scale data becomes computationally challenging, in particular because of the cost of calculating the genetic relationship matrix K. Using the Haseman-Elston framework, in this study we explicitly analyzed the mathematical structure of the key term tr(K^T K), the trace of a higher-order term of the genetic relationship matrix, which is a component of the estimation procedure. We proposed two estimators that can estimate tr(K^T K) with greatly reduced sampling variance compared with the existing method under the same computational complexity. We applied this method to 81 traits in UK Biobank data and compared the chromosome-wise partitioned heritability with the whole-genome heritability, which also serves as an approach for testing polygenicity.
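To make the key term concrete, the following is a minimal sketch and not the estimators proposed in this study. It assumes the genetic relationship matrix is formed as K = Z Z^T / M from a standardized n-by-M genotype matrix Z, and compares the exact value of the trace of K^T K with a generic Hutchinson-style randomized estimate that never forms K. The function names, the simulated genotypes, and the number of random probes are all hypothetical choices made for illustration.

```python
import numpy as np

def standardize(G):
    """Center and scale each marker (column); drop monomorphic markers."""
    sd = G.std(axis=0)
    keep = sd > 0
    return (G[:, keep] - G[:, keep].mean(axis=0)) / sd[keep]

def trace_k2_exact(Z):
    """Exact tr(K^T K) for K = Z Z^T / M, using the M x M matrix Z^T Z.

    Relies on the identity ||Z Z^T||_F^2 = ||Z^T Z||_F^2.
    """
    M = Z.shape[1]
    S = Z.T @ Z
    return np.sum(S * S) / M**2

def trace_k2_randomized(Z, n_probes, rng):
    """Hutchinson-style estimate of tr(K^T K) without forming K.

    For Rademacher vectors b, E[||K b||^2] = tr(K^T K).
    """
    n, M = Z.shape
    B = rng.choice([-1.0, 1.0], size=(n, n_probes))  # random +/-1 probes
    KB = Z @ (Z.T @ B) / M                           # K @ B via two thin products
    return np.sum(KB * KB) / n_probes

# Simulated data (assumption): 200 individuals, 500 independent biallelic markers.
rng = np.random.default_rng(0)
freqs = rng.uniform(0.1, 0.9, size=500)
G = rng.binomial(2, freqs, size=(200, 500)).astype(float)
Z = standardize(G)

exact = trace_k2_exact(Z)
approx = trace_k2_randomized(Z, n_probes=100, rng=rng)
print(exact, approx)
```

The randomized version costs on the order of n × M × (number of probes) operations instead of n² × M for building K, which is why trace estimators of this kind are attractive at biobank scale; the sampling variance of such an estimate is the quantity that the estimators proposed here aim to reduce.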

Keywords: Haseman-Elston regression; UK Biobank; effective number of markers; polygenicity; subsampling estimator.