Efficient testing and effect size estimation for set-based genetic association inference via semiparametric multilevel mixture modeling

Biom J. 2022 Aug;64(6):1142-1152. doi: 10.1002/bimj.202100234. Epub 2022 May 11.

Abstract

In genetic association studies, rare variants with extremely low allele frequencies play a crucial role in complex traits. Because such variants are too infrequent to be tested reliably one at a time, set-based testing methods that jointly assess the effects of groups of single nucleotide polymorphisms (SNPs) were developed to increase the power of association tests. However, their power is still insufficient, and precise estimation of the effect sizes of individual SNPs is largely impossible. In this article, we provide an efficient set-based statistical inference framework that addresses both of these issues simultaneously, using an empirical Bayes method with semiparametric multilevel mixture modeling. We propose using a hierarchical model that incorporates variation in set-specific effects and applying the optimal discovery procedure (ODP), which achieves the largest overall power in multiple significance testing. In addition, we provide an optimal "set-based" estimator of the empirical distribution of effect sizes. The efficiency of the proposed methods is demonstrated through simulation studies and an application to a genome-wide association study of coronary artery disease. The results revealed numerous rare variants with large effect sizes for coronary artery disease, and the ODP detected many more significant sets than existing methods did.
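The abstract combines three ingredients: an empirical Bayes mixture model for effect sizes, ODP-based testing, and an estimator of the effect-size distribution. The Python sketch below illustrates them in a deliberately simplified, single-level form. It assumes unit-variance z-statistics, z_i ~ N(theta_i, 1), and a discrete prior on a fixed grid fitted by EM (a nonparametric grid approximation), with an atom at zero representing the null. It is not the authors' multilevel model: there are no set-specific random effects, and SNP-level and set-level effects are not separated. All function names (fit_grid_prior, odp_reject, posterior_edf, and so on) and the simulated data are hypothetical. Under this exchangeable model, Storey's ODP statistic (summed alternative densities divided by summed null densities) reduces, up to a constant, to the ratio of the estimated alternative marginal density to the null density. Tests are therefore ranked by that ratio and rejected while the mean local false discovery rate of the rejected set stays at or below alpha. Individual effects are estimated by posterior means. The effect-size distribution is estimated by the posterior expectation of the empirical CDF of the theta_i, which is one standard choice; the paper's "optimal set-based" estimator may differ.

```python
import math
import random


def normal_pdf(x, mean=0.0):
    """Density of N(mean, 1) at x; unit-variance z-statistics are assumed."""
    d = x - mean
    return math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)


def fit_grid_prior(z, grid, n_iter=300):
    """EM estimate of a discrete prior g(theta) = sum_k w_k * delta(theta - grid[k]).

    grid must contain 0.0, the null atom; its weight is the estimate of pi0.
    The nonzero atoms form a nonparametric estimate of the effect-size distribution.
    """
    k = len(grid)
    w = [1.0 / k] * k
    lik = [[normal_pdf(zi, t) for t in grid] for zi in z]  # phi(z_i - grid_k), reused
    for _ in range(n_iter):
        acc = [0.0] * k
        for row in lik:
            marg = sum(wj * lj for wj, lj in zip(w, row))
            for j in range(k):
                acc[j] += w[j] * row[j] / marg  # posterior responsibility of atom j
        w = [a / len(z) for a in acc]
    return w


def posterior(zi, grid, w):
    """Posterior probabilities of each grid atom given one z-statistic."""
    unnorm = [wk * normal_pdf(zi, t) for wk, t in zip(w, grid)]
    s = sum(unnorm)
    return [u / s for u in unnorm]


def odp_statistic(zi, grid, w):
    """Estimated ODP statistic: alternative marginal density over null density.

    Constant factors (e.g. 1 - pi0) are kept or dropped freely; only the ranking matters.
    """
    alt = sum(wk * normal_pdf(zi, t) for wk, t in zip(w, grid) if t != 0.0)
    return alt / normal_pdf(zi)


def local_fdr(zi, grid, w):
    """Posterior probability that theta_i = 0."""
    return posterior(zi, grid, w)[grid.index(0.0)]


def odp_reject(z, grid, w, alpha=0.05):
    """Reject in decreasing order of the ODP statistic while the estimated FDR
    (running mean of the local fdr of the rejected tests) stays <= alpha."""
    order = sorted(range(len(z)), key=lambda i: odp_statistic(z[i], grid, w), reverse=True)
    rejected, total = [], 0.0
    for i in order:
        total += local_fdr(z[i], grid, w)
        if total / (len(rejected) + 1) > alpha:
            break  # local fdr is nondecreasing along this order, so no later test helps
        rejected.append(i)
    return rejected


def posterior_mean(zi, grid, w):
    """Empirical Bayes (shrunken) effect-size estimate E[theta_i | z_i]."""
    return sum(p * t for p, t in zip(posterior(zi, grid, w), grid))


def posterior_edf(z, grid, w, t_eval):
    """Posterior mean of the empirical CDF of the true effects, evaluated at t_eval."""
    return sum(
        sum(p for p, t in zip(posterior(zi, grid, w), grid) if t <= t_eval) for zi in z
    ) / len(z)


# Hypothetical simulation: 270 null tests and 30 tests with true effect 3.
rng = random.Random(2022)
theta = [0.0] * 270 + [3.0] * 30
z = [t + rng.gauss(0.0, 1.0) for t in theta]
grid = [0.5 * k for k in range(-12, 13)]  # -6.0, -5.5, ..., 6.0; includes 0.0
w = fit_grid_prior(z, grid)
hits = odp_reject(z, grid, w, alpha=0.1)
print("estimated pi0:", round(w[grid.index(0.0)], 3))
print("rejections:", len(hits))
print("E[theta | z = 4]:", round(posterior_mean(4.0, grid, w), 2))
```

Under this model, the local fdr equals pi0 / (pi0 + S), where S is the ODP ratio computed above. Ranking by decreasing S is therefore the same as ranking by increasing local fdr, which is why the rejection loop can stop at the first test that pushes the running mean above alpha. In the authors' framework, the analogous quantities would come from the multilevel model with set-specific effect variation rather than from this single-level prior.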

Keywords: effect size estimation; empirical Bayes; genome-wide association study; optimal discovery procedure.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bayes Theorem
  • Computer Simulation
  • Coronary Artery Disease* / genetics
  • Genome-Wide Association Study* / methods
  • Humans
  • Models, Genetic
  • Polymorphism, Single Nucleotide