A Bayesian method for estimating gene-level polygenicity under the framework of transcriptome-wide association study

Arunabha Majumdar; Bogdan Pasaniuc

doi:10.1002/sim.9892

Stat Med. 2023 Nov 20;42(26):4867-4885. doi: 10.1002/sim.9892. Epub 2023 Aug 29.

Authors

Arunabha Majumdar¹, Bogdan Pasaniuc²

Affiliations

¹ Department of Mathematics, Indian Institute of Technology Hyderabad, Kandi, Telangana, India.
² Department of Pathology and Laboratory Medicine, University of California, Los Angeles, Los Angeles, California.

PMID: 37643728
DOI: 10.1002/sim.9892

Abstract

Polygenicity refers to the phenomenon that multiple genetic variants have a nonzero effect on a complex trait. It is defined as the proportion of genetic variants with a nonzero effect on the trait. Evaluation of polygenicity can provide valuable insights into the genetic architecture of the trait. Several recent works have attempted to estimate polygenicity at the single nucleotide polymorphism level. However, evaluating polygenicity at the gene level can be more biologically meaningful. We propose the notion of gene-level polygenicity, defined as the proportion of genes having a nonzero effect on the trait under the framework of a transcriptome-wide association study. We introduce a Bayesian approach, genepoly, to estimate this quantity for a trait. The method is based on a spike-and-slab prior and simultaneously estimates the subset of non-null genes. Our simulation study shows that genepoly efficiently estimates gene-level polygenicity. The method produces a downward bias for small choices of trait heritability due to a non-null gene, which diminishes rapidly with an increase in the genome-wide association study (GWAS) sample size. While identifying the subset of non-null genes, genepoly offers a high level of specificity and an overall good level of sensitivity; the sensitivity increases as the sample sizes of the reference panel expression data and the GWAS data increase. We applied the method to seven phenotypes in the UK Biobank, integrating expression data. We find height to be the most polygenic and asthma to be the least polygenic.
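
To make the idea of a spike-and-slab prior concrete, the following is a minimal sketch of how the proportion of non-null genes could be estimated by Gibbs sampling. It is not the genepoly implementation and is not taken from the paper. It assumes a deliberately simplified model in which each gene contributes one standardized association statistic z_j, null genes follow N(0, 1), non-null genes follow N(0, 1 + slab_var) with slab_var treated as known, and the proportion pi has a uniform Beta(1, 1) prior. The names simulate_z, gibbs_polygenicity, and slab_var are hypothetical.

```python
import math
import random


def simulate_z(n_genes, pi_true, slab_var, rng):
    """Draw toy gene-level z-scores from a two-component mixture (assumed model)."""
    z = []
    for _ in range(n_genes):
        non_null = rng.random() < pi_true
        sd = math.sqrt(1.0 + slab_var) if non_null else 1.0
        z.append(rng.gauss(0.0, sd))
    return z


def normal_pdf(x, var):
    """Density of N(0, var) evaluated at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def gibbs_polygenicity(z, slab_var, n_iter=600, burn_in=200, seed=1):
    """Gibbs sampler for the non-null proportion pi and per-gene inclusion.

    Returns the posterior mean of pi and, for each gene, the fraction of
    retained iterations in which it was assigned to the slab (non-null).
    """
    rng = random.Random(seed)
    m = len(z)
    # Marginal likelihood of each z-score under the spike and under the slab.
    like0 = [normal_pdf(x, 1.0) for x in z]
    like1 = [normal_pdf(x, 1.0 + slab_var) for x in z]

    pi = 0.5                      # arbitrary starting value
    pi_draws = []
    inclusion_counts = [0] * m

    for it in range(n_iter):
        n_non_null = 0
        for j in range(m):
            # Conditional probability that gene j is non-null given pi.
            p1 = pi * like1[j]
            p0 = (1.0 - pi) * like0[j]
            c = 1 if rng.random() < p1 / (p1 + p0) else 0
            n_non_null += c
            if it >= burn_in:
                inclusion_counts[j] += c
        # Conjugate update: Beta(1, 1) prior with Bernoulli indicators.
        pi = rng.betavariate(1 + n_non_null, 1 + m - n_non_null)
        if it >= burn_in:
            pi_draws.append(pi)

    kept = n_iter - burn_in
    pi_mean = sum(pi_draws) / kept
    inclusion_prob = [count / kept for count in inclusion_counts]
    return pi_mean, inclusion_prob


if __name__ == "__main__":
    toy_z = simulate_z(2000, 0.10, 16.0, random.Random(0))
    pi_hat, pip = gibbs_polygenicity(toy_z, 16.0)
    print("estimated proportion of non-null genes:", round(pi_hat, 3))
```

In this toy setting the posterior mean of pi plays the role of gene-level polygenicity, and the per-gene inclusion fractions give a rough analogue of selecting the subset of non-null genes. A real analysis would also have to account for uncertainty in predicted expression, the unknown effect-size variance, and correlation between genes, none of which this sketch addresses.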

Keywords: MCMC; complex trait genomics; gene-level association; hierarchical models; spike and slab prior.
