Incorporating family disease history and controlling case-control imbalance for population-based genetic association studies

Bioinformatics. 2022 Sep 15;38(18):4337-4343. doi: 10.1093/bioinformatics/btac459.

Abstract

Motivation: In genome-wide association analyses of population-based biobanks, most diseases have low prevalence, which results in low detection power. One approach to this problem is to use family disease history, yet existing methods do not address two sources of type I error inflation: the increased correlation of phenotypes among closely related samples and the unbalanced phenotypic (case-control) distribution.

Results: We propose TAPE (mixed-model-based Test with Adjusted Phenotype and Empirical saddlepoint approximation), a new method for genetic association testing with family disease history. TAPE controls for increased phenotype correlation by adopting a two-variance-component mixed model, accounts for case-control imbalance by using empirical saddlepoint approximation, and can flexibly incorporate any existing adjusted phenotype, such as those from the LT-FH method. Through simulation studies and analyses of white British samples from the UK Biobank and Korean samples from the Korean Genome and Epidemiology Study, we show that the proposed method is robust and better calibrated than existing methods while gaining power to detect variant-phenotype associations.
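The abstract names empirical saddlepoint approximation as the device for handling case-control imbalance but gives no formulas. The sketch below illustrates the generic technique for a score statistic S = sum_i g_i * r_i. It assumes, as an illustration and not as the paper's specification, that the null-model residuals r_i of the adjusted phenotype are treated as independent draws from their own empirical distribution while the centered genotypes g_i are held fixed. The two-variance-component mixed model is not implemented here: residuals are simply taken as input. The function names (`espa_pvalue` and helpers) and the normal-approximation cutoff of 2 are hypothetical choices, not the code in the TAPE repository.

```python
import math


def _norm_cdf(x):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def _emp_cgf(resid, u):
    """Empirical CGF K(u) = log(mean_j exp(resid_j * u)) of the residuals and
    its first two derivatives, computed with a log-sum-exp shift."""
    xs = [r * u for r in resid]
    m = max(xs)
    ws = [math.exp(x - m) for x in xs]
    total = sum(ws)
    k0 = m + math.log(total / len(resid))
    k1 = sum(w * r for w, r in zip(ws, resid)) / total
    k2 = sum(w * r * r for w, r in zip(ws, resid)) / total - k1 * k1
    return k0, k1, max(k2, 0.0)


def _score_cgf(counts, resid, t):
    """CGF of S = sum_i g_i * R_i (plus derivatives) when each R_i is an
    independent draw from the empirical residual distribution.
    `counts` maps each distinct centered genotype value to its frequency."""
    k0 = k1 = k2 = 0.0
    for g, c in counts.items():
        a, b, d = _emp_cgf(resid, g * t)
        k0 += c * a
        k1 += c * g * b
        k2 += c * g * g * d
    return k0, k1, k2


def _spa_tail(counts, resid, s, upper):
    """Barndorff-Nielsen saddlepoint approximation of P(S >= s) if `upper`,
    else P(S <= s). Assumes s lies well inside the support of S."""
    lo, hi = (0.0, 1.0) if upper else (-1.0, 0.0)
    # Expand the bracket until it contains the saddlepoint K'(zeta) = s.
    while upper and _score_cgf(counts, resid, hi)[1] < s:
        hi *= 2.0
        if hi > 1e6:
            return 0.0
    while not upper and _score_cgf(counts, resid, lo)[1] > s:
        lo *= 2.0
        if lo < -1e6:
            return 0.0
    for _ in range(200):  # bisection: K' is increasing in t
        mid = 0.5 * (lo + hi)
        if _score_cgf(counts, resid, mid)[1] < s:
            lo = mid
        else:
            hi = mid
    zeta = 0.5 * (lo + hi)
    k0, _, k2 = _score_cgf(counts, resid, zeta)
    w = math.copysign(math.sqrt(max(2.0 * (zeta * s - k0), 0.0)), zeta)
    v = zeta * math.sqrt(k2)
    if w == 0.0 or v / w <= 0.0:
        return 0.0  # degenerate case, e.g. s at the edge of the support
    x = w + math.log(v / w) / w
    return _norm_cdf(-x) if upper else _norm_cdf(x)


def espa_pvalue(geno, resid, cutoff=2.0):
    """Two-sided p-value for S = sum_i g_i * r_i: normal approximation when
    |z| < cutoff, empirical saddlepoint approximation for each tail otherwise."""
    r_mean = sum(resid) / len(resid)
    r = [x - r_mean for x in resid]  # centered residuals, so K'(0) = 0
    g_mean = sum(geno) / len(geno)
    counts = {}
    for x in geno:
        counts[x - g_mean] = counts.get(x - g_mean, 0) + 1
    s = sum((x - g_mean) * ri for x, ri in zip(geno, r))
    var = _score_cgf(counts, r, 0.0)[2]  # K''(0) = sum_i g_i^2 * var(r)
    z = s / math.sqrt(var)
    if abs(z) < cutoff:
        return math.erfc(abs(z) / math.sqrt(2.0))
    p = _spa_tail(counts, r, abs(s), True) + _spa_tail(counts, r, -abs(s), False)
    return min(p, 1.0)
```

Grouping individuals by distinct genotype value keeps each CGF evaluation at O(n) per distinct value rather than O(n^2). With a rare binary trait, case residuals are large and positive while control residuals are small and negative, so S is skewed and a normal tail can be miscalibrated; the saddlepoint tail follows the empirical CGF instead. Near the boundary of the support of S, the approximation degrades, and this sketch handles that case only crudely.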

Availability and implementation: The summary statistics and code generated in this study are available at https://github.com/styvon/TAPE.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Case-Control Studies
  • Computer Simulation
  • Genome-Wide Association Study* / methods
  • Phenotype
  • Polymorphism, Single Nucleotide*