Incorporating family disease history and controlling case-control imbalance for population-based genetic association studies

Yongwen Zhuang; Brooke N Wolford; Kisung Nam; Wenjian Bi; Wei Zhou; Cristen J Willer; Bhramar Mukherjee; Seunggeun Lee

doi:10.1093/bioinformatics/btac459

Bioinformatics. 2022 Sep 15;38(18):4337-4343. doi: 10.1093/bioinformatics/btac459.

Authors

Yongwen Zhuang¹,², Brooke N Wolford³, Kisung Nam⁴, Wenjian Bi⁵, Wei Zhou⁶, Cristen J Willer³,⁷,⁸, Bhramar Mukherjee²,⁹,¹⁰, Seunggeun Lee⁴

Affiliations

¹ Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA.
² Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA.
³ Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
⁴ Graduate School of Data Science, Seoul National University, Seoul, Korea.
⁵ Department of Medical Genetics, School of Basic Medical Sciences, Peking University, Beijing, China.
⁶ Massachusetts General Hospital, Broad Institute, Boston, MA, USA.
⁷ Department of Internal Medicine, Division of Cardiology, University of Michigan Medical School, Ann Arbor, MI, USA.
⁸ Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, USA.
⁹ Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, MI, USA.
¹⁰ Michigan Institute of Data Science, University of Michigan, Ann Arbor, MI, USA.

Abstract

Motivation: In genome-wide association analyses of population-based biobanks, most diseases have low prevalence, which results in low detection power. One approach to this problem is to use family disease history, yet existing methods cannot address the type I error inflation induced by increased phenotype correlation among closely related samples or by unbalanced phenotypic distributions.

Results: We propose a new method for genetic association testing with family disease history, the mixed-model-based Test with Adjusted Phenotype and Empirical saddlepoint approximation (TAPE). It controls for increased phenotype correlation by adopting a two-variance-component mixed model, accounts for case-control imbalance by using an empirical saddlepoint approximation, and can flexibly incorporate any existing adjusted phenotype, such as those from the LT-FH method. Through simulation studies and analyses of white British samples from the UK Biobank and Korean samples from the Korean Genome and Epidemiology Study, we show that the proposed method is robust and better calibrated than existing methods while gaining power to detect variant-phenotype associations.
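To illustrate why a saddlepoint approximation helps under case-control imbalance, here is a minimal Python sketch of a generic empirical saddlepoint tail probability for a score statistic S = sum_i g_i * r_i. It is not the authors' implementation and does not include the mixed-model step. The function names, the toy residuals, and the choice of the Lugannani-Rice (Barndorff-Nielsen) tail formula are assumptions made for illustration. The residuals are treated as draws from their own empirical distribution, so the cumulant generating function is estimated directly from the data.

```python
import math

def normal_sf(z):
    """Upper tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def _empirical_cgf(resid, u):
    """K0(u) = log(mean(exp(u * r))) with its first two derivatives."""
    m = max(u * r for r in resid)              # log-sum-exp shift for stability
    w = [math.exp(u * r - m) for r in resid]
    tot = sum(w)
    k0 = m + math.log(tot / len(resid))
    k1 = sum(wi * r for wi, r in zip(w, resid)) / tot
    k2 = sum(wi * r * r for wi, r in zip(w, resid)) / tot - k1 * k1
    return k0, k1, k2

def score_cgf(t, geno, resid):
    """K(t) = sum_i K0(g_i * t) and its first two derivatives in t."""
    counts = {}
    for g in geno:                             # group identical genotype values
        counts[g] = counts.get(g, 0) + 1
    K = K1 = K2 = 0.0
    for g, c in counts.items():
        k0, k1, k2 = _empirical_cgf(resid, g * t)
        K += c * k0
        K1 += c * g * k1
        K2 += c * g * g * k2
    return K, K1, K2

def spa_upper_tail(s, geno, resid):
    """Approximate P(S >= s) with the Lugannani-Rice saddlepoint formula."""
    _, mean, var = score_cgf(0.0, geno, resid)
    z = (s - mean) / math.sqrt(var)
    if abs(z) < 1e-3:                          # formula is singular at the mean
        return normal_sf(z)
    step = 1.0 if s > mean else -1.0
    lo, hi = 0.0, step
    for _ in range(60):                        # expand until K'(hi) passes s
        if (score_cgf(hi, geno, resid)[1] - s) * step >= 0:
            break
        lo, hi = hi, 2.0 * hi
    else:
        raise ValueError("s lies outside the range the score can attain")
    for _ in range(100):                       # bisection: K'(t) is increasing
        mid = 0.5 * (lo + hi)
        if (score_cgf(mid, geno, resid)[1] - s) * step >= 0:
            hi = mid
        else:
            lo = mid
    t = 0.5 * (lo + hi)
    K, _, K2 = score_cgf(t, geno, resid)
    w = math.copysign(math.sqrt(2.0 * (t * s - K)), t)
    v = t * math.sqrt(K2)
    return normal_sf(w + math.log(v / w) / w)

# Toy data (hypothetical): 5% "cases" give strongly right-skewed residuals.
resid = [-0.05] * 95 + [0.95] * 5
geno = [1.0] * 100                             # toy weights, not real genotypes
_, mean, var = score_cgf(0.0, geno, resid)
s = mean + 3.0 * math.sqrt(var)                # a score three SDs above its mean
p_spa = spa_upper_tail(s, geno, resid)
p_normal = normal_sf(3.0)
```

In this toy setting the saddlepoint tail probability comes out larger than the normal one, which is the direction expected for right-skewed residuals: a normal approximation would understate the p-value and so inflate type I error.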

Availability and implementation: The summary statistics and code generated in this study are available at https://github.com/styvon/TAPE.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Case-Control Studies
Computer Simulation
Genome-Wide Association Study* / methods
Phenotype
Polymorphism, Single Nucleotide*
