Multimarker omnibus tests by leveraging individual marker summary statistics from large biobanks

Ann Hum Genet. 2023 May;87(3):125-136. doi: 10.1111/ahg.12495. Epub 2023 Jan 22.

Abstract

As biobanks become increasingly popular, access to genotypic and phenotypic data continues to increase in the form of precomputed summary statistics (PCSS). Widespread accessibility of PCSS alleviates many issues related to biobank data, including data privacy and confidentiality concerns and high computational costs. However, questions remain about how to maximally leverage PCSS for downstream statistical analyses. Here we present a novel method for testing the association of an arbitrary number of single nucleotide variants (SNVs) with a linear combination of phenotypes, after adjusting for covariates, using common multimarker tests (e.g., SKAT, SKAT-O) without access to individual patient-level data (IPD). We validate exact formulas for each method and demonstrate their accuracy through simulation studies and an application to fatty acid phenotypic data from the Framingham Heart Study.
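The core idea, that a multimarker statistic for a combined phenotype can be rebuilt from per-marker summaries, can be illustrated with a minimal Python sketch. It is not the authors' derivation: it assumes an intercept-only null model (no covariate adjustment), a hypothetical PCSS layout holding only the sample size n and the sample covariances of each SNV with each phenotype, and hypothetical weights `a` (the phenotype combination) and `w` (per-variant weights). It uses the SKAT score form Q = sum_j w_j S_j^2 with S_j = sum_i g_ij (y*_i - mean(y*)) and y* = Ya. Because the residuals sum to zero, S_j = (n - 1) sum_k a_k cov(g_j, y_k), so Q needs only PCSS; the script checks this against the same statistic computed from simulated IPD.

```python
import math
import random


def _mean(x):
    return sum(x) / len(x)


def sample_cov(x, y):
    """Sample covariance of two equal-length sequences (n - 1 denominator)."""
    mx, my = _mean(x), _mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)


def pcss_from_ipd(G, Y):
    """Build summaries a biobank might release (hypothetical layout).

    G: n x m genotypes (0/1/2), Y: n x K phenotypes, both as lists of rows.
    Only n and covariances are returned; no individual-level rows.
    """
    n, m, K = len(G), len(G[0]), len(Y[0])
    g_cols = [[row[j] for row in G] for j in range(m)]
    y_cols = [[row[k] for row in Y] for k in range(K)]
    return {
        "n": n,
        "cov_gy": [[sample_cov(g, y) for y in y_cols] for g in g_cols],
        "cov_yy": [[sample_cov(y1, y2) for y2 in y_cols] for y1 in y_cols],
    }


def skat_q_from_pcss(pcss, a, w):
    """SKAT-style Q for y* = Y a, computed from summary statistics only."""
    n = pcss["n"]
    # S_j = (n - 1) * cov(g_j, y*) = (n - 1) * sum_k a_k cov(g_j, y_k)
    scores = [(n - 1) * sum(ak * c for ak, c in zip(a, row)) for row in pcss["cov_gy"]]
    q = sum(wj * s * s for wj, s in zip(w, scores))
    # Variance of the combined phenotype, a' Sigma_Y a, also from PCSS alone
    cov_yy = pcss["cov_yy"]
    var_ystar = sum(a[k1] * a[k2] * cov_yy[k1][k2]
                    for k1 in range(len(a)) for k2 in range(len(a)))
    return q, scores, var_ystar


def skat_q_from_ipd(G, Y, a, w):
    """Reference computation of the same Q from individual-level data."""
    ystar = [sum(ak * yk for ak, yk in zip(a, row)) for row in Y]
    ybar = _mean(ystar)
    resid = [y - ybar for y in ystar]  # residuals under the intercept-only null
    scores = [sum(row[j] * r for row, r in zip(G, resid)) for j in range(len(G[0]))]
    return sum(wj * s * s for wj, s in zip(w, scores)), scores


# Usage on simulated data (all values hypothetical)
rng = random.Random(2023)
n, m = 200, 5
G = [[rng.choice([0, 0, 0, 1, 1, 2]) for _ in range(m)] for _ in range(n)]
Y = [[0.3 * g[0] + rng.gauss(0, 1), -0.2 * g[1] + rng.gauss(0, 1)] for g in G]
a = [1.0, -0.5]  # phenotype combination weights
w = [1.0] * m    # flat per-variant weights

q_pcss, s_pcss, var_ystar = skat_q_from_pcss(pcss_from_ipd(G, Y), a, w)
q_ipd, s_ipd = skat_q_from_ipd(G, Y, a, w)
print(math.isclose(q_pcss, q_ipd, rel_tol=1e-9))  # True
```

Obtaining a p-value additionally requires the null distribution of Q (in SKAT, a weighted mixture of chi-square variables), which this sketch does not compute; covariate adjustment would likewise require further summaries beyond those assumed here.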

Keywords: genetic data banks; genetic markers; genetic privacy; genotype-phenotype associations; statistical.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biological Specimen Banks*
  • Genome-Wide Association Study*
  • Genotype
  • Humans
  • Phenotype
  • Polymorphism, Single Nucleotide