Identification of putative causal loci in whole-genome sequencing data via knockoff statistics

Nat Commun. 2021 May 25;12(1):3152. doi: 10.1038/s41467-021-22889-4.

Abstract

The analysis of whole-genome sequencing studies is challenging due to the large number of rare variants in noncoding regions and the lack of natural units for testing. We propose a statistical method to detect and localize rare and common risk variants in whole-genome sequencing studies based on a recently developed knockoff framework. It can (1) prioritize causal variants over associations due to linkage disequilibrium, thereby improving interpretability; (2) help distinguish the signal due to rare variants from shadow effects of significant common variants nearby; (3) integrate multiple knockoffs for improved power, stability, and reproducibility; and (4) flexibly incorporate state-of-the-art and future association tests to achieve the benefits proposed here. In applications to whole-genome sequencing data from the Alzheimer's Disease Sequencing Project (ADSP) and COPDGene samples from the NHLBI Trans-Omics for Precision Medicine (TOPMed) Program, we show that our method, compared with conventional association tests, can lead to substantially more discoveries.
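
For readers unfamiliar with the knockoff framework mentioned above, the following is a minimal sketch of the generic single-knockoff selection step (the "knockoff+" threshold from the general knockoff literature). It is not the multiple-knockoff procedure of this paper, and it assumes the knockoff variables have already been generated and tested. The function names, the input `w`, and the target level `q` are illustrative assumptions: `w[j]` stands for a feature statistic such as the importance score of variant or window j minus that of its knockoff copy, so large positive values suggest a real signal.

```python
import numpy as np

def knockoff_plus_threshold(w, q):
    """Return the knockoff+ threshold for feature statistics w at target FDR q.

    The threshold is the smallest t among the nonzero |w_j| such that
    (1 + #{j: w_j <= -t}) / max(1, #{j: w_j >= t}) <= q.
    Returns infinity if no such t exists (nothing is selected).
    Hypothetical helper for illustration; not taken from the paper.
    """
    w = np.asarray(w, dtype=float)
    candidates = np.sort(np.unique(np.abs(w[w != 0])))
    for t in candidates:
        # Negative statistics estimate how many false positives exceed t.
        fdp_estimate = (1 + np.sum(w <= -t)) / max(1, np.sum(w >= t))
        if fdp_estimate <= q:
            return float(t)
    return float("inf")

def knockoff_select(w, q):
    """Indices of features whose statistic meets or exceeds the threshold."""
    w = np.asarray(w, dtype=float)
    t = knockoff_plus_threshold(w, q)
    return np.flatnonzero(w >= t)

# Toy statistics (made-up numbers, purely illustrative).
w_example = [5.0, 4.0, 3.5, 3.0, -0.5, 2.0, 1.5, -1.0, 0.2, 6.0]
selected = knockoff_select(w_example, q=0.2)
```

The sign of each statistic carries the information here: under the null, a feature is as likely to score below its knockoff as above it, so the count of large negative statistics serves as a conservative estimate of the number of false discoveries among the large positive ones.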

Publication types

  • Evaluation Study
  • Research Support, N.I.H., Extramural
  • Research Support, N.I.H., Intramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Causality
  • Computer Simulation
  • Data Interpretation, Statistical
  • Datasets as Topic
  • Genetic Loci
  • Genetic Predisposition to Disease*
  • Genome, Human
  • Genome-Wide Association Study / methods*
  • Humans
  • Linkage Disequilibrium
  • Markov Chains
  • Models, Genetic*
  • Polymorphism, Single Nucleotide
  • Reproducibility of Results
  • Whole Genome Sequencing / methods*

Grants and funding