KnockoffTrio: A knockoff framework for the identification of putative causal variants in genome-wide association studies with trio design

Am J Hum Genet. 2022 Oct 6;109(10):1761-1776. doi: 10.1016/j.ajhg.2022.08.013. Epub 2022 Sep 22.

Abstract

Family-based designs can eliminate confounding due to population substructure and can distinguish direct from indirect genetic effects, but these designs are underpowered due to limited sample sizes. Here, we propose KnockoffTrio, a statistical method for identifying putative causal genetic variants in the father-mother-child trio design, built upon a recently developed knockoff framework in statistics. KnockoffTrio controls the false discovery rate (FDR) in the presence of arbitrary correlations among tests and is less conservative, and thus more powerful, than conventional methods that control the family-wise error rate via Bonferroni correction. Furthermore, KnockoffTrio is not restricted to family-based association tests and can be used in conjunction with more powerful, potentially nonlinear models to improve the power of standard family-based tests. We show, using empirical simulations, that KnockoffTrio can prioritize causal variants over associations due to linkage disequilibrium and can provide protection against confounding due to population stratification. In applications to 14,200 trios from three study cohorts for autism spectrum disorders (ASDs), including AGP, SPARK, and SSC, we show that KnockoffTrio can identify multiple significant associations that are missed by conventional tests applied to the same data. In particular, we replicate known ASD association signals with variants in several genes such as MACROD2, NRXN1, PRKAR1B, CADM2, PCDH9, and DOCK4 and identify additional associations with variants in other genes including ARHGEF10, SLC28A1, ZNF589, and HINT1 at an FDR of 10%.
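To make the FDR-control step concrete, the following is a minimal sketch of the generic knockoff+ selection rule from the knockoff literature, not necessarily the exact procedure implemented in KnockoffTrio. It assumes each variant j already has a feature statistic W_j, taken here as the importance score of the original variant minus that of its knockoff copy. The function names, the `offset` parameter, and the toy input values are hypothetical and chosen only for illustration.

```python
import numpy as np

def knockoff_threshold(w, q=0.10, offset=1):
    """Smallest t > 0 whose estimated false discovery proportion is <= q.

    w      : feature statistics W_j (original score minus knockoff score);
             large positive values suggest a true signal.
    q      : target FDR level.
    offset : 1 gives the knockoff+ rule, 0 the more liberal variant.
    """
    w = np.asarray(w, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes of W.
    candidates = np.sort(np.unique(np.abs(w[w != 0])))
    for t in candidates:
        # Negative statistics estimate how many positives are false.
        fdp_hat = (offset + np.sum(w <= -t)) / max(1, np.sum(w >= t))
        if fdp_hat <= q:
            return t
    return np.inf  # no threshold meets the target; select nothing

def knockoff_select(w, q=0.10, offset=1):
    """Indices of variants whose statistic reaches the threshold."""
    w = np.asarray(w, dtype=float)
    tau = knockoff_threshold(w, q, offset)
    return np.flatnonzero(w >= tau)

# Toy statistics (made up for illustration): 11 strong positives,
# followed by one negative, one weak positive, and one weak negative.
w_toy = [5, 4, 3.5, 3, 2.8, 2.5, 2.2, 2, 1.8, 1.5, 1.2, -1.0, 0.5, -0.3]
selected = knockoff_select(w_toy, q=0.10)
```

With `offset=1`, at least 1/q statistics must exceed the threshold before anything is selected, so at q = 0.10 a selection needs ten or more positives above a threshold that no negative statistic reaches in magnitude.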

Keywords: GWAS; causal variant identification; family-based design; knockoff framework.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Autism Spectrum Disorder* / genetics
  • Causality
  • Genome-Wide Association Study* / methods
  • Humans
  • Linkage Disequilibrium
  • Nerve Tissue Proteins / genetics

Substances

  • HINT1 protein, human
  • Nerve Tissue Proteins