Improving the discovery of rare variants associated with alcohol problems by leveraging machine learning phenotype prediction and functional information

bioRxiv [Preprint]. 2023 Sep 15:2023.09.11.557163. doi: 10.1101/2023.09.11.557163.

Abstract

Alcohol use disorder (AUD) is moderately heritable and has significant social and economic impact. Genome-wide association studies (GWAS) have identified common variants associated with AUD; however, rare variant investigations have yet to achieve well-powered sample sizes. In this study, we conducted an interval-based exome-wide analysis of the Alcohol Use Disorders Identification Test Problems subscale (AUDIT-P) using both machine learning (ML) predicted risk and empirical functional weights. This research has been conducted using the UK Biobank Resource (application number 30782). Filtering the 200k exome release to unrelated individuals of European ancestry resulted in a sample of 147,386 individuals for exome analysis, of whom 51,357 had observed AUDIT-P and 96,029 had unmeasured but predicted AUDIT-P. The Sequence Kernel Association Test (SKAT/SKAT-O) was used for rare variant (minor allele frequency (MAF) < 0.01) interval analyses using default and empirical weights. Empirical weights were constructed using annotations found significant by stratified LD Score Regression analysis of the predicted AUDIT-P GWAS, providing prior functional weights specific to AUDIT-P. Using only samples with observed AUDIT-P yielded no significantly associated intervals. In contrast, the ADH1C and THRA gene intervals were significant (false discovery rate (FDR) < 0.05) using default and empirical weights in the predicted AUDIT-P sample, with the most significant association found using predicted AUDIT-P and empirical weights in the ADH1C gene (SKAT-O P, default weights = 1.06 × 10^-9; SKAT-O P, empirical weights = 6.25 × 10^-11). These findings provide evidence for rare variant association of the ADH1C gene with AUDIT-P. They also highlight the successful leveraging of ML to increase effective sample size, and of prior empirical functional weights based on common variant GWAS data to refine and increase statistical significance in underpowered phenotypes.
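For readers unfamiliar with the building blocks named in the abstract, the following is a minimal Python sketch, not the authors' pipeline. It shows three generic pieces: the Beta(1, 25) density that SKAT uses as its default MAF-based variant weight, the weighted SKAT score statistic Q for one interval, and an FDR adjustment. The function names, the toy inputs, and the choice of the Benjamini-Hochberg procedure for FDR are assumptions made for illustration; the abstract does not state which FDR procedure was used or how the empirical functional weights were combined with the default weights.

```python
def beta_1_25_weight(maf):
    """SKAT's default variant weight: the Beta(1, 25) density at the MAF.

    For Beta(1, b) the density is b * (1 - x)**(b - 1), so rarer variants
    receive larger weights.
    """
    return 25.0 * (1.0 - maf) ** 24


def skat_q(genotypes, residuals, weights):
    """Weighted SKAT score statistic Q = sum_j (w_j * S_j)**2 for one interval.

    genotypes: one row per individual, one minor-allele count per variant.
    residuals: phenotype minus null-model fitted value, one per individual.
    weights:   one weight per variant (default or, hypothetically, empirical).
    Computing a p-value from Q requires a mixture-of-chi-square
    approximation, which is omitted here.
    """
    q = 0.0
    for j, w in enumerate(weights):
        score = sum(row[j] * r for row, r in zip(genotypes, residuals))
        q += (w * score) ** 2
    return q


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy data only: three individuals, two rare variants (MAF < 0.01).
    mafs = [0.001, 0.008]
    weights = [beta_1_25_weight(m) for m in mafs]
    genotypes = [[1, 0], [0, 1], [1, 1]]
    residuals = [0.5, -0.5, 1.0]
    print(skat_q(genotypes, residuals, weights))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

In this sketch an empirical weighting scheme would simply supply a different `weights` list to `skat_q`; intervals whose adjusted p-value falls below 0.05 would be reported as significant.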

Publication types

  • Preprint