Empirical Bayes scan statistics for detecting clusters of disease risk variants in genetic studies

Biometrics. 2015 Dec;71(4):1111-20. doi: 10.1111/biom.12331. Epub 2015 Jun 1.

Abstract

Recent developments in high-throughput genomic technologies offer an unprecedentedly detailed view of the genetic variation in various human populations and promise significant progress in understanding the genetic basis of complex diseases. Despite this tremendous advance in data generation, analyzing and interpreting these data remains very challenging because of their sparse and high-dimensional nature. Here, we propose novel applications and new developments of empirical Bayes scan statistics to identify genomic regions significantly enriched with disease risk variants. We show that the proposed empirical Bayes methodology can be substantially more powerful than existing scan statistic methods, especially in the presence of many variants that do not affect disease risk, and in situations where risk and protective variants are mixed. Furthermore, the empirical Bayes approach can more flexibly accommodate covariates such as functional prediction scores and additional biomarkers. As a proof of concept, we apply the proposed methods to a whole-exome sequencing study of autism spectrum disorders and identify several promising candidate genes.
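For orientation, the sketch below shows the kind of classical scan statistic that the abstract uses as a comparison point. It is not the authors' empirical Bayes method: it is a Kulldorff-style Bernoulli scan over contiguous windows of variants ordered by genomic position. For each window, it compares the proportion of case carriers inside the window with the proportion outside, keeps the window with the largest log-likelihood ratio, and calibrates that maximum by permuting case/control labels. All function names and the toy counts are hypothetical. As a simplifying assumption, each carrier of each variant is treated as an independent observation, which ignores individuals who carry several variants.

```python
import math
import random


def xlogy(x, y):
    """x * log(y), using the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def bernoulli_llr(c_in, n_in, c_tot, n_tot):
    """Bernoulli log-likelihood ratio for one window; only case-enriched windows score > 0."""
    c_out, n_out = c_tot - c_in, n_tot - n_in
    if n_in == 0 or n_out == 0:
        return 0.0
    p_in, p_out = c_in / n_in, c_out / n_out
    if p_in <= p_out:
        return 0.0  # one-sided: ignore windows depleted of case carriers
    p0 = c_tot / n_tot
    ll_alt = (xlogy(c_in, p_in) + xlogy(n_in - c_in, 1 - p_in)
              + xlogy(c_out, p_out) + xlogy(n_out - c_out, 1 - p_out))
    ll_null = xlogy(c_tot, p0) + xlogy(n_tot - c_tot, 1 - p0)
    return ll_alt - ll_null


def max_scan(cases, carriers, max_width):
    """Best (llr, start, end) over contiguous windows of at most max_width variants.

    cases[i]    : number of case carriers of variant i (variants in positional order)
    carriers[i] : total number of carriers (cases + controls) of variant i
    """
    c_tot, n_tot = sum(cases), sum(carriers)
    best = (0.0, None, None)
    for start in range(len(carriers)):
        c_in = n_in = 0
        for end in range(start, min(len(carriers), start + max_width)):
            c_in += cases[end]
            n_in += carriers[end]
            llr = bernoulli_llr(c_in, n_in, c_tot, n_tot)
            if llr > best[0]:
                best = (llr, start, end)
    return best


def permutation_pvalue(cases, carriers, max_width, n_perm=999, seed=0):
    """Monte Carlo p-value for the maximum LLR, permuting case labels among carriers."""
    rng = random.Random(seed)
    observed = max_scan(cases, carriers, max_width)[0]
    # One entry per carrier: which variant it carries, plus a case (1) / control (0) label.
    variant_of = [i for i, n in enumerate(carriers) for _ in range(n)]
    labels = [1] * sum(cases) + [0] * (sum(carriers) - sum(cases))
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        perm_cases = [0] * len(carriers)
        for v, lab in zip(variant_of, labels):
            perm_cases[v] += lab
        if max_scan(perm_cases, carriers, max_width)[0] >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    cases = [1, 0, 5, 6, 4, 1, 0]      # hypothetical case-carrier counts per variant
    carriers = [2, 2, 5, 6, 5, 2, 1]   # hypothetical total carrier counts per variant
    print(max_scan(cases, carriers, max_width=4))
    print(permutation_pvalue(cases, carriers, max_width=4, n_perm=199))
```

On these toy counts, the highest-scoring window covers variants 2 through 4, where 15 of the 16 carriers are cases. A frequentist scan of this kind weights every variant equally. That limitation is the setting the abstract targets: power drops when many variants do not affect risk, or when risk and protective variants fall in the same window. Per the abstract, the empirical Bayes formulation is designed to handle these cases better and to incorporate covariates such as functional prediction scores.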

Keywords: Empirical Bayes; Next-generation sequencing; Rare variants; Scan statistics.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Autism Spectrum Disorder / genetics
  • Bayes Theorem*
  • Biometry / methods
  • Cluster Analysis
  • Computer Simulation
  • Databases, Genetic / statistics & numerical data
  • Genetic Association Studies / statistics & numerical data*
  • Genetic Predisposition to Disease
  • Genetic Variation*
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Likelihood Functions
  • Models, Statistical
  • Multigene Family
  • Risk Factors