Statistical Association Mapping of Population-Structured Genetic Data

IEEE/ACM Trans Comput Biol Bioinform. 2019 Mar-Apr;16(2):638-649. doi: 10.1109/TCBB.2017.2786239. Epub 2017 Dec 22.

Abstract

Association mapping of genetic diseases has attracted extensive research interest in recent years. However, most methods introduced so far suffer from spurious inference of associated sites caused by population inhomogeneities. In this paper, we introduce a statistical framework that compensates for this shortcoming by equipping current methods with a state-of-the-art clustering algorithm widely used in population genetics applications. The proposed framework jointly infers the disease-associated factors and the hidden population structure. To this end, a Markov chain Monte Carlo (MCMC) procedure is employed to estimate the posterior probability distribution of the model parameters. We have implemented the framework in a software package, evaluated its performance extensively on a number of synthetic datasets, and compared it with well-known existing methods such as STRUCTURE. In extreme scenarios, the framework improves inference accuracy by up to 10-15 percent at the cost of a moderate increase in computational complexity.
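The abstract does not specify the model or the sampler, so the following is only an illustrative sketch under our own assumptions, not the authors' method. It shows the two ingredients such a framework combines: a Gibbs sampler (one form of MCMC) for a STRUCTURE-style no-admixture clustering model, assuming diploid biallelic genotypes coded 0/1/2, a uniform prior over K clusters, and Beta(1, 1) allele-frequency priors; and a Cochran-Mantel-Haenszel (CMH) allele-count test that conditions association on the inferred clusters. The CMH step is a swapped-in two-stage substitute for the joint inference described above, and the function names `gibbs_no_admixture` and `cmh_allele_test` are hypothetical.

```python
import math

import numpy as np


def gibbs_no_admixture(g, K, n_iter=200, seed=0):
    """Gibbs sampler for a STRUCTURE-style no-admixture model (illustrative).

    g: (N, L) array of diploid genotypes coded 0/1/2 (copies of allele 1).
    Assumed priors: z_i uniform over K clusters, p_kl ~ Beta(1, 1).
    Returns the final cluster labels z (N,) and allele frequencies p (K, L).
    """
    rng = np.random.default_rng(seed)
    N, L = g.shape
    z = rng.integers(K, size=N)  # random initial cluster assignments
    for _ in range(n_iter):
        # 1) Sample allele frequencies given assignments (Beta-binomial conjugacy).
        p = np.empty((K, L))
        for k in range(K):
            gk = g[z == k]
            alt = gk.sum(axis=0)          # allele-1 counts in cluster k
            ref = 2 * gk.shape[0] - alt   # allele-0 counts in cluster k
            p[k] = rng.beta(1.0 + alt, 1.0 + ref)
        p = np.clip(p, 1e-12, 1 - 1e-12)  # guard against log(0)
        # 2) Sample assignments given frequencies. The binomial coefficient
        #    is identical for every k, so it cancels and is omitted.
        loglik = g @ np.log(p).T + (2 - g) @ np.log1p(-p).T  # shape (N, K)
        loglik -= loglik.max(axis=1, keepdims=True)
        prob = np.exp(loglik)
        prob /= prob.sum(axis=1, keepdims=True)
        cdf = prob.cumsum(axis=1)
        cdf[:, -1] = 1.0  # ensure every uniform draw falls in some bin
        z = (rng.random(N)[:, None] < cdf).argmax(axis=1)
    return z, p


def cmh_allele_test(g_locus, y, z):
    """CMH allele-count association test at one locus, stratified by labels z.

    g_locus: (N,) genotypes 0/1/2; y: (N,) phenotype, 1 = case, 0 = control.
    Returns (chi-square statistic with 1 df, p-value), no continuity correction.
    """
    num, var = 0.0, 0.0
    for k in np.unique(z):
        in_k = z == k
        case, ctrl = in_k & (y == 1), in_k & (y == 0)
        a = float(g_locus[case].sum())                 # allele-1 count in cases
        n1, n2 = 2.0 * case.sum(), 2.0 * ctrl.sum()    # allele totals per group
        m1 = float(g_locus[in_k].sum())                # allele-1 total in stratum
        t = n1 + n2
        if t < 2:
            continue
        m2 = t - m1
        num += a - n1 * m1 / t                         # observed minus expected
        var += n1 * n2 * m1 * m2 / (t * t * (t - 1))
    if var == 0:
        return 0.0, 1.0
    stat = num ** 2 / var
    return stat, math.erfc(math.sqrt(stat / 2))        # chi-square(1) upper tail
```

In use, one would run `gibbs_no_admixture` on the genotype matrix, take the sampled labels `z` as strata, and call `cmh_allele_test` on each locus. A locus whose allele frequency differs between populations, but not between cases and controls within a population, then yields a statistic near zero, whereas pooling all individuals into one stratum can produce a spurious signal.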

MeSH terms

  • Algorithms
  • Cluster Analysis
  • Computational Biology / methods*
  • Genetics, Population / methods*
  • Genome-Wide Association Study / methods*
  • Humans
  • Markov Chains
  • Models, Genetic
  • Models, Statistical*
  • Monte Carlo Method