Analyzing metabolomics data for association with genotypes using two-component Gaussian mixture distributions

Jason Westra; Nicholas Hartman; Bethany Lake; Gregory Shearer; Nathan Tintle

Pac Symp Biocomput. 2018:23:496-506.

Authors

Jason Westra¹, Nicholas Hartman, Bethany Lake, Gregory Shearer, Nathan Tintle

Affiliation

¹ Department of Statistics, Iowa State University, Ames, IA 50011, United States; ² Department of Mathematics, Statistics, and Computer Science, Dordt College, Sioux Center, IA 51250, United States. jwestra@iastate.edu

PMID: 29218908
PMCID: PMC5757879

Abstract

Standard approaches to evaluate the impact of single nucleotide polymorphisms (SNPs) on quantitative phenotypes use linear models. However, these normal-based approaches may not optimally model phenotypes that are better represented by Gaussian mixture distributions (e.g., some metabolomics data). We develop a likelihood ratio test on the mixing proportions of two-component Gaussian mixture distributions and consider more restrictive models to increase power in light of a priori biological knowledge. Data were simulated to validate the improved power of the likelihood ratio test and the restricted likelihood ratio test over a linear model and a log-transformed linear model. Then, using real data from the Framingham Heart Study, we analyzed 20,315 SNPs on chromosome 11, demonstrating that the proposed likelihood ratio test identifies SNPs well known to participate in the desaturation of certain fatty acids. Our study both validates the approach of increasing power by using a likelihood ratio test that leverages Gaussian mixture models and creates a model with improved sensitivity and interpretability.
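To illustrate the kind of test described above, the following Python sketch fits a two-component Gaussian mixture whose component means and standard deviations are shared across genotypes, and compares a null model (one mixing proportion for all genotypes) with an alternative (one mixing proportion per genotype). It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `fit_mixture` and `mixture_lrt` are hypothetical, the fit uses a plain EM algorithm (the abstract does not say how the models were fitted), the data are simulated with invented parameters, and the p-value assumes an asymptotic chi-square reference distribution with 2 degrees of freedom. The restricted test mentioned in the abstract is not covered.

```python
import math
import random


def normal_pdf(y, mu, sd):
    """Density of N(mu, sd^2) at y."""
    z = (y - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_mixture(y, groups, n_groups, n_iter=200):
    """EM fit of a two-component Gaussian mixture (hypothetical helper).

    Component means and SDs are shared by all observations; each group has
    its own mixing proportion for component 0 (initialized as the lower one).
    Returns (log-likelihood at the final estimates, mixing proportions).
    """
    n = len(y)
    ys = sorted(y)
    half = n // 2
    mean = sum(ys) / n
    # Initialize the two means from the lower and upper halves of the data.
    mu = [sum(ys[:half]) / half, sum(ys[half:]) / (n - half)]
    sd0 = math.sqrt(sum((v - mean) ** 2 for v in ys) / n)
    sd = [sd0, sd0]
    p = [0.5] * n_groups
    counts = [groups.count(k) for k in range(n_groups)]

    for _ in range(n_iter):
        # E-step: posterior probability that each point came from component 0.
        r = []
        for yi, gi in zip(y, groups):
            a = p[gi] * normal_pdf(yi, mu[0], sd[0])
            b = (1.0 - p[gi]) * normal_pdf(yi, mu[1], sd[1])
            r.append(a / max(a + b, 1e-300))
        # M-step: weighted means and SDs for each component.
        for k, w in enumerate((r, [1.0 - ri for ri in r])):
            total = max(sum(w), 1e-12)
            mu[k] = sum(wi * yi for wi, yi in zip(w, y)) / total
            var = sum(wi * (yi - mu[k]) ** 2 for wi, yi in zip(w, y)) / total
            sd[k] = max(math.sqrt(var), 1e-3)  # floor guards against collapse
        # M-step: one mixing proportion per group.
        for k in range(n_groups):
            if counts[k] > 0:
                p[k] = sum(ri for ri, gi in zip(r, groups) if gi == k) / counts[k]

    loglik = sum(
        math.log(max(p[gi] * normal_pdf(yi, mu[0], sd[0])
                     + (1.0 - p[gi]) * normal_pdf(yi, mu[1], sd[1]), 1e-300))
        for yi, gi in zip(y, groups)
    )
    return loglik, p


def mixture_lrt(y, genotypes):
    """Likelihood ratio test on the mixing proportions (hypothetical helper)."""
    ll_null, _ = fit_mixture(y, [0] * len(y), 1)   # shared proportion
    ll_alt, p_alt = fit_mixture(y, genotypes, 3)   # proportion per genotype
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    # Assumption: chi-square reference with 2 df (3 proportions vs. 1).
    # For 2 df the chi-square survival function is exactly exp(-x / 2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value, p_alt


# Simulated example (all parameters invented for illustration).
rng = random.Random(0)
true_p = [0.8, 0.5, 0.2]  # P(lower component) for genotypes 0, 1, 2
genotypes = [rng.choice([0, 1, 2]) for _ in range(600)]
y = [rng.gauss(0.0, 1.0) if rng.random() < true_p[g] else rng.gauss(4.0, 1.0)
     for g in genotypes]

stat, p_value, p_alt = mixture_lrt(y, genotypes)
print("LRT statistic:", stat)
print("p-value:", p_value)
print("estimated mixing proportions by genotype:", p_alt)
```

Because the alternative differs from the null only in the mixing proportions, a significant result in this sketch indicates that genotype shifts the share of individuals in each component rather than the component means.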

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Chromosomes, Human, Pair 11 / genetics
Computational Biology / methods
Computer Simulation
Fatty Acids / metabolism
Genetic Association Studies / statistics & numerical data
Genome-Wide Association Study / statistics & numerical data
Genotype
Humans
Likelihood Functions
Linear Models
Metabolome / genetics*
Metabolomics / statistics & numerical data*
Models, Genetic
Normal Distribution
Polymorphism, Single Nucleotide

Substances

Fatty Acids

Grants and funding

R15 HG006915/HG/NHGRI NIH HHS/United States