SLINGER: large-scale learning for predicting gene expression

Sci Rep. 2016 Dec 20;6:39360. doi: 10.1038/srep39360.

Abstract

Recent studies have established that single nucleotide polymorphisms (SNPs) are sufficient to build accurate predictive models of gene expression. Gamazon et al. found that gene expression values predicted from cis-neighborhood SNPs show statistical association with disease status. In this work, we remove the cis-neighborhood constraint during the learning process and propose a novel predictive approach called SLINGER. We demonstrate that models drawing from a genome-wide set of SNPs are able to predict expression for more genes than models built on the cis neighborhood only. Results indicate that these new models significantly improve accuracy for a large number of genes. Thanks to a penalized linear model, we also show that the number of features used in our models remains comparable to that of the cis-only models. Finally, applying SLINGER to seven Wellcome Trust Case-Control Consortium genome-wide association studies demonstrates that, compared to a cis-only approach, our models lead to associations with greater fidelity to actual gene expression values.
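The contrast between a cis-only model and a genome-wide model can be illustrated with a small sketch. This is not the authors' implementation: the abstract says only that a penalized linear model is used, so the L1 (lasso) penalty, the coordinate-descent solver, the synthetic genotypes, the window of ten "cis" SNPs, the effect sizes, and all function names below are assumptions made for illustration. Errors are measured in-sample, which is enough to show the mechanism but is not an evaluation protocol.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def center_columns(rows):
    """Subtract each column's mean so no intercept term is needed."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in rows]


def fit_lasso(X, y, penalty, n_sweeps=100):
    """Minimise (1/2n)*||y - Xw||^2 + penalty*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    weights = [0.0] * p
    residual = list(y)  # equals y - Xw, and w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column carries no information
            # Covariance of SNP j with the residual, with SNP j's own fit added back.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * weights[j]
            new_weight = soft_threshold(rho, penalty) / col_sq[j]
            delta = new_weight - weights[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                weights[j] = new_weight
    return weights


def mean_squared_error(X, y, weights):
    """In-sample mean squared error of the linear predictor."""
    total = 0.0
    for row, target in zip(X, y):
        prediction = sum(w * x for w, x in zip(weights, row))
        total += (target - prediction) ** 2
    return total / len(y)


# Synthetic data (hypothetical): 200 people, 30 SNPs coded as 0/1/2 allele counts.
rng = random.Random(0)
n_people, n_snps = 200, 30
cis_window = range(0, 10)    # assumed: the first 10 SNPs lie near the gene
cis_snp, trans_snp = 3, 20   # assumed: one causal SNP inside the window, one outside

genotypes = [[(rng.random() < 0.4) + (rng.random() < 0.4) for _ in range(n_snps)]
             for _ in range(n_people)]
expression = [0.8 * row[cis_snp] + 1.0 * row[trans_snp] + rng.gauss(0, 0.3)
              for row in genotypes]

X = center_columns(genotypes)
mean_expression = sum(expression) / n_people
y = [value - mean_expression for value in expression]

# Cis-only model: restricted to SNPs in the window around the gene.
X_cis = [[row[j] for j in cis_window] for row in X]
w_cis = fit_lasso(X_cis, y, penalty=0.1)

# Genome-wide model: every SNP is a candidate; the penalty keeps the model sparse.
w_all = fit_lasso(X, y, penalty=0.1)

mse_cis = mean_squared_error(X_cis, y, w_cis)
mse_all = mean_squared_error(X, y, w_all)
selected_all = [j for j, w in enumerate(w_all) if w != 0.0]

print("cis-only MSE:", round(mse_cis, 3))
print("genome-wide MSE:", round(mse_all, 3))
print("SNPs kept by the genome-wide model:", selected_all)
```

In this toy setting the genome-wide model can pick up the distant SNP that the cis-only model never sees, while the L1 penalty sets most other weights to exactly zero, so the feature count stays small even though every SNP was a candidate.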

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology
  • Gene Expression / genetics*
  • Gene Expression Regulation / genetics*
  • Genetic Predisposition to Disease / genetics*
  • Genome-Wide Association Study / methods*
  • Humans
  • Models, Theoretical*
  • Polymorphism, Single Nucleotide / genetics