A powerful method to integrate genotype and gene expression data for dissecting the genetic architecture of a disease

Genomics. 2019 Dec;111(6):1387-1394. doi: 10.1016/j.ygeno.2018.09.011. Epub 2018 Oct 1.

Abstract

To decipher the genetic architecture of human disease, various types of omics data are generated. Two common types are genotype and gene expression data. For biological and technical reasons, genotype data are often generated for a large number of individuals while gene expression data are generated for only a few, leading to unequal sample sizes across omics data types. The lack of a standard statistical procedure for integrating such datasets motivates us to propose a two-step multi-locus association method using latent variables. Our method is more powerful than single or separate omics data analysis, and it comprehensively unravels deep-seated signals through a single statistical model. Extensive simulation confirms that it is robust to various genetic models, with power increasing with sample size and the number of associated loci. It computes p-values very quickly. Application to a real dataset on psoriasis identifies 17 novel SNPs, functionally related to psoriasis-associated genes, at a much smaller sample size than standard GWAS.

Keywords: Data integration; GWAS; Latent variable; Multi-locus association test.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Case-Control Studies
  • Computer Simulation
  • Genome-Wide Association Study*
  • Genotype*
  • Humans
  • Models, Statistical*
  • Molecular Sequence Annotation
  • Phenotype
  • Polymorphism, Single Nucleotide*
  • Psoriasis / genetics*
  • Transcriptome*