Penalized integrative semiparametric interaction analysis for multiple genetic datasets

Stat Med. 2019 Jul 30;38(17):3221-3242. doi: 10.1002/sim.8172. Epub 2019 Apr 16.

Abstract

In this article, we consider a semiparametric additive partially linear interaction model for the integrative analysis of multiple genetic datasets. The goals are to identify important genetic predictors and gene-gene interactions and, simultaneously, to estimate the nonparametric functions that describe the environmental effects. To capture the similarities and differences of the genetic effects across datasets, we impose a group structure on the regression coefficient matrix under the homogeneity assumption, i.e., the models for different datasets share the same sparsity structure, but the coefficient values may differ across datasets. We develop an iterative approach to estimate the parameters of the main effects, interactions, and nonparametric functions, in which the interaction parameters are reparametrized to satisfy the strong hierarchy assumption. We demonstrate the advantages of the proposed method in identification, estimation, and prediction in a series of numerical studies. We also apply the proposed method to the Skin Cutaneous Melanoma data and the lung cancer data from The Cancer Genome Atlas.
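The two structural ideas in the abstract, strong hierarchy through reparametrization and a shared sparsity pattern across datasets, can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the product form `gamma[j][k] = beta[j] * beta[k] * tau[j][k]` is one common way to enforce strong hierarchy, the row-wise L2 norm is a generic group-lasso-type penalty, and the function names `interaction_coefficients`, `group_penalty`, and `shared_support` are hypothetical.

```python
import math


def interaction_coefficients(beta, tau):
    """Assumed reparametrization: gamma[j][k] = beta[j] * beta[k] * tau[j][k].

    An interaction can be nonzero only if BOTH of its main effects are
    nonzero, which is what the strong hierarchy assumption requires.
    Only the upper triangle (j < k) is filled.
    """
    p = len(beta)
    gamma = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(j + 1, p):
            gamma[j][k] = beta[j] * beta[k] * tau[j][k]
    return gamma


def group_penalty(coef_by_dataset, lam):
    """Assumed group-lasso-type penalty over the coefficient matrix.

    coef_by_dataset[m][j] is the coefficient of predictor j in dataset m.
    Each predictor's coefficients across all datasets form one group, so
    a predictor is selected or dropped in all datasets together, while
    its magnitude is free to differ from dataset to dataset.
    """
    p = len(coef_by_dataset[0])
    total = 0.0
    for j in range(p):
        total += math.sqrt(sum(b[j] ** 2 for b in coef_by_dataset))
    return lam * total


def shared_support(coef_by_dataset):
    """Indices of predictors that are nonzero in at least one dataset."""
    p = len(coef_by_dataset[0])
    return [j for j in range(p) if any(b[j] != 0.0 for b in coef_by_dataset)]


# Toy values, invented purely for illustration.
beta = [1.5, 0.0, -2.0]            # main effects in one dataset
tau = [[0.0, 0.4, 0.5],
       [0.0, 0.0, 0.7],
       [0.0, 0.0, 0.0]]            # free interaction parameters
gamma = interaction_coefficients(beta, tau)

coefs = [[3.0, 0.0, 1.0],          # dataset 1
         [4.0, 0.0, 0.5]]          # dataset 2
penalty = group_penalty(coefs, lam=0.5)
support = shared_support(coefs)
```

In this toy example the second predictor has a zero main effect, so every interaction involving it is forced to zero regardless of `tau`, and the same predictor is excluded from both datasets while the retained predictors keep dataset-specific values.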

Keywords: Gene-gene interaction analysis; hierarchical constraint; integrative analysis; semiparametric model.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Epistasis, Genetic*
  • Humans
  • Lung Neoplasms / genetics
  • Melanoma / genetics
  • Models, Genetic*
  • Models, Statistical*
  • Skin Neoplasms / genetics