Variable selection for random effects two-part models

Stat Methods Med Res. 2019 Sep;28(9):2697-2709. doi: 10.1177/0962280218784712. Epub 2018 Jul 13.

Abstract

Random effects two-part models have been applied in longitudinal studies to zero-inflated (or semi-continuous) data, which are characterized by a large proportion of zero values together with continuous positive values. Examples include monthly medical costs, daily alcohol drinks, and relative abundance in microbiome data. With advances in information technology for data collection and storage, the number of variables available to researchers in such studies can be rather large. To avoid the curse of dimensionality and to facilitate decision making, it is critically important to select the covariates that are truly related to the outcome. However, owing to the intricate nature of these models, no satisfactory variable selection method is yet available for them. In this paper, we seek a feasible way of conducting variable selection for random effects two-part models on the basis of the recently proposed "minimum information criterion" (MIC) method. We demonstrate that the MIC approach leads to a reasonable formulation of sparse estimation, which can be conveniently solved with SAS Proc NLMIXED. The performance of our approach is evaluated through simulation, and an application to a longitudinal alcohol dependence study is provided.
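
To make the data structure concrete, the following is a minimal Python sketch that simulates semi-continuous longitudinal outcomes from a toy random effects two-part model. It is not the authors' simulation design or their SAS Proc NLMIXED code. The function name `simulate_two_part`, the single covariate, all coefficient values, the log-normal choice for the positive part, and the independence of the two random intercepts are assumptions made purely for illustration.

```python
import math
import random


def simulate_two_part(n_subjects=200, n_visits=6, seed=0):
    """Simulate (subject, visit, x, y) rows from a toy two-part model.

    All parameter values below are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)
    rows = []
    for subject in range(n_subjects):
        x = rng.gauss(0.0, 1.0)  # one subject-level covariate (assumed)
        a = rng.gauss(0.0, 1.0)  # random intercept for the binary part
        b = rng.gauss(0.0, 0.5)  # random intercept for the positive part
        for visit in range(n_visits):
            # Part 1: logistic model for the probability of a non-zero value.
            eta = -0.5 + 1.0 * x + a
            p_positive = 1.0 / (1.0 + math.exp(-eta))
            if rng.random() < p_positive:
                # Part 2: log-normal model for the value given that it is positive.
                mu = 1.0 + 0.5 * x + b
                y = math.exp(rng.gauss(mu, 0.7))
            else:
                y = 0.0
            rows.append((subject, visit, x, y))
    return rows


rows = simulate_two_part()
zero_share = sum(1 for r in rows if r[3] == 0.0) / len(rows)
print(f"{len(rows)} observations, share of zeros = {zero_share:.2f}")
```

Each subject contributes repeated measurements that share the same pair of random intercepts, which is what induces within-subject correlation in both the occurrence and the magnitude of non-zero values. In applied work the two random effects are often allowed to be correlated; they are kept independent here only to keep the sketch short.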

Keywords: High dimensional; mixed effects; pharmacogenetics; precision medicine; tuning parameter; variable selection.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Adult
  • Aged
  • Alcoholism / drug therapy*
  • Alcoholism / genetics*
  • Computer Simulation
  • Female
  • Humans
  • Longitudinal Studies
  • Male
  • Middle Aged
  • Models, Statistical*
  • Ondansetron / therapeutic use*
  • Pharmacogenetics*
  • Randomized Controlled Trials as Topic
  • Serotonin Antagonists / therapeutic use*

Substances

  • Serotonin Antagonists
  • Ondansetron