Sequential adaptive variables and subject selection for GEE methods

Biometrics. 2020 Jun;76(2):496-507. doi: 10.1111/biom.13160. Epub 2019 Nov 11.

Abstract

Modeling correlated or highly stratified multiple-response data is a common data analysis task in many applications, such as those in large epidemiological studies or multisite cohort studies. The generalized estimating equations method is a popular statistical method used to analyze these kinds of data, because it can manage many types of unmeasured dependence among outcomes. Collecting large amounts of highly stratified or correlated response data is time-consuming; thus, the use of a more aggressive sampling strategy that can accelerate this process, such as the active-learning methods found in the machine-learning literature, will always be beneficial. In this study, we integrate adaptive sampling and variable selection features into a sequential procedure for modeling correlated response data. Besides reporting the statistical properties of the proposed procedure, we also use both synthesized and real data sets to demonstrate the usefulness of our method.
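To make the idea of sequentially fitting a GEE-type estimator concrete, here is a minimal illustrative sketch. It is not the procedure proposed in the paper. It assumes a one-covariate linear model without intercept, an independence working correlation, a cluster-robust sandwich standard error, and a simple fixed-precision stopping rule. The names gee_slope, simulate_cluster, and sequential_fit, the simulated data, and the threshold target_se are all hypothetical. The sketch leaves out the adaptive subject selection and variable selection that the abstract describes.

```python
import math
import random


def gee_slope(clusters):
    """Independence-working-correlation GEE for y = beta * x + error.

    `clusters` is a list of clusters, each a list of (x, y) pairs.
    Returns (beta_hat, robust_se) using the cluster-robust sandwich variance.
    """
    bread = sum(x * x for c in clusters for x, _ in c)
    beta = sum(x * y for c in clusters for x, y in c) / bread
    # "Meat": squared per-cluster score, which accounts for correlation
    # among observations that share a cluster.
    meat = sum(sum(x * (y - beta * x) for x, y in c) ** 2 for c in clusters)
    return beta, math.sqrt(meat) / bread


def simulate_cluster(rng, beta=2.0, size=4):
    """Hypothetical data generator: a shared random effect induces correlation."""
    shared = rng.gauss(0.0, 1.0)
    out = []
    for _ in range(size):
        x = rng.uniform(-1.0, 1.0)
        out.append((x, beta * x + shared + rng.gauss(0.0, 1.0)))
    return out


def sequential_fit(rng, target_se=0.1, batch=10, start=20, max_clusters=5000):
    """Add clusters in batches until the robust SE is at most target_se.

    Assumed stopping rule for illustration only; max_clusters is a safety cap.
    """
    clusters = [simulate_cluster(rng) for _ in range(start)]
    while True:
        beta, se = gee_slope(clusters)
        if se <= target_se or len(clusters) >= max_clusters:
            return beta, se, len(clusters)
        clusters.extend(simulate_cluster(rng) for _ in range(batch))


beta_hat, se_hat, n_used = sequential_fit(random.Random(0))
print(f"beta_hat={beta_hat:.3f}, robust_se={se_hat:.3f}, clusters={n_used}")
```

In this sketch the sample size is not fixed in advance: it is a stopping time determined by the data, which is the general setting the keywords "sequential estimation" and "stopping time" refer to.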

Keywords: active learning; adaptive sampling; generalized estimating equations; sequential estimation; stopping time.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Antibodies, Neutralizing / therapeutic use
  • Biometry / methods*
  • Computer Simulation
  • Data Interpretation, Statistical
  • Databases, Factual / statistics & numerical data
  • Humans
  • Interferon beta-1b / therapeutic use
  • Logistic Models
  • Machine Learning
  • Models, Statistical*
  • Multiple Sclerosis, Relapsing-Remitting / immunology
  • Multiple Sclerosis, Relapsing-Remitting / therapy
  • Multivariate Analysis
  • Probability
  • Randomized Controlled Trials as Topic / statistics & numerical data
  • Sample Size

Substances

  • Antibodies, Neutralizing
  • Interferon beta-1b