Testing for associations with missing high-dimensional categorical covariates

Int J Biostat. 2008 Sep 29;4(1):Article 18. doi: 10.2202/1557-4679.1102.

Abstract

Understanding how long-term clinical outcomes relate to short-term response to therapy is an important topic of research with a variety of applications. In HIV, early measures of viral RNA levels are known to be a strong prognostic indicator of future viral load response. However, mutations observed in the high-dimensional viral genotype at an early time point may change this prognosis. Unfortunately, some subjects may not have a viral genetic sequence measured at the early time point, and the sequence may be missing for reasons related to the outcome. Complete-case analyses of missing data are generally biased when the assumption that data are missing completely at random is not met, and methods incorporating multiple imputation may not be well-suited for the analysis of high-dimensional data. We propose a semiparametric multiple testing approach to the problem of identifying associations between potentially missing high-dimensional covariates and response. Following the recent exposition by Tsiatis, unbiased nonparametric summary statistics are constructed by inversely weighting the complete cases according to the conditional probability of being observed, given the data observed for each subject. The resulting summary statistics are unbiased under the assumption that data are missing at random. We illustrate our approach through an application to data from a recent AIDS clinical trial, and demonstrate finite sample properties with simulations.
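
The inverse weighting of complete cases described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' test statistic or their multiple testing procedure; it only shows the general idea on a single binary covariate (say, presence of one mutation). The function names `complete_case_mean` and `ipw_mean`, the toy data, and the observation probabilities are all hypothetical, and the probabilities are assumed to be known here, whereas in practice they would be estimated from the data observed on every subject.

```python
from typing import Optional, Sequence


def complete_case_mean(x: Sequence[Optional[float]]) -> float:
    """Average over subjects whose covariate was observed, ignoring missingness."""
    seen = [v for v in x if v is not None]
    return sum(seen) / len(seen)


def ipw_mean(x: Sequence[Optional[float]], prob_observed: Sequence[float]) -> float:
    """Inverse-probability-weighted mean over all n subjects.

    Each complete case contributes x_i / pi_i, where pi_i is the probability
    that subject i's covariate is observed given that subject's always-observed
    data. Subjects with a missing covariate contribute 0 to the sum.
    """
    if len(x) != len(prob_observed):
        raise ValueError("x and prob_observed must have the same length")
    total = 0.0
    for value, prob in zip(x, prob_observed):
        if value is not None:
            if not 0.0 < prob <= 1.0:
                raise ValueError("observation probabilities must lie in (0, 1]")
            total += value / prob
    return total / len(x)


# Hypothetical toy data: 1.0 = mutation present, 0.0 = absent, None = no sequence.
# Assumption: the first four subjects had a 50% chance of being sequenced,
# the last four were always sequenced.
mutation = [1.0, None, 1.0, None, 0.0, 0.0, 0.0, 1.0]
prob_seq = [0.5, 0.5, 0.5, 0.5, 1.0, 1.0, 1.0, 1.0]

print(complete_case_mean(mutation))   # 0.5
print(ipw_mean(mutation, prob_seq))   # 0.625
```

In this toy example the complete-case estimate (0.5) under-represents the group that was less likely to be sequenced, while the weighted estimate (0.625) counts each sequenced subject from that group twice to stand in for a similar subject whose sequence is missing.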

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Antiretroviral Therapy, Highly Active
  • Bias
  • Biostatistics / methods*
  • Data Interpretation, Statistical
  • Genotype
  • HIV Infections / drug therapy
  • HIV Infections / virology
  • HIV-1 / genetics
  • Humans
  • Models, Statistical
  • Mutation
  • Prognosis
  • RNA, Viral / blood
  • RNA, Viral / genetics
  • Randomized Controlled Trials as Topic / statistics & numerical data
  • Statistics, Nonparametric
  • Treatment Outcome

Substances

  • RNA, Viral