Adjusting for differential misclassification in matched case-control studies utilizing health administrative data

Stat Med. 2019 Aug 30;38(19):3669-3681. doi: 10.1002/sim.8203. Epub 2019 May 21.

Abstract

In epidemiological studies of secondary data sources, lack of accurate disease classifications often requires investigators to rely on diagnostic codes generated by physicians or hospital systems to identify case and control groups, resulting in a less-than-perfect assessment of the disease under investigation. Moreover, because coding practices differ among physicians, it is hard to determine the factors that affect the chance of an incorrectly assigned disease status. The result is a dilemma in which assumptions of non-differential misclassification are questionable but, at the same time, necessary to proceed with statistical analyses. This paper develops an approach to adjust exposure-disease association estimates for disease misclassification without requiring simplifying non-differentiality assumptions or prior information about a complicated classification mechanism. We propose to leverage rich temporal information on disease-specific healthcare utilization to estimate each participant's probability of being a true case and to use these estimates as weights in a Bayesian analysis of matched case-control data. The approach is applied to data from a recent observational study of the early symptoms of multiple sclerosis (MS), in which MS cases were identified from Canadian health administrative databases and matched to population controls that are assumed to be correctly classified. A comparison of our results with those from non-differentially adjusted analyses reveals conflicting inferences and highlights that ill-suited assumptions of non-differential misclassification can exacerbate biases in association estimates.
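The idea of using true-case probabilities as weights can be illustrated with a minimal sketch. It is not the authors' model: it assumes 1:1 matching, a single binary exposure, a conditional logistic likelihood in which each pair's log-likelihood contribution is multiplied by its weight, a normal prior on the log odds ratio, and a simple grid approximation to the posterior. The function names, the prior standard deviation, and the toy data are all hypothetical; in the paper the weights are estimated from disease-specific healthcare utilization, whereas here they are made-up numbers.

```python
import math

def weighted_log_posterior(beta, pairs, weights, prior_sd=10.0):
    """Unnormalized log posterior of the log odds ratio `beta`.

    pairs   : list of (x_case, x_control) binary exposures, one per 1:1 matched pair
    weights : assumed probability that each pair's case is a true case
    prior_sd: standard deviation of a zero-mean normal prior (an assumption)
    """
    log_prior = -0.5 * (beta / prior_sd) ** 2
    log_lik = 0.0
    for (x_case, x_control), w in zip(pairs, weights):
        # Conditional likelihood of a matched pair:
        # exp(beta*x_case) / (exp(beta*x_case) + exp(beta*x_control))
        diff = beta * (x_control - x_case)
        log_lik += w * -math.log1p(math.exp(diff))
    return log_prior + log_lik

def posterior_mean(pairs, weights, lo=-5.0, hi=5.0, steps=2001):
    """Posterior mean of beta by grid approximation on [lo, hi]."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    logp = [weighted_log_posterior(b, pairs, weights) for b in grid]
    peak = max(logp)  # subtract the maximum for numerical stability
    dens = [math.exp(v - peak) for v in logp]
    return sum(b * d for b, d in zip(grid, dens)) / sum(dens)

# Hypothetical toy data: 6 pairs with only the case exposed,
# 2 pairs with only the control exposed.
toy_pairs = [(1, 0)] * 6 + [(0, 1)] * 2
equal_weights = [1.0] * 8
# Made-up weights: the exposed "cases" are judged unlikely to be true cases.
doubtful_weights = [0.2] * 6 + [1.0] * 2

unweighted = posterior_mean(toy_pairs, equal_weights)
weighted = posterior_mean(toy_pairs, doubtful_weights)
```

In this toy setting, downweighting pairs whose case status is doubtful pulls the estimated log odds ratio away from the unweighted estimate, which shows how per-participant weights allow the adjustment to differ by exposure pattern rather than applying one common misclassification rate.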

Keywords: Bayesian method; differential misclassification; disease misclassification; health administrative databases; matched case-control study.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bayes Theorem*
  • Bias*
  • Case-Control Studies
  • Clinical Coding
  • Data Accuracy*
  • Databases, Factual
  • Diagnostic Errors*
  • Hospitals
  • Humans
  • Models, Statistical