How Confounder Strength Can Affect Allocation of Resources in Electronic Health Records

Perspect Health Inf Manag. 2018 Jan 1;15(Winter):1d. eCollection 2018 Winter.

Abstract

When electronic health record (EHR) data are used, multiple approaches may be available for measuring the same variable, introducing potentially confounding factors. While additional information may be gleaned and residual confounding reduced through resource-intensive assessment methods such as natural language processing (NLP), it is not straightforward to determine whether the added benefits offset the added cost. We evaluated the implications of misclassification of a confounder when using EHRs. Using a combination of simulations and real data surrounding hospital readmission, we considered smoking as a potential confounder. We compared ICD-9 diagnostic code assignment, which is an easily available measure but has the possibility of substantial misclassification of smoking status, with NLP, a method of determining smoking status that is more expensive and time-consuming than ICD-9 code assignment but has less potential for misclassification. Classification of smoking status with NLP consistently produced less residual confounding than the use of ICD-9 codes; however, when minimal confounding was present, differences between the approaches were small. When considerable confounding is present, investing in a superior measurement tool becomes advantageous.
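
The reasoning above can be illustrated with a minimal sketch. It is not the authors' simulation. It assumes a binary confounder (smoking) that raises both the probability of exposure and the probability of readmission by the same amount (`strength`), a null true exposure effect, and hypothetical sensitivity and specificity values for an "ICD-9-like" and an "NLP-like" classifier. The function name `residual_bias` and every parameter value are illustrative assumptions. The calculation is exact (no random draws) and returns the risk difference that remains after stratifying on the measured confounder and standardizing across its strata.

```python
from itertools import product


def residual_bias(strength, sensitivity, specificity, prevalence=0.3):
    """Exposure-outcome risk difference left after adjusting for a
    misclassified binary confounder, when the true effect is null.

    All probabilities are illustrative assumptions, not estimates
    from the study.
    """
    p_c = {1: prevalence, 0: 1 - prevalence}          # P(C = c)
    p_e1 = {c: 0.2 + strength * c for c in (0, 1)}    # P(E = 1 | C)
    p_y1 = {c: 0.1 + strength * c for c in (0, 1)}    # P(Y = 1 | C), no E effect
    p_m1 = {1: sensitivity, 0: 1 - specificity}       # P(measured = 1 | C)

    # Probability mass and outcome mass in each (measured, exposure) cell.
    mass = {key: 0.0 for key in product((0, 1), repeat=2)}
    events = {key: 0.0 for key in product((0, 1), repeat=2)}
    for c, m, e in product((0, 1), repeat=3):
        p = p_c[c]
        p *= p_m1[c] if m else 1 - p_m1[c]
        p *= p_e1[c] if e else 1 - p_e1[c]
        mass[(m, e)] += p
        events[(m, e)] += p * p_y1[c]

    # Standardize stratum-specific risk differences over the measured strata.
    bias = 0.0
    for m in (0, 1):
        weight = mass[(m, 0)] + mass[(m, 1)]
        if mass[(m, 0)] == 0 or mass[(m, 1)] == 0:
            continue  # stratum carries no exposed/unexposed contrast
        risk_exposed = events[(m, 1)] / mass[(m, 1)]
        risk_unexposed = events[(m, 0)] / mass[(m, 0)]
        bias += weight * (risk_exposed - risk_unexposed)
    return bias


# Hypothetical classifier accuracies (assumptions for illustration only).
ICD_LIKE = {"sensitivity": 0.50, "specificity": 0.98}
NLP_LIKE = {"sensitivity": 0.90, "specificity": 0.95}

for strength in (0.05, 0.30):
    icd = residual_bias(strength, **ICD_LIKE)
    nlp = residual_bias(strength, **NLP_LIKE)
    print(f"strength={strength:.2f}  ICD-like={icd:.4f}  NLP-like={nlp:.4f}")
```

Under these assumed values, the more accurate classifier leaves less residual bias at both confounder strengths, and the absolute gap between the two classifiers shrinks as the confounder weakens, which is the qualitative pattern the abstract describes.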

Keywords: confounding; electronic health records; natural language processing.

MeSH terms

  • Algorithms
  • Confounding Factors, Epidemiologic*
  • Data Accuracy*
  • Electronic Health Records / statistics & numerical data*
  • Humans
  • Natural Language Processing*
  • Patient Readmission / statistics & numerical data
  • Smoking / epidemiology