Development and Validation of a Claims-Based Model to Predict Categories of Obesity

Am J Epidemiol. 2024 Jan 8;193(1):203-213. doi: 10.1093/aje/kwad178.

Abstract

We developed and validated a claims-based algorithm that classifies patients into obesity categories. Using Medicare (2007-2017) and Medicaid (2000-2014) claims data linked to 2 electronic health record (EHR) systems in Boston, Massachusetts, we identified a cohort of patients with an EHR-based body mass index (BMI) measurement (calculated as weight (kg)/height (m)²). We used regularized regression to select from 137 variables and built generalized linear models to classify patients with BMIs of ≥25, ≥30, and ≥40. We developed the prediction model using EHR system 1 (training set) and validated it in EHR system 2 (validation set). The cohort contained 123,432 patients in the Medicare population and 40,736 patients in the Medicaid population. The model comprised 97 variables in the Medicare set and 95 in the Medicaid set, including BMI-related diagnosis codes, cardiovascular and antidiabetic drugs, and obesity-related comorbidities. The areas under the receiver-operating-characteristic curve in the validation set were 0.72, 0.75, and 0.83 (Medicare) and 0.66, 0.66, and 0.70 (Medicaid) for BMIs of ≥25, ≥30, and ≥40, respectively. The positive predictive values were 81.5%, 80.6%, and 64.7% (Medicare) and 81.6%, 77.5%, and 62.5% (Medicaid), for BMIs of ≥25, ≥30, and ≥40, respectively. The proposed model can identify obesity categories in claims databases when BMI measurements are missing and can be used for confounding adjustment, defining subgroups, or probabilistic bias analysis.
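
As an illustration of the quantities named in the abstract, the following minimal Python sketch computes BMI from weight and height, derives the three binary outcomes (BMI ≥25, ≥30, ≥40), and evaluates a set of predictions by positive predictive value and area under the receiver-operating-characteristic curve. It is not the authors' model: the function names and the toy inputs are hypothetical, and the regularized regression and generalized linear models themselves are not reproduced here.

```python
from typing import Dict, Sequence

# BMI cutoffs named in the abstract (kg/m^2).
BMI_THRESHOLDS = (25, 30, 40)


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height_m must be positive")
    return weight_kg / height_m ** 2


def threshold_labels(bmi_value: float) -> Dict[int, bool]:
    """One binary outcome per cutoff: is BMI >= 25, >= 30, >= 40?"""
    return {t: bmi_value >= t for t in BMI_THRESHOLDS}


def positive_predictive_value(predicted: Sequence[bool],
                              actual: Sequence[bool]) -> float:
    """Share of predicted positives that are true positives."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    if tp + fp == 0:
        raise ValueError("no positive predictions")
    return tp / (tp + fp)


def auc(scores: Sequence[float], actual: Sequence[bool]) -> float:
    """Area under the ROC curve, computed as the probability that a
    randomly chosen positive scores higher than a randomly chosen
    negative (ties count as one half)."""
    pos = [s for s, a in zip(scores, actual) if a]
    neg = [s for s, a in zip(scores, actual) if not a]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In a validation setting such as the one described, `scores` would be the predicted probabilities from a model fitted on the training set and `actual` the EHR-based labels for one cutoff in the validation set; each of the three cutoffs is evaluated separately.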

Keywords: body mass index; machine learning; missing data; obesity; pharmacoepidemiology; prediction modeling.

MeSH terms

  • Aged
  • Body Mass Index
  • Comorbidity
  • Electronic Health Records
  • Humans
  • Hypoglycemic Agents
  • Medicare*
  • Obesity* / epidemiology
  • United States / epidemiology

Substances

  • Hypoglycemic Agents