Easy to use and validated predictive models to identify beneficiaries experiencing homelessness in Medicaid administrative data

Health Serv Res. 2023 Aug;58(4):882-893. doi: 10.1111/1475-6773.14143. Epub 2023 Feb 28.

Abstract

Objective: To develop easy-to-use, validated predictive models to identify beneficiaries experiencing homelessness from administrative data.

Data sources: We pooled enrollment and claims data from enrollees of the California Whole Person Care (WPC) Medicaid demonstration program that coordinated the care of a subset of Medicaid beneficiaries identified as high utilizers in 26 California counties (25 WPC Pilots). We also used public directories of social service and health care facilities.

Study design: Using WPC Pilot-reported homelessness status, we trained seven supervised learning algorithms with different specifications to identify beneficiaries experiencing homelessness. The list of predictors included address- and claims-based indicators, demographics, health status, health care utilization, and county-level homelessness rate. We then assessed model performance using measures of balanced accuracy (BA), sensitivity, specificity, positive predictive value, negative predictive value, and area under the receiver operating characteristic curve (area under the curve [AUC]).
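The performance measures named above can all be derived from a binary confusion matrix plus a ranking of predicted scores. The following is a minimal sketch of those definitions, not the authors' code: the function names, the 0.5 classification threshold, and the toy labels and scores are assumptions made for illustration only.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def classification_metrics(y_true, y_pred):
    """Confusion-matrix measures for binary labels (1 = homeless, 0 = not)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = _ratio(tp, tp + fn)  # true positive rate
    specificity = _ratio(tn, tn + fp)  # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": _ratio(tp, tp + fp),  # positive predictive value
        "npv": _ratio(tn, tn + fn),  # negative predictive value
        # Balanced accuracy is the mean of sensitivity and specificity.
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


def auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))


# Toy data, invented for illustration (not from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
metrics["auc"] = auc(y_true, scores)
print(metrics)
```

Balanced accuracy is a natural headline measure when the two classes differ in size, because it weights sensitivity and specificity equally regardless of how many beneficiaries fall into each group.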

Data collection/extraction methods: We included 93,656 WPC enrollees from 2017 to 2018, 37,441 of whom had a WPC Pilot-reported homelessness indicator.

Principal findings: The random forest algorithm with all available indicators had the best performance (87% BA and 0.95 AUC), but a simpler Generalized Linear Model (GLM) also performed well (74% BA and 0.83 AUC). Reducing the predictors to the 20 and the five most important indicators in a GLM yielded only slightly lower performance (86% BA and 0.94 AUC for the top 20 and 86% BA and 0.91 AUC for the top five).

Conclusions: Large samples can be used to accurately predict homelessness in Medicaid administrative data if a validated homelessness indicator for a small subset can be obtained. In the absence of a validated indicator, the likelihood of homelessness can be calculated from the county-level homelessness rate, address- and claims-based indicators, and beneficiary age using a prediction model presented here. These approaches are needed given the rising prevalence of homelessness and the focus of Medicaid and other payers on addressing homelessness and its outcomes.
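To illustrate how a likelihood could be calculated from a small set of predictors, the sketch below scores one beneficiary with a GLM. It assumes a binomial GLM with a logit link (logistic regression), which the abstract does not specify. The feature names, coefficients, and intercept are hypothetical placeholders, not the published model; real values would have to come from the article.

```python
import math


def homelessness_probability(features, coefficients, intercept):
    """Predicted probability from a logit-link GLM (an assumed model form)."""
    linear_predictor = intercept + sum(
        coefficients[name] * value for name, value in features.items()
    )
    # The inverse logit maps the linear predictor to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# HYPOTHETICAL coefficients, for illustration only; not taken from the study.
example_coefficients = {
    "county_homelessness_rate": 0.8,
    "address_based_indicator": 1.5,
    "claims_based_indicator": 2.0,
    "age_in_decades": 0.1,
}
example_intercept = -3.0  # hypothetical

# HYPOTHETICAL beneficiary record.
beneficiary = {
    "county_homelessness_rate": 0.5,
    "address_based_indicator": 1,
    "claims_based_indicator": 0,
    "age_in_decades": 4.5,
}

probability = homelessness_probability(
    beneficiary, example_coefficients, example_intercept
)
print(f"Predicted probability of homelessness: {probability:.3f}")
```

Passing the coefficients in as arguments keeps the scoring function independent of any particular fitted model, so published estimates can be substituted without changing the code.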

Keywords: Medicaid; administrative data; homelessness; machine learning algorithms.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Health Status
  • Humans
  • Ill-Housed Persons*
  • Medicaid*
  • ROC Curve
  • United States