Patient clustering with uncoded text in electronic medical records

Ricardo Henao; Jared Murray; Geoffrey Ginsburg; Lawrence Carin; Joseph E Lucas

AMIA Annu Symp Proc. 2013 Nov 16:2013:592-9. eCollection 2013.

Authors

Ricardo Henao¹, Jared Murray¹, Geoffrey Ginsburg¹, Lawrence Carin¹, Joseph E Lucas²

Affiliations

¹ Duke University, Durham, NC.
² Quintiles, Durham, NC.

PMID: 24551361
PMCID: PMC3900202

Abstract

We propose a mixture model for text data designed to capture underlying structure in the history of present illness section of electronic medical records. We also propose a method of inducing bias that leads to more homogeneous sets of diagnoses among the patients in each cluster. We apply our model to a collection of electronic records from an emergency department and compare it against three other relevant models to assess performance. Results on standard metrics show that patient clusters from our model are more homogeneous than those from the other models, and qualitative analyses suggest that our approach yields interpretable patient sub-populations when applied to real data. Finally, we demonstrate the use of our patient clustering model to identify adverse drug events.
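
The abstract does not specify the form of the mixture model, so the following is only a minimal sketch of the general idea of clustering patients from free text: a generic mixture of multinomials over word counts, fit with EM. It is not the authors' model and does not include their bias-inducing method. The function name, the smoothing parameter `alpha`, and the toy vocabulary and counts are all assumptions made up for illustration.

```python
import numpy as np

def fit_multinomial_mixture(counts, n_clusters, n_iter=50, alpha=0.1, seed=0):
    """EM for a mixture of multinomials over word-count vectors.

    counts: (n_docs, n_words) array of non-negative word counts.
    Returns (weights, word_probs, responsibilities).
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    n_docs, _ = counts.shape
    # Random soft initialisation of document-to-cluster responsibilities.
    resp = rng.dirichlet(np.ones(n_clusters), size=n_docs)
    for _ in range(n_iter):
        # M-step: mixing weights and smoothed per-cluster word distributions.
        weights = resp.sum(axis=0) / n_docs
        word_counts = resp.T @ counts + alpha
        word_probs = word_counts / word_counts.sum(axis=1, keepdims=True)
        # E-step: unnormalised log posterior, then a stable per-row normalisation.
        log_post = np.log(weights + 1e-12) + counts @ np.log(word_probs).T
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
    return weights, word_probs, resp

# Hypothetical toy data: rows are notes, columns are counts of the
# invented vocabulary ["chest", "pain", "cough", "fever"].
toy_counts = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [4, 1, 0, 0],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
    [0, 0, 1, 4],
])
weights, word_probs, resp = fit_multinomial_mixture(toy_counts, n_clusters=2)
labels = resp.argmax(axis=1)  # hard cluster assignment per note
print(labels)
```

EM of this kind can settle in a local optimum that depends on the random initialisation, so in practice one would typically run several seeds and keep the fit with the highest likelihood.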

Publication types

Comparative Study

MeSH terms

Algorithms
Cluster Analysis
Drug-Related Side Effects and Adverse Reactions
Electronic Health Records*
Emergency Service, Hospital / organization & administration*
Humans
Models, Statistical*
Natural Language Processing*
Pharmacovigilance