Modeling healthcare data using multiple-channel latent Dirichlet allocation

J Biomed Inform. 2016 Apr:60:210-23. doi: 10.1016/j.jbi.2016.02.003. Epub 2016 Feb 16.

Abstract

Information and communications technologies have enabled healthcare institutions to accumulate large amounts of healthcare data that include diagnoses, medications, and additional contextual information such as patient demographics. To gain a better understanding of big healthcare data and to develop better data-driven clinical decision support systems, we propose a novel multiple-channel latent Dirichlet allocation (MCLDA) approach for modeling diagnoses, medications, and contextual information in healthcare data. The proposed MCLDA model assumes that a latent health status group structure is responsible for the observed co-occurrences among diagnoses, medications, and contextual information. Using a real-world research testbed that includes one million healthcare insurance claim records, we investigate the utility of MCLDA. Our empirical evaluation results suggest that MCLDA is capable of capturing the comorbidity structures and linking them with the distribution of medications. Moreover, MCLDA is able to identify the pairing between diagnoses and medications in a record based on the assigned latent groups. MCLDA can also be employed to predict missing medications or diagnoses given partial records. Our evaluation results also show that, in most cases, MCLDA outperforms alternative methods such as logistic regressions and the k-nearest-neighbor (KNN) model for two prediction tasks, i.e., medication and diagnosis prediction. Thus, MCLDA represents a promising approach to modeling healthcare data for clinical decision support.
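
The latent-group assumption described above can be illustrated with a minimal generative sketch. The abstract does not specify the model's priors, parameterization, or inference procedure, so everything below is an assumption made for illustration: the function names (`sample_record`, `make_demo`), the symmetric Dirichlet prior `alpha`, the vocabulary sizes, and the channel lengths are all hypothetical, and the structure simply follows standard LDA with one item distribution per channel. It is not the authors' implementation.

```python
import numpy as np


def sample_record(rng, alpha, channel_topics, channel_lengths):
    """Sample one synthetic record from an LDA-style multi-channel model.

    alpha: Dirichlet prior over K latent health status groups (assumed).
    channel_topics: dict mapping channel name -> (K, V_c) array whose row k
        is group k's distribution over that channel's vocabulary.
    channel_lengths: dict mapping channel name -> number of items to draw.
    """
    # One mixture over latent groups is shared by all channels of a record.
    theta = rng.dirichlet(alpha)
    record = {}
    for name, phi in channel_topics.items():
        n_items = channel_lengths[name]
        # Assign a latent group to each item, then draw the item from
        # that group's channel-specific distribution.
        groups = rng.choice(len(alpha), size=n_items, p=theta)
        items = [int(rng.choice(phi.shape[1], p=phi[k])) for k in groups]
        record[name] = list(zip(groups.tolist(), items))
    return theta, record


def make_demo(seed=0, n_groups=3):
    """Build toy parameters (hypothetical sizes) and sample one record."""
    rng = np.random.default_rng(seed)
    vocab_sizes = {"diagnosis": 5, "medication": 4, "context": 2}
    alpha = np.full(n_groups, 0.5)
    channel_topics = {
        name: rng.dirichlet(np.ones(size), size=n_groups)
        for name, size in vocab_sizes.items()
    }
    lengths = {"diagnosis": 2, "medication": 3, "context": 1}
    return sample_record(rng, alpha, channel_topics, lengths)


if __name__ == "__main__":
    theta, record = make_demo()
    print("group mixture:", np.round(theta, 3))
    for channel, pairs in record.items():
        print(channel, pairs)  # list of (latent group, item id) pairs
```

In this sketch each sampled item carries the latent group that produced it, so a diagnosis and a medication that share a group within one record can be read as a pair. That mirrors, in simplified form, the pairing idea stated in the abstract; fitting such a model to real claim records would require an inference step that is not shown here.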

Keywords: Diagnosis prediction; Diagnosis–medication associations; Health informatics; Healthcare data mining; Medication prediction; Multiple-channel latent Dirichlet allocation.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Comorbidity
  • Data Mining
  • Decision Making
  • Decision Support Systems, Clinical*
  • Humans
  • Insurance, Health / statistics & numerical data*
  • Medical Informatics / methods*
  • Models, Theoretical
  • Prescriptions
  • Software