Using text mining to extract depressive symptoms and to validate the diagnosis of major depressive disorder from electronic health records

J Affect Disord. 2020 Jan 1:260:617-623. doi: 10.1016/j.jad.2019.09.044. Epub 2019 Sep 11.

Abstract

Background: Many studies have used Taiwan's National Health Insurance Research Database (NHIRD) to conduct psychiatric research. However, the accuracy of the diagnostic codes for psychiatric disorders in the NHIRD has not been validated, and symptom profiles are not available. This study aimed to evaluate the accuracy of diagnostic codes and to use text mining to extract symptom profiles and functional impairment from electronic health records (EHRs) to overcome these research limitations.

Methods: A total of 500 discharge notes were randomly selected from a medical center's database. Three annotators reviewed the notes to establish gold standards. The accuracy of diagnostic codes for major psychiatric illnesses was evaluated. Text mining approaches were applied to extract depressive symptoms and functional profiles and to identify patients with major depressive disorder.
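The comparison of diagnostic codes against annotator-derived gold standards can be illustrated with a minimal sketch. The abstract does not state how the three annotators' judgments were combined, so the majority vote used here is an assumption, and the labels, data, and function names (`majority_label`, `precision_recall`) are hypothetical.

```python
from collections import Counter


def majority_label(labels):
    """Return the most common label among annotators.

    Assumption: a simple majority vote; the study's actual
    adjudication procedure is not described in the abstract.
    """
    return Counter(labels).most_common(1)[0][0]


def precision_recall(gold, predicted, target):
    """Precision and recall of `predicted` against `gold` for one target label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == target and p == target)
    fp = sum(1 for g, p in pairs if g != target and p == target)
    fn = sum(1 for g, p in pairs if g == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: three annotator labels per discharge note.
annotations = [
    ["MDD", "MDD", "bipolar"],
    ["schizophrenia", "schizophrenia", "schizophrenia"],
    ["MDD", "bipolar", "bipolar"],
    ["MDD", "MDD", "MDD"],
]
# Hypothetical diagnostic codes recorded for the same four notes.
coded = ["MDD", "schizophrenia", "MDD", "bipolar"]

gold = [majority_label(a) for a in annotations]
precision, recall = precision_recall(gold, coded, "MDD")
```

With these toy inputs, one coded MDD note agrees with the gold standard, one is a false positive, and one true MDD note is missed, so both precision and recall come out to 0.5.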

Results: The accuracy of the diagnostic codes for major depressive disorder, schizophrenia, and dementia was acceptable, but that for bipolar disorder and minor depression was less satisfactory. The text mining approach performed satisfactorily in recognizing depressive symptoms; however, recall for functional impairment was lower, resulting in lower F-scores of 0.774-0.753. Using the text mining approach to identify major depressive disorder, recall was 0.85 but precision was only 0.69.
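For readers less familiar with these metrics, the relationship between precision, recall, and the F-score can be sketched as follows. The abstract does not report an F-score for major depressive disorder identification; the value computed here is derived from the reported precision and recall under the assumption of the balanced F1 measure (beta = 1), and the function name `f_score` is illustrative.

```python
def f_score(precision, recall, beta=1.0):
    """F-beta score: weighted harmonic mean of precision and recall.

    beta = 1 gives the balanced F1 score (an assumption here; the
    abstract does not say which F-measure variant was used).
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall reported in the abstract for identifying
# major depressive disorder with the text mining approach.
mdd_f1 = f_score(precision=0.69, recall=0.85)
print(round(mdd_f1, 3))  # 0.762
```

Because the F-score is a harmonic mean, it is pulled toward the lower of the two inputs, which is why a drop in recall for functional impairment lowers the corresponding F-scores.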

Conclusions: The accuracy of the diagnostic code for major depressive disorder in discharge notes was generally acceptable. This finding supports the utilization of psychiatric diagnoses in claims databases. The application of text mining to EHRs might help in overcoming current limitations in research using claims databases.

Keywords: Information extraction; Major depressive disorder; Text mining.

Publication types

  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Adult
  • Bipolar Disorder / diagnosis
  • Data Mining / methods*
  • Databases, Factual
  • Depressive Disorder, Major / diagnosis*
  • Diagnosis-Related Groups
  • Electronic Health Records / standards*
  • Female
  • Humans
  • International Classification of Diseases / standards*
  • Male
  • Schizophrenia / diagnosis
  • Taiwan