Selection of Clinical Text Features for Classifying Suicide Attempts

AMIA Annu Symp Proc. 2021 Jan 25;2020:273-282. eCollection 2020.

Abstract

Research has demonstrated cohort misclassification when studies of suicidal thoughts and behaviors (STBs) rely on ICD-9/10-CM diagnosis codes. Electronic health record (EHR) data are being explored to better identify patients, a process called EHR phenotyping. Most STB phenotyping studies have used structured EHR data, but some are beginning to incorporate unstructured clinical text. In this study, we used a publicly accessible natural language processing (NLP) program for biomedical text (MetaMap) and iterative elastic net regression to extract and select predictive text features from the discharge summaries of 810 inpatient admissions of interest. Initial sets of 5,866 and 2,709 text features were reduced to 18 and 11, respectively. The two models fit with these features obtained an area under the receiver operating characteristic curve of 0.866-0.895 and an area under the precision-recall curve of 0.800-0.838, demonstrating the approach's potential to identify textual features to incorporate in phenotyping models.
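As an illustration of the two evaluation metrics reported above, the following minimal Python sketch computes an area under the ROC curve and a step-wise estimate of the area under the precision-recall curve (average precision) from binary labels and model scores. The function names and the toy data are hypothetical and do not come from the study. The abstract does not state which estimator of the precision-recall area was used, so average precision is an assumption here.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    Equals the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case; ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise estimate of the area under the precision-recall curve.

    Averages the precision observed at the rank of each positive case.
    Assumption: scores are not tied (ties are broken by input order).
    """
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank cases from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


if __name__ == "__main__":
    # Hypothetical toy data: 1 = admission of interest, 0 = other admission.
    toy_labels = [1, 0, 1, 1, 0, 0]
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
    print(f"AUROC: {auroc(toy_labels, toy_scores):.3f}")
    print(f"AUPRC (average precision): {average_precision(toy_labels, toy_scores):.3f}")
```

Reporting both metrics is useful when the positive class is a minority of the cohort: the precision-recall area depends on class prevalence, whereas the ROC area does not.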

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms*
  • Cohort Studies
  • Data Mining / methods*
  • Electronic Health Records / classification*
  • Female
  • Humans
  • International Classification of Diseases
  • Machine Learning
  • Male
  • Natural Language Processing*
  • Phenotype
  • Prevalence
  • ROC Curve
  • Suicide, Attempted / classification*