Selection of Clinical Text Features for Classifying Suicide Attempts

Ryan S Buckland; Joseph W Hogan; Elizabeth S Chen

AMIA Annu Symp Proc. 2021 Jan 25;2020:273-282. eCollection 2020.

Authors

Ryan S Buckland¹ ², Joseph W Hogan², Elizabeth S Chen¹

Affiliations

¹ Center for Biomedical Informatics, Brown University, Providence, RI.
² Department of Biostatistics, Brown University, Providence, RI.

PMID: 33936399
PMCID: PMC8075476

Abstract

Research has demonstrated cohort misclassification when studies of suicidal thoughts and behaviors (STBs) rely on ICD-9/10-CM diagnosis codes. Electronic health record (EHR) data are being explored to better identify patients, a process called EHR phenotyping. Most STB phenotyping studies have used structured EHR data, but some are beginning to incorporate unstructured clinical text. In this study, we used a publicly accessible natural language processing (NLP) program for biomedical text (MetaMap) and iterative elastic net regression to extract and select predictive text features from the discharge summaries of 810 inpatient admissions of interest. Initial sets of 5,866 and 2,709 text features were reduced to 18 and 11, respectively. The two models fitted with these features obtained an area under the receiver operating characteristic curve of 0.866-0.895 and an area under the precision-recall curve of 0.800-0.838, demonstrating the approach's potential to identify textual features to incorporate in phenotyping models.
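For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how each can be computed from binary labels and model scores. It is not the authors' code. The function names `auroc` and `average_precision` and the toy data are assumptions made for illustration, and average precision is only one common estimator of the area under the precision-recall curve; the abstract does not say which estimator was used.

```python
def auroc(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case
    (ties count as half). Equivalent to the Mann-Whitney U statistic."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: the mean of the precision values observed at the
    rank of each positive case, with cases ranked by descending score.
    Tied scores keep their input order in this simple sketch."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    if hits == 0:
        raise ValueError("need at least one positive label")
    return total / hits


# Toy data (hypothetical): 1 = case of interest, 0 = not a case.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]

toy_auroc = auroc(toy_labels, toy_scores)
toy_ap = average_precision(toy_labels, toy_scores)
```

On the toy data, 7 of the 9 positive-negative pairs are ranked correctly, so `toy_auroc` is 7/9. The pairwise loop is quadratic in the number of cases, which is fine for a cohort of a few hundred admissions but would call for a rank-based implementation on larger data.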

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Algorithms*
Cohort Studies
Data Mining / methods*
Electronic Health Records / classification*
Female
Humans
International Classification of Diseases
Machine Learning
Male
Natural Language Processing*
Phenotype
Prevalence
ROC Curve
Suicide, Attempted / classification*

Grants and funding

U54 GM115677/GM/NIGMS NIH HHS/United States