A marker-based neural network system for extracting social determinants of health

J Am Med Inform Assoc. 2023 Jul 19;30(8):1398-1407. doi: 10.1093/jamia/ocad041.

Abstract

Objective: The impact of social determinants of health (SDoH) on patients' healthcare quality and the disparity is well known. Many SDoH items are not coded in structured forms in electronic health records. These items are often captured in free-text clinical notes, but there are limited methods for automatically extracting them. We explore a multi-stage pipeline involving named entity recognition (NER), relation classification (RC), and text classification methods to automatically extract SDoH information from clinical notes.

Materials and methods: The study uses the N2C2 Shared Task data, which were collected from 2 sources of clinical notes: MIMIC-III and University of Washington Harborview Medical Centers. It contains 4480 social history sections with full annotation for 12 SDoHs. In order to handle the issue of overlapping entities, we developed a novel marker-based NER model. We used it in a multi-stage pipeline to extract SDoH information from clinical notes.

Results: Our marker-based system outperformed the state-of-the-art span-based models at handling overlapping entities based on the overall Micro-F1 score performance. It also achieved state-of-the-art performance compared with the shared task methods. Our approach achieved an F1 of 0.9101, 0.8053, and 0.9025 for Subtasks A, B, and C, respectively.

Conclusions: The major finding of this study is that the multi-stage pipeline effectively extracts SDoH information from clinical notes. This approach can improve the understanding and tracking of SDoHs in clinical settings. However, error propagation may be an issue and further research is needed to improve the extraction of entities with complex semantic meanings and low-frequency entities. We have made the source code available at https://github.com/Zephyr1022/SDOH-N2C2-UTSA.

Keywords: machine learning; natural language processing; neural networks; social determinants of health; NLP; information extraction.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Electronic Health Records
  • Hospitals
  • Humans
  • Natural Language Processing
  • Neural Networks, Computer*
  • Social Determinants of Health*