Using natural language processing to analyze unstructured patient-reported outcomes data derived from electronic health records for cancer populations: a systematic review

Jin-Ah Sim; Xiaolei Huang; Madeline R Horan; Justin N Baker; I-Chan Huang

doi:10.1080/14737167.2024.2322664

Expert Rev Pharmacoecon Outcomes Res. 2024 Apr;24(4):467-475. doi: 10.1080/14737167.2024.2322664. Epub 2024 Mar 5.

Authors

Jin-Ah Sim¹ ², Xiaolei Huang³, Madeline R Horan¹, Justin N Baker⁴, I-Chan Huang¹

Affiliations

¹ Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, TN, USA.
² Department of AI Convergence, Hallym University, Chuncheon, Republic of Korea.
³ Department of Computer Science, University of Memphis, Memphis, TN, USA.
⁴ Department of Pediatrics, Stanford University, Stanford, CA, USA.

PMID: 38383308
PMCID: PMC11001514 (available on 2025-04-01)
DOI: 10.1080/14737167.2024.2322664

Abstract

Introduction: Patient-reported outcomes (PROs; symptoms, functional status, quality-of-life) expressed in the 'free-text' or 'unstructured' format within clinical notes from electronic health records (EHRs) offer valuable insights beyond biological and clinical data for medical decision-making. However, a comprehensive assessment of utilizing natural language processing (NLP) coupled with machine learning (ML) methods to analyze unstructured PROs and their clinical implementation for individuals affected by cancer remains lacking.

Areas covered: This study aimed to systematically review published studies that used NLP techniques to extract and analyze PROs in clinical narratives from EHRs for cancer populations. We examined the types of NLP (with and without ML) techniques and platforms for data processing, analysis, and clinical applications.

Expert opinion: Utilizing NLP methods offers a valuable approach for processing and analyzing unstructured PROs among cancer patients and survivors. These techniques encompass a broad range of applications, such as extracting or recognizing PROs, categorizing, characterizing, or grouping PROs, predicting or stratifying risk for unfavorable clinical results, and evaluating connections between PROs and adverse clinical outcomes. The employment of NLP techniques is advantageous in converting substantial volumes of unstructured PRO data within EHRs into practical clinical utilities for individuals with cancer.

Keywords: Cancer; Electronic health records; Patient-reported outcomes; machine learning; natural language processing.

Publication types

Systematic Review

MeSH terms

Clinical Decision-Making
Electronic Health Records
Humans
Machine Learning
Natural Language Processing*
Neoplasms*

Grants and funding

R01 CA238368/CA/NCI NIH HHS/United States