Analyzing the symptoms in colorectal and breast cancer patients with or without type 2 diabetes using EHR data

Health Informatics J. 2021 Jan-Mar;27(1):14604582211000785. doi: 10.1177/14604582211000785.

Abstract

This research extracted patient-reported symptoms from free-text EHR notes of colorectal and breast cancer patients and studied the correlation of the symptoms with comorbid type 2 diabetes, race, and smoking status. An NLP framework was developed first to use UMLS MetaMap to extract all symptom terms from the 366,398 EHR clinical notes of 1694 colorectal cancer (CRC) patients and 3458 breast cancer (BC) patients. Semantic analysis and clustering algorithms were then developed to categorize all the relevant symptoms into eight symptom clusters defined by seed terms. After all the relevant symptoms were extracted from the EHR clinical notes, the frequency of the symptoms reported from colorectal cancer (CRC) and breast cancer (BC) patients over three time-periods post-chemotherapy was calculated. Logistic regression (LR) was performed with each symptom cluster as the response variable while controlling for diabetes, race, and smoking status. The results show that the CRC and BC patients with Type 2 Diabetes (T2D) were more likely to report symptoms than CRC and BC without T2D over three time-periods in the cancer trajectory. We also found that current smokers were more likely to report anxiety (CRC, BC), neuropathic symptoms (CRC, BC), anxiety (BC), and depression (BC) than non-smokers.

Keywords: clinical decision-making; data mining; electronic health records; machine learning; text mining.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Breast Neoplasms* / complications
  • Breast Neoplasms* / therapy
  • Cluster Analysis
  • Colorectal Neoplasms* / complications
  • Diabetes Mellitus, Type 2* / complications
  • Female
  • Humans