Prediction of venous thromboembolism using semantic and sentiment analyses of clinical narratives

Comput Biol Med. 2018 Mar 1:94:1-10. doi: 10.1016/j.compbiomed.2017.12.026. Epub 2018 Jan 3.

Abstract

Venous thromboembolism (VTE) is the third most common cardiovascular disorder. It affects both men and women and can occur at ages as young as 20 years. The increasing number of VTE cases, together with a fatality rate of 25% at first occurrence, makes preventive measures essential. Clinical narratives are a rich source of knowledge and should be included in the diagnosis and treatment processes, as they may contain critical information on risk factors. These free-text narratives therefore need to be made usable for searching, health analytics, and decision-making. This paper proposes a Semantic Extraction and Sentiment Assessment of Risk Factors (SESARF) framework, which uses a semantic- and sentiment-based approach to analyze the clinical narratives in electronic health records (EHR) and then predict a diagnosis of VTE. SESARF consists of two main algorithms, ExtractRiskFactor and FindSeverity. It matches and maps the concepts of VTE risk factors and finds the adjectives and adverbs that reflect their levels of severity. Unlike traditional machine-learning approaches, SESARF uses these results to prepare a feature vector as the input to a support vector machine (SVM) classifier, which makes the diagnosis. We use a dataset of 150 clinical narratives, 80% of which are used to train the SVM classifier, with the remaining 20% used for testing. Semantic extraction and sentiment analysis yielded precisions of 81% and 70%, respectively. The SVM's prediction of patients with VTE yielded precision and recall values of 54.5% and 85.7%, respectively.
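
The pipeline described in the abstract (find risk-factor mentions, score their severity from nearby modifiers, build a feature vector for a classifier, and evaluate with precision and recall) can be illustrated with a minimal Python sketch. This is not the authors' ExtractRiskFactor or FindSeverity algorithm: the lexicons, the severity scale, the "word immediately before the risk factor" rule, and the function names are all assumptions made for illustration. The resulting vectors could then be passed to any SVM implementation, which is omitted here.

```python
import re

# Hypothetical lexicons, for illustration only. The abstract does not list
# the actual risk-factor concepts or severity terms used by SESARF.
RISK_FACTORS = ["immobility", "surgery", "obesity", "smoking"]
SEVERITY = {"mild": 1, "moderate": 2, "severe": 3, "prolonged": 3}
DEFAULT_SEVERITY = 1  # assumed score for a risk factor with no modifier


def extract_features(narrative):
    """Return one severity score per risk factor (0 if it is not mentioned)."""
    tokens = re.findall(r"[a-z]+", narrative.lower())
    features = []
    for factor in RISK_FACTORS:
        score = 0
        for i, token in enumerate(tokens):
            if token == factor:
                # Assumed rule: only the word just before the risk factor
                # is checked for a severity modifier.
                modifier = tokens[i - 1] if i > 0 else ""
                score = max(score, SEVERITY.get(modifier, DEFAULT_SEVERITY))
        features.append(score)
    return features


def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = VTE, 0 = no VTE)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    note = "Patient reports severe obesity and recent surgery."
    print(extract_features(note))
```

With 150 narratives and an 80/20 split, the test set would hold 30 narratives. As a purely illustrative check (the abstract does not report the underlying counts), 6 true positives, 5 false positives, and 1 false negative would give a precision of 6/11 (about 54.5%) and a recall of 6/7 (about 85.7%).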

Keywords: Natural language processing; Prediction through classification; Risk factor assessment; Semantic enrichment; Sentiment analysis; Support vector machine; Venous thromboembolism.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Diagnosis, Computer-Assisted / methods*
  • Electronic Health Records*
  • Female
  • Humans
  • Male
  • Predictive Value of Tests
  • Semantic Web*
  • Support Vector Machine*
  • Venous Thromboembolism / diagnosis*