Text Analysis of Radiology Reports with Signs of Intracranial Hemorrhage on Brain CT Scans Using the Decision Tree Algorithm

Sovrem Tekhnologii Med. 2022;14(6):34-40. doi: 10.17691/stm2022.14.6.04. Epub 2022 Nov 28.

Abstract

The aim of the study was to create, train, and test an algorithm that analyzes brain CT text reports using a decision tree model for binary classification by presence or absence of intracranial hemorrhage (ICH) signs.

Materials and methods: The initial data were exported from the Unified Radiological Information Service of the Unified Medical Information and Analytical System (URIS UMIAS) and comprised 34,188 non-contrast brain CT studies from 56 inpatient medical facilities. Data analysis and preprocessing were carried out using NLTK (Natural Language Toolkit, version 3.6.5), a library for symbolic and statistical natural language processing, and scikit-learn, a machine learning library containing tools for classification tasks. CT studies were selected automatically using 14 ICH-related key words and 33 stop-phrases in which key words denote the absence of ICH; the selection was then verified by experts. Two classes were formed from a sample of 3980 report descriptions: those describing ICH and those without such descriptions. The binary classification problem was solved using a decision tree model. To evaluate model performance, the CT studies were randomly divided into training and test sets in a 7:3 ratio: of the 3980 reports, 2786 were assigned to the training set and 1194 to the test set.
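The key word and stop-phrase selection and the 7:3 split described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' code: the abstract does not list the 14 key words or the 33 stop-phrases, so the English terms below are hypothetical placeholders, and the helper names `preselect` and `split_7_to_3` are assumptions introduced for this example.

```python
import random

# Hypothetical placeholders: the abstract does not list the 14 key words
# or the 33 stop-phrases, so these English examples are illustrative only.
KEYWORDS = ("hemorrhage", "hematoma")
STOP_PHRASES = ("no evidence of hemorrhage", "no hematoma")


def preselect(report: str) -> bool:
    """Return True if a report mentions a key word outside any stop-phrase."""
    text = report.lower()
    # Remove negated mentions first so they cannot trigger a key word match.
    for phrase in STOP_PHRASES:
        text = text.replace(phrase, " ")
    return any(word in text for word in KEYWORDS)


def split_7_to_3(items, seed=0):
    """Shuffle a copy of items and split it 7:3 into train and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 7 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


reports = [
    "Subdural hematoma along the left convexity.",
    "No evidence of hemorrhage. Ventricles are not dilated.",
]
flags = [preselect(r) for r in reports]

# 3980 stand-in report indices, matching the sample size in the abstract.
train, test = split_7_to_3(range(3980))
print(flags, len(train), len(test))
```

With 3980 items, the 7:3 cut yields the 2786 and 1194 reports stated in the abstract. In the study, automatically selected reports were additionally verified by experts, a step this sketch does not model.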

Results: On the test set, the trained algorithm classified CT reports as "with signs of ICH" or "without signs of ICH" with a sensitivity of 0.94, a specificity of 0.88, and an F-score of 0.83.
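For reference, the three reported metrics can be derived from confusion-matrix counts as in the sketch below. The counts are hypothetical and are not the study's confusion matrix, which the abstract does not report; the F-score is assumed here to be the F1 score (the harmonic mean of precision and sensitivity), since the abstract does not state the variant used.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute sensitivity, specificity, and F1 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of ICH reports correctly flagged
    specificity = tn / (tn + fp)  # share of non-ICH reports correctly cleared
    precision = tp / (tp + fp)    # share of flagged reports that truly have ICH
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_score": f_score,
    }


# Hypothetical counts for illustration only; not taken from the study.
metrics = binary_metrics(tp=90, fp=20, tn=180, fn=10)
print(metrics)
```

Unlike sensitivity and specificity, the F-score depends on precision and therefore on the class balance of the test set, so it can be lower than both of the other two values.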

Conclusion: The developed and trained algorithm for the analysis of radiology reports demonstrated high accuracy in identifying brain CT reports with signs of intracranial hemorrhage and can be used to solve binary classification problems and to create the corresponding data sets. However, its use is limited by the need for manual review of CT studies for quality control.

Keywords: computed tomography; decision tree algorithm; diagnostic reports; intracranial hemorrhage; machine learning; natural language processing.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Brain / diagnostic imaging
  • Decision Trees
  • Humans
  • Intracranial Hemorrhages / diagnostic imaging
  • Natural Language Processing*
  • Radiology*
  • Tomography, X-Ray Computed / methods

Grants and funding

Study funding. The work was financially supported by Russian Science Foundation grant No. 22-25-20231, https://rscf.ru/project/22-25-20231/.