COVID-19 detection in radiological text reports integrating entity recognition

Comput Biol Med. 2020 Dec:127:104066. doi: 10.1016/j.compbiomed.2020.104066. Epub 2020 Oct 22.

Abstract

COVID-19 diagnosis is usually based on PCR testing, complemented by radiological images, mainly chest Computed Tomography (CT), for the assessment of lung involvement by COVID-19. However, textual radiological reports also contain relevant information for determining the likelihood of presenting radiological signs of COVID-19 involving the lungs. The development of automatic COVID-19 detection systems based on Natural Language Processing (NLP) techniques could greatly help in supporting clinicians and detecting COVID-19 related disorders within radiological reports. In this paper we propose a text classification system based on the integration of different information sources. The system can be used to automatically predict whether or not a patient has radiological findings consistent with COVID-19 on the basis of radiological reports of chest CT. To carry out our experiments we use 295 radiological reports from chest CT studies provided by the "HT médica" clinic. All of them are radiological requests with suspicion of chest involvement by COVID-19. To train our text classification system we apply Machine Learning approaches and Named Entity Recognition. The system takes two sources of information as input: the text of the radiological report and COVID-19 related disorders extracted from SNOMED-CT. The best system is trained using SVM, and the baseline results achieve 85% accuracy in predicting lung involvement by COVID-19, which is already a competitive value that is difficult to improve on. Moreover, we apply mutual information in order to integrate the best quality information extracted from SNOMED-CT. In this way, we achieve around 90% accuracy, improving the baseline results by 5 points.
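
The mutual information step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the reports, labels, candidate entity strings, and the function names mutual_information and rank_entities are all hypothetical, and entity recognition is reduced to plain substring matching. The sketch only shows how mutual information between the presence of a term and the class label could be used to rank candidate terms before they are passed to a classifier such as an SVM.

```python
import math
from collections import Counter


def mutual_information(feature_present, labels):
    """Mutual information (in bits) between a binary feature and class labels."""
    n = len(labels)
    joint = Counter(zip(feature_present, labels))
    count_x = Counter(feature_present)
    count_y = Counter(labels)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p(x, y) / (p(x) * p(y)) simplifies to c * n / (count_x * count_y)
        mi += p_xy * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_entities(reports, labels, entities):
    """Rank candidate entity strings by mutual information with the label."""
    scores = {}
    for entity in entities:
        # Assumption: simple case-insensitive substring match stands in for NER.
        present = [entity in report.lower() for report in reports]
        scores[entity] = mutual_information(present, labels)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Invented toy data, for illustration only (1 = findings consistent with COVID-19).
reports = [
    "Ground-glass opacities in both lungs",
    "Bilateral ground-glass opacities",
    "Lungs clear, no findings",
    "Pleural effusion on the right",
]
labels = [1, 1, 0, 0]
entities = ["ground-glass", "pleural effusion", "lungs"]

ranked = rank_entities(reports, labels, entities)
for entity, score in ranked:
    print(f"{entity}: {score:.3f}")
```

On this toy data, the term that appears only in positive reports ranks first and the term that appears equally in both classes ranks last; only the higher-ranked terms would then be kept as additional features.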

Keywords: COVID-19; Named entity recognition; Natural language processing; Radiological report; Text classification.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Automation
  • COVID-19 / diagnosis*
  • COVID-19 / virology
  • Humans
  • Language
  • SARS-CoV-2 / isolation & purification*
  • Spain
  • Systematized Nomenclature of Medicine