Clinical Application of Detecting COVID-19 Risks: A Natural Language Processing Approach

Viruses. 2022 Dec 11;14(12):2761. doi: 10.3390/v14122761.

Abstract

The clinical application of detecting COVID-19 risk factors is a challenging task. Existing named entity recognition models are usually trained on a limited set of named entities. Besides clinical factors, non-clinical factors such as social determinants of health (SDoH) are also important in the study of infectious disease. In this paper, we propose a generalizable machine learning approach that improves on previous efforts by recognizing a large number of clinical risk factors and SDoH. The novelty of the proposed method lies in its combination of several deep neural network components, including the BiLSTM-CNN-CRF method and a transformer-based embedding layer. Experimental results on a cohort of COVID-19 data prepared from PubMed articles show that the proposed approach outperforms other methods, with a performance gain of about 1-5% in macro- and micro-average F1 scores. Clinical practitioners and researchers can use this approach to obtain accurate information regarding clinical risks and SDoH factors, and can use this pipeline as a tool to end the pandemic or to prepare for future pandemics.
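The abstract reports gains in both macro- and micro-average F1. The following is a minimal sketch of how the two averages differ for entity-level evaluation; it is not the authors' evaluation code. The entity tuple layout `(sentence_id, start, end, label)`, the labels `DISEASE` and `SDOH`, and the toy data are assumptions made for illustration only.

```python
from collections import Counter


def f1_score(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(gold, pred):
    """Compare two sets of (sentence_id, start, end, label) entity tuples.

    An entity counts as correct only on an exact match of span and label.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for ent in pred:
        if ent in gold:
            tp[ent[-1]] += 1
        else:
            fp[ent[-1]] += 1
    for ent in gold - pred:
        fn[ent[-1]] += 1

    labels = sorted(set(tp) | set(fp) | set(fn))
    per_label = {lab: f1_score(tp[lab], fp[lab], fn[lab]) for lab in labels}
    # Macro: unweighted mean of per-label F1, so rare labels weigh as much as common ones.
    macro = sum(per_label.values()) / len(labels) if labels else 0.0
    # Micro: pool the counts over all labels first, so frequent labels dominate.
    micro = f1_score(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro, per_label


# Hypothetical toy data: three DISEASE mentions found correctly,
# one SDOH mention predicted at the wrong span.
gold = {(0, 0, 1, "DISEASE"), (0, 3, 4, "DISEASE"),
        (1, 2, 3, "DISEASE"), (1, 5, 6, "SDOH")}
pred = {(0, 0, 1, "DISEASE"), (0, 3, 4, "DISEASE"),
        (1, 2, 3, "DISEASE"), (1, 7, 8, "SDOH")}

macro, micro, per_label = macro_micro_f1(gold, pred)
print(f"macro-F1 = {macro:.2f}, micro-F1 = {micro:.2f}")  # macro-F1 = 0.50, micro-F1 = 0.75
```

In this toy case the missed SDOH entity pulls the macro average down to 0.50 while the micro average stays at 0.75. This shows why reporting both matters when some entity types, such as SDoH, may be much rarer than clinical ones.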

Keywords: COVID-19; clinical; de-identification; named entities; non-clinical; pipeline; social determinants of health.

MeSH terms

  • COVID-19* / diagnosis
  • Electronic Health Records
  • Humans
  • Machine Learning
  • Natural Language Processing*
  • Neural Networks, Computer

Grants and funding

This research received no external funding.