Protocol for the automatic extraction of epidemiological information via a pre-trained language model

STAR Protoc. 2023 Sep 15;4(3):102392. doi: 10.1016/j.xpro.2023.102392. Epub 2023 Jul 1.

Abstract

The lack of systems to automatically extract epidemiological fields from open-access COVID-19 case reports limits how quickly prevention measures can be formulated. Here we present a protocol for using CCIE, a COVID-19 Cases Information Extraction system based on a pre-trained language model [1]. We describe steps for preparing supervised training data and executing Python scripts for named entity recognition and text category classification. We then detail the use of machine evaluation and manual validation to illustrate the effectiveness of CCIE. For complete details on the use and execution of this protocol, please refer to Wang et al. [2].
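As an illustration of what preparing supervised training data for named entity recognition can involve, the following minimal sketch converts character-span annotations into token-level BIO tags. It is not taken from CCIE: the function name `spans_to_bio`, the labels `LOC` and `DATE`, the example sentence, and the whitespace tokenization are all assumptions made for illustration, and the protocol's own scripts may use a different annotation format and tokenizer.

```python
from typing import List, Tuple

# (start_char, end_char_exclusive, label); hypothetical annotation format
Span = Tuple[int, int, str]


def spans_to_bio(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Whitespace-tokenize `text` and assign a BIO tag to each token.

    A token is tagged only if it lies entirely inside an annotated span;
    the first token of a span gets "B-", later tokens get "I-".
    """
    tagged = []
    pos = 0
    for token in text.split():
        start = text.index(token, pos)  # character offset of this token
        end = start + len(token)
        pos = end
        tag = "O"
        for s, e, label in spans:
            if start >= s and end <= e:
                tag = ("B-" if start == s else "I-") + label
                break
        tagged.append((token, tag))
    return tagged


# Hypothetical example sentence and labels (not from the CCIE data set).
example = "Patient returned from Wuhan on January 20"
loc_start = example.index("Wuhan")
date_start = example.index("January")
spans = [
    (loc_start, loc_start + len("Wuhan"), "LOC"),
    (date_start, len(example), "DATE"),
]

for token, tag in spans_to_bio(example, spans):
    print(f"{token}\t{tag}")
```

The resulting token and tag pairs are the kind of input a token-classification model is typically fine-tuned on; a text category classifier would instead take whole sentences paired with a single label.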

Keywords: Clinical Protocol; Computer Sciences; Health Sciences.

MeSH terms

  • COVID-19* / epidemiology
  • Humans
  • Language
  • Natural Language Processing*