Development of comprehensive annotation criteria for patients' states from clinical texts

J Biomed Inform. 2022 Oct:134:104200. doi: 10.1016/j.jbi.2022.104200. Epub 2022 Sep 9.

Abstract

In clinical records, much of the clinical information is recorded as free text, which necessitates advanced automatic information extraction technology. Developing practical technologies requires a corpus with finer-grained annotations that describe the information it contains, but such annotation criteria have not yet been sufficiently studied. This study aimed to develop fine-grained annotation criteria that exhaustively cover patients' states in case reports. We collected 362 case reports, written in Japanese, of intractable diseases that were expected to contain a broad range of patients' states. Criteria were developed by repeatedly revising and annotating the clinical case report text. A set of annotation criteria for patients' states, consisting of 46 entity types, 9 attributes, and 36 relations, was obtained. It allows more detailed information to be expressed than in previous studies through a broader range of concept types, including treatment, and captures clinical information based on a combination of causality and judgment, which could not be expressed before.

Keywords: Datasets; Medical records; Natural language processing.

Publication types

  • Case Reports
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Humans
  • Information Storage and Retrieval*
  • Natural Language Processing*