Text Extraction and Standardization System Development for Pathological Records in the Korea Biobank Network

Stud Health Technol Inform. 2024 Jan 25:310:1440-1441. doi: 10.3233/SHTI231234.

Abstract

In Korea, the Korea Centers for Disease Control and Prevention operates the Korea Biobank Network (KBN). KBN holds pathological records collected in Korea, which form a useful dataset for research. In this study, we established a system that saves time and reduces errors by applying a step-by-step data extraction process to KBN pathological records. We tested the extraction process on a lung cancer cohort (n = 769) and a breast cancer cohort (n = 1292), and the accuracy was 91%. We expect that this system can be used to efficiently process data from multiple institutions, including the Korea Biobank Network.

Keywords: NLP; biobank system.

MeSH terms

  • Biological Specimen Banks*
  • Centers for Disease Control and Prevention, U.S.
  • Humans
  • Lung Neoplasms*
  • Republic of Korea
  • United States