HunCRC: annotated pathological slides to enhance deep learning applications in colorectal cancer screening

Sci Data. 2022 Jun 28;9(1):370. doi: 10.1038/s41597-022-01450-y.

Abstract

Histopathology is the gold standard method for staging and grading human tumors and provides critical information for the oncoteam's decision-making. Highly trained pathologists are needed to carefully analyze, under the microscope, the slides produced from biopsy tissue. This is a time-consuming process. A reliable decision support system would assist healthcare systems, which often suffer from a shortage of pathologists. Recent advances in digital pathology allow for high-resolution digitization of pathological slides. Digital slide scanners combined with modern computer vision models, such as convolutional neural networks, can help pathologists in their everyday work and shorten diagnosis times. In this study, 200 digital whole-slide images of hematoxylin-eosin-stained colorectal biopsies are published. Alongside the whole-slide images, detailed region-level annotations are provided for ten relevant pathological classes. After pre-processing, the 200 digital slides yielded 101,389 patches. A single patch is a 512 × 512 pixel image covering a 248 × 248 μm² tissue area. Higher-resolution versions are available as well. HunCRC, this widely accessible dataset, will hopefully aid future research on colorectal cancer and its computer-aided diagnosis.
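The stated patch geometry (512 × 512 pixels covering 248 × 248 μm²) implies a pixel size of 248 / 512 = 0.484375 μm per pixel. The sketch below shows that arithmetic together with a simple tiling of a slide region into patches of that size. It assumes a non-overlapping grid that discards partial edge tiles and applies no background or tissue filtering. The abstract does not describe the actual pre-processing, so the helper names (`microns_per_pixel`, `patch_grid`) and the tiling rule are illustrative assumptions, not the authors' pipeline.

```python
PATCH_PX = 512    # patch edge length in pixels (stated in the abstract)
PATCH_UM = 248.0  # patch edge length in micrometres (stated in the abstract)


def microns_per_pixel(patch_px: int = PATCH_PX, patch_um: float = PATCH_UM) -> float:
    """Physical pixel size implied by the stated patch geometry (248 / 512)."""
    return patch_um / patch_px


def patch_grid(width_px: int, height_px: int, patch_px: int = PATCH_PX):
    """Yield (x, y) top-left corners of non-overlapping patches.

    Hypothetical tiling rule: only patches that fit entirely inside the
    region are kept, and no tissue/background filtering is applied.
    """
    for y in range(0, height_px - patch_px + 1, patch_px):
        for x in range(0, width_px - patch_px + 1, patch_px):
            yield (x, y)


if __name__ == "__main__":
    print(microns_per_pixel())                 # 0.484375
    print(len(list(patch_grid(2048, 1536))))   # 4 columns x 3 rows = 12
```

For example, a hypothetical 2048 × 1536 pixel tissue region yields 4 × 3 = 12 full patches under this rule. Real pipelines typically also discard patches that are mostly background, which would reduce the count.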

Publication types

  • Dataset

MeSH terms

  • Colorectal Neoplasms* / diagnosis
  • Deep Learning*
  • Diagnosis, Computer-Assisted
  • Early Detection of Cancer
  • Humans
  • Neural Networks, Computer