A Dataset for Breast Cancer Histopathological Image Classification

IEEE Trans Biomed Eng. 2016 Jul;63(7):1455-62. doi: 10.1109/TBME.2015.2496264. Epub 2015 Oct 30.

Abstract

Today, medical image analysis papers require solid experiments to prove the usefulness of proposed methods. However, experiments are often performed on data selected by the researchers, which may come from different institutions, scanners, and populations. Different evaluation measures may also be used, making it difficult to compare methods. In this paper, we introduce a dataset of 7909 breast cancer histopathology images acquired from 82 patients, which is now publicly available from http://web.inf.ufpr.br/vri/breast-cancer-database. The dataset includes both benign and malignant images. The task associated with this dataset is the automated classification of these images into two classes (benign and malignant), which would be a valuable computer-aided diagnosis tool for clinicians. To assess the difficulty of this task, we report preliminary results obtained with state-of-the-art image classification systems. The accuracy ranges from 80% to 85%, showing that there is still room for improvement. By providing this dataset and a standardized evaluation protocol to the scientific community, we hope to bring together researchers from both the medical and the machine learning fields to advance toward this clinical application.
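The evaluation protocol is not detailed in this abstract. Because the 7909 images come from only 82 patients, one plausible way to keep the evaluation fair is to split the data by patient, so that images from the same patient never appear in both the training and the test set. The Python sketch below illustrates this under stated assumptions: the record layout `(patient_id, image_path, label)`, the 30% test fraction, and all helper names are hypothetical. It reports plain image-level accuracy alongside an illustrative patient-level score (the mean of per-patient accuracies); neither is claimed to be the paper's exact definition.

```python
import random
from collections import defaultdict


def patient_wise_split(records, test_fraction=0.3, seed=0):
    """Split (patient_id, image_path, label) records so that no patient
    contributes images to both the training and the test set.
    The record layout and the 0.3 default are hypothetical choices."""
    patients = sorted({pid for pid, _, _ in records})
    rng = random.Random(seed)  # seeded for a reproducible split
    rng.shuffle(patients)
    n_test = round(len(patients) * test_fraction)
    test_patients = set(patients[:n_test])
    train = [r for r in records if r[0] not in test_patients]
    test = [r for r in records if r[0] in test_patients]
    return train, test


def image_level_accuracy(y_true, y_pred):
    """Fraction of images whose predicted class matches the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def patient_level_accuracy(patient_ids, y_true, y_pred):
    """Illustrative patient-level score: each patient's per-image accuracy,
    averaged over patients (so every patient weighs equally)."""
    per_patient = defaultdict(lambda: [0, 0])  # pid -> [correct, total]
    for pid, t, p in zip(patient_ids, y_true, y_pred):
        per_patient[pid][0] += int(t == p)
        per_patient[pid][1] += 1
    scores = [correct / total for correct, total in per_patient.values()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy, made-up predictions for four images from three patients.
    pids = ["p1", "p1", "p2", "p3"]
    truth = ["benign", "benign", "malignant", "malignant"]
    preds = ["benign", "malignant", "malignant", "malignant"]
    print(image_level_accuracy(truth, preds))                    # 0.75
    print(round(patient_level_accuracy(pids, truth, preds), 3))  # 0.833
```

The two scores differ on the toy data because patient `p1` has two images, so it counts twice at the image level but only once at the patient level. Reporting both makes it visible when a classifier does well on image-rich patients but poorly on others.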

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Breast / diagnostic imaging*
  • Breast Neoplasms / diagnostic imaging*
  • Databases, Factual*
  • Female
  • Histocytochemistry
  • Humans
  • Image Interpretation, Computer-Assisted / methods*
  • Microscopy