Study on Data Partition for Delimitation of Masses in Mammography

J Imaging. 2021 Sep 2;7(9):174. doi: 10.3390/jimaging7090174.

Abstract

Mammography is the primary medical imaging method used for routine screening and early detection of breast cancer in women. However, the process of manually inspecting, detecting, and delimiting the tumoral massess in 2D images is a very time-consuming task, subject to human errors due to fatigue. Therefore, integrated computer-aided detection systems have been proposed, based on modern computer vision and machine learning methods. In the present work, mammogram images from the publicly available Inbreast dataset are first converted to pseudo-color and then used to train and test a Mask R-CNN deep neural network. The most common approach is to start with a dataset and split the images into train and test set randomly. However, since there are often two or more images of the same case in the dataset, the way the dataset is split may have an impact on the results. Our experiments show that random partition of the data can produce unreliable training, so the dataset must be split using case-wise partition for more stable results. In experimental results, the method achieves an average true positive rate of 0.936 with 0.063 standard deviation using random partition and 0.908 with 0.002 standard deviation using case-wise partition, showing that case-wise partition must be used for more reliable results.

Keywords: Mask R-CNN; breast mass; computer-aided detection; dataset partition; mammography; mass detection; mass segmentation.