Can Machine Learning Be Better than Biased Readers?

Tomography. 2023 Apr 28;9(3):901-908. doi: 10.3390/tomography9030074.

Abstract

Background: Training machine learning (ML) models in medical imaging requires large amounts of labeled data. To minimize labeling workload, it is common to divide training data among multiple readers for separate annotation without consensus and then combine the labeled data to train an ML model. This can lead to a biased training dataset and poor prediction performance of the resulting ML algorithm. The purpose of this study was to determine whether ML algorithms can overcome biases caused by labeling by multiple readers without consensus. Methods: This study used a publicly available chest X-ray dataset of pediatric pneumonia. As an analogy to a practical dataset lacking labeling consensus among multiple readers, random and systematic errors were artificially added to the dataset to generate biased data for a binary classification task. A ResNet18-based convolutional neural network (CNN) was used as the baseline model. A ResNet18 model with a regularization term added to the loss function was used to examine whether it improved on the baseline. Results: False positive labels, false negative labels, and random errors (5-25%) caused a loss of AUC (0-14%) when training a binary CNN classifier. The model with the regularized loss function improved the AUC (75-84%) over that of the baseline model (65-79%). Conclusion: This study indicated that ML algorithms can overcome individual readers' biases when consensus is not available. Regularized loss functions are recommended when annotation tasks are allocated to multiple readers, as they are easy to implement and effective in mitigating biased labels.
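
The following PyTorch sketch illustrates the kind of setup the Methods describes; it is not the authors' published code. It (i) injects false positive, false negative, and random label errors into a binary label vector and (ii) adds a regularization term to the training loss of a ResNet18 classifier. The abstract does not state which regularizer was used, so a confidence (entropy) penalty is assumed here purely for illustration; the function names, `reg_weight`, and the error-rate parameters are hypothetical.

```python
# Illustrative sketch only (not the study's code): simulate biased labels and
# train a ResNet18 binary classifier with a regularized loss. The exact
# regularization term in the paper is not given in the abstract; an entropy
# (confidence) penalty is used here as an assumed stand-in.
import torch
import torch.nn as nn
import torchvision


def inject_label_bias(labels: torch.Tensor,
                      fp_rate: float = 0.0,
                      fn_rate: float = 0.0,
                      random_rate: float = 0.0,
                      seed: int = 0) -> torch.Tensor:
    """Flip binary labels (0/1) to mimic readers' systematic and random errors.

    fp_rate:     fraction of true negatives relabeled as positive (false positives)
    fn_rate:     fraction of true positives relabeled as negative (false negatives)
    random_rate: fraction of all labels flipped at random
    """
    g = torch.Generator().manual_seed(seed)
    noisy = labels.clone()
    neg, pos = labels == 0, labels == 1
    noisy[neg & (torch.rand(labels.shape, generator=g) < fp_rate)] = 1
    noisy[pos & (torch.rand(labels.shape, generator=g) < fn_rate)] = 0
    flip = torch.rand(labels.shape, generator=g) < random_rate
    noisy[flip] = 1 - noisy[flip]
    return noisy


class RegularizedBCELoss(nn.Module):
    """Binary cross-entropy plus an entropy-based confidence penalty.

    The penalty discourages over-confident fits to possibly mislabeled
    examples; setting reg_weight=0 recovers the plain baseline loss.
    """

    def __init__(self, reg_weight: float = 0.1):
        super().__init__()
        self.bce = nn.BCEWithLogitsLoss()
        self.reg_weight = reg_weight

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        p = torch.sigmoid(logits)
        entropy = -(p * torch.log(p + 1e-8) + (1 - p) * torch.log(1 - p + 1e-8))
        # Subtracting the mean entropy rewards less confident (higher-entropy)
        # predictions, acting as the regularization term on top of the BCE loss.
        return self.bce(logits, targets.float()) - self.reg_weight * entropy.mean()


# Baseline architecture: ResNet18 adapted to a single-logit binary output.
model = torchvision.models.resnet18(weights=None)
model.fc = nn.Linear(model.fc.in_features, 1)
criterion = RegularizedBCELoss(reg_weight=0.1)  # use reg_weight=0 for the baseline model
```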

Keywords: annotation bias; chest X-ray; convolutional neural network; labeling consensus; machine learning.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Bias
  • Child
  • Humans
  • Machine Learning*
  • Neural Networks, Computer*

Grants and funding

This work was supported by a research grant from the Nippon Steel Corporation (Fund Number 509533).