Classification of imbalanced oral cancer image data from high-risk population

Bofan Song; Shaobai Li; Sumsum Sunny; Keerthi Gurushanth; Pramila Mendonca; Nirza Mukhia; Sanjana Patrick; Shubha Gurudath; Subhashini Raghavan; Imchen Tsusennaro; Shirley T Leivon; Trupti Kolur; Vivek Shetty; Vidya Bushan; Rohan Ramesh; Tyler Peterson; Vijay Pillai; Petra Wilder-Smith; Alben Sigamani; Amritha Suresh; Moni Abraham Kuriakose; Praveen Birur; Rongguang Liang

doi:10.1117/1.JBO.26.10.105001

J Biomed Opt. 2021 Oct;26(10):105001. doi: 10.1117/1.JBO.26.10.105001.

Authors

Bofan Song¹, Shaobai Li¹, Sumsum Sunny², Keerthi Gurushanth³, Pramila Mendonca⁴, Nirza Mukhia³, Sanjana Patrick⁵, Shubha Gurudath³, Subhashini Raghavan³, Imchen Tsusennaro⁶, Shirley T Leivon⁶, Trupti Kolur⁴, Vivek Shetty⁴, Vidya Bushan⁴, Rohan Ramesh⁶, Tyler Peterson¹, Vijay Pillai⁴, Petra Wilder-Smith⁷, Alben Sigamani⁴, Amritha Suresh²,⁴, Moni Abraham Kuriakose⁸, Praveen Birur³,⁵, Rongguang Liang¹

Affiliations

¹ The University of Arizona, Wyant College of Optical Sciences, Tucson, Arizona, United States.
² Mazumdar Shaw Medical Centre, Bangalore, India.
³ KLE Society Institute of Dental Sciences, Bangalore, India.
⁴ Mazumdar Shaw Medical Foundation, Bangalore, India.
⁵ Biocon Foundation, Bangalore, India.
⁶ Christian Institute of Health Sciences and Research, Dimapur, India.
⁷ University of California Beckman Laser Institute and Medical Clinic, Irvine, California, United States.
⁸ Cochin Cancer Research Center, Kochi, India.

Abstract

Significance: Early detection of oral cancer is vital for high-risk patients, and machine learning-based automatic classification is well suited to disease screening. However, current datasets collected from high-risk populations are imbalanced, which often degrades classification performance.

Aim: To reduce the class bias caused by data imbalance.

Approach: We collected 3851 polarized white light cheek mucosa images using our customized oral cancer screening device. We used weight balancing, data augmentation, undersampling, focal loss, and ensemble methods to improve neural network performance in oral cancer image classification on imbalanced multi-class datasets captured from high-risk populations during oral cancer screening in low-resource settings.
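Two of the algorithm-level methods named above, weight balancing and focal loss, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the inverse-frequency weighting scheme, the default `gamma=2.0`, the function names, and the toy labels are all assumptions chosen for illustration, and the focal loss follows the standard formulation -w_c (1 - p_t)^gamma log(p_t).

```python
import numpy as np

def inverse_frequency_weights(labels, num_classes):
    """Class weights n_samples / (num_classes * class_count).

    Assumed weighting scheme: rarer classes receive larger weights.
    """
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    counts[counts == 0] = 1.0  # guard against classes with no samples
    return labels.size / (num_classes * counts)

def focal_loss(probs, labels, class_weights=None, gamma=2.0):
    """Mean multi-class focal loss over a batch.

    probs:  (n_samples, num_classes) predicted class probabilities
    labels: (n_samples,) integer class ids
    gamma:  focusing parameter; gamma=0 reduces to cross-entropy
    """
    p_t = probs[np.arange(labels.size), labels]  # probability of the true class
    p_t = np.clip(p_t, 1e-12, 1.0)               # avoid log(0)
    loss = -((1.0 - p_t) ** gamma) * np.log(p_t)
    if class_weights is not None:
        loss = loss * class_weights[labels]      # per-sample weight from its class
    return loss.mean()

# Hypothetical toy data: class 0 is the majority, class 1 the minority.
labels = np.array([0] * 8 + [1] * 2)
weights = inverse_frequency_weights(labels, num_classes=2)

# Hypothetical predictions: confident on the majority, unsure on the minority.
probs = np.array([[0.9, 0.1]] * 8 + [[0.6, 0.4]] * 2)
ce = focal_loss(probs, labels, gamma=0.0)                 # plain cross-entropy
fl = focal_loss(probs, labels, class_weights=weights)     # weighted focal loss
```

With `gamma > 0`, well-classified samples (high `p_t`) contribute little to the loss, so training focuses on hard examples; the class weights separately raise the contribution of under-represented classes.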

Results: Applying both data-level and algorithm-level approaches to the deep learning training process improved the performance of the minority classes, which were initially difficult to distinguish. The accuracy of the "premalignancy" class also increased, which is desirable for screening applications.
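Because overall accuracy can hide poor minority-class performance, per-class accuracy (recall) is the natural quantity to inspect for results like these. The sketch below shows one way to compute it from a confusion matrix; the function name, integer class ids, and toy predictions are hypothetical and are not data from the study.

```python
import numpy as np

def per_class_recall(y_true, y_pred, num_classes):
    """Return the confusion matrix and the recall of each class.

    Rows of the matrix are true classes, columns are predicted classes.
    Classes with no true samples get a recall of NaN.
    """
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # count each (true, predicted) pair
    support = cm.sum(axis=1)            # number of true samples per class
    recall = np.divide(
        np.diag(cm), support,
        out=np.full(num_classes, np.nan),
        where=support > 0,
    )
    return cm, recall

# Hypothetical toy labels with three classes; class 0 is the majority.
y_true = np.array([0, 0, 0, 0, 1, 1, 2])
y_pred = np.array([0, 0, 0, 1, 1, 0, 2])

cm, recall = per_class_recall(y_true, y_pred, num_classes=3)
overall_accuracy = np.trace(cm) / cm.sum()
balanced_accuracy = np.nanmean(recall)  # unweighted mean of per-class recall
```

Comparing `overall_accuracy` with `balanced_accuracy` gives a quick indication of how much the majority class dominates the headline number.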

Conclusions: Experimental results show that the class bias induced by imbalanced oral cancer image datasets can be reduced using both data- and algorithm-level methods. Our study may provide an important basis for understanding how imbalanced datasets affect deep learning classifiers for oral cancer and how to mitigate that effect.

Keywords: deep learning; ensemble learning; imbalanced multi-class datasets; mobile screening device; oral cancer.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Algorithms
Early Detection of Cancer
Humans
Machine Learning
Mouth Neoplasms* / diagnostic imaging
Neural Networks, Computer*
