Improving Prediction of Cervical Cancer Using KNN Imputed SMOTE Features and Multi-Model Ensemble Learning Approach

Cancers (Basel). 2023 Sep 4;15(17):4412. doi: 10.3390/cancers15174412.

Abstract

Objective: Cervical cancer is among the leading causes of death among women in developing countries. Early detection, followed by treatment under appropriate medical guidance, is the most important step in minimizing its consequences. Examining Pap smear images is one of the most effective ways to detect this type of malignancy. For automated detection of cervical cancer, however, the available datasets often contain missing values, which can significantly degrade the performance of machine learning models.

Methods: To address these challenges, this study proposes an automated cervical cancer prediction system that handles missing values with KNN imputation and class imbalance with SMOTE (Synthetic Minority Over-sampling Technique) to achieve high accuracy. The system uses a stacked ensemble voting classifier that combines three machine learning models and is trained on KNN-imputed, SMOTE-upsampled features.

Results: The proposed model achieves 99.99% accuracy, 99.99% precision, 99.99% recall, and a 99.99% F1 score when using KNN-imputed SMOTE features. Its performance is compared with that of several other machine learning algorithms under four scenarios: with missing values removed, with KNN imputation, with SMOTE features, and with KNN-imputed SMOTE features. The proposed model is also validated against existing state-of-the-art approaches.

Conclusions: This study addresses missing values and class imbalance in data collected for cervical cancer detection, and the proposed approach may help medical practitioners detect cervical cancer in a timely manner and provide patients with better care.
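The abstract does not give implementation details, so the following is only a minimal NumPy sketch of the three ingredients it names: KNN imputation of missing values, SMOTE oversampling of the minority class, and hard (majority) voting across three models. The function names `knn_impute`, `smote`, and `majority_vote`, the neighbour counts `k`, and the simplified distance used for imputation are all illustrative assumptions, not the authors' code. In practice, scikit-learn's `KNNImputer` and `VotingClassifier` and imbalanced-learn's `SMOTE` provide production versions of these steps.

```python
import numpy as np


def knn_impute(X, k=3):
    """Hypothetical helper: fill each NaN with the mean of that feature over the
    k nearest rows, measuring distance only on features both rows observe.
    Simplified sketch; scikit-learn's KNNImputer differs in details."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    observed = ~np.isnan(X)
    n = X.shape[0]
    for i in range(n):
        missing = np.where(~observed[i])[0]
        if missing.size == 0:
            continue
        # Distance from row i to every other row over shared observed features;
        # rows with no shared features (and row i itself) stay at infinity.
        dists = np.full(n, np.inf)
        for j in range(n):
            shared = observed[i] & observed[j]
            if j != i and shared.any():
                diff = X[i, shared] - X[j, shared]
                dists[j] = np.sqrt(np.mean(diff ** 2))
        for f in missing:
            # Donor rows must have feature f observed and a finite distance.
            donors = np.where(observed[:, f] & np.isfinite(dists))[0]
            if donors.size == 0:
                out[i, f] = np.nanmean(X[:, f])  # fall back to the column mean
                continue
            nearest = donors[np.argsort(dists[donors])[:k]]
            out[i, f] = X[nearest, f].mean()
    return out


def smote(X_min, n_new, k=2, rng=None):
    """Hypothetical helper: create n_new synthetic minority samples, each on the
    segment between a random minority sample and one of its k nearest
    minority neighbours (the core SMOTE interpolation step)."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise Euclidean distances among minority samples only.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    synthetic = np.empty((n_new, X_min.shape[1]))
    for s in range(n_new):
        i = rng.integers(len(X_min))
        j = neighbours[i, rng.integers(k)]
        gap = rng.random()  # uniform in [0, 1)
        synthetic[s] = X_min[i] + gap * (X_min[j] - X_min[i])
    return synthetic


def majority_vote(*predictions):
    """Hard voting: the most common non-negative integer label per sample
    across the given models' prediction arrays."""
    P = np.stack(predictions)  # shape (n_models, n_samples)
    return np.array([np.bincount(col).argmax() for col in P.T])
```

Following the order described in the abstract, `knn_impute` would run first, `smote` would then upsample the minority (positive) class of the imputed features, and `majority_vote` would combine the three trained models' predictions. A common precaution, not stated in the abstract, is to fit the imputer and apply SMOTE on the training split only, so that synthetic samples derived from test rows do not leak into evaluation.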

Keywords: KNN imputer; SMOTE; cervical cancer detection; ensemble learning; healthcare; missing values.

Grants and funding

The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia for funding this research work through the project number RI-44-0050.