Homogeneous Adaboost Ensemble Machine Learning Algorithms with Reduced Entropy on Balanced Data

Entropy (Basel). 2023 Jan 29;25(2):245. doi: 10.3390/e25020245.

Abstract

Cancer is a serious public health problem worldwide. Breast cancer (BC) begins in the breast and can spread to other parts of the body, and it is one of the most prevalent cancers claiming the lives of women. It is also becoming clearer that most cases of breast cancer are already advanced when the patient brings them to a doctor's attention. Even if the evident lesion is removed, the seeds of the disease may already have reached an advanced stage of development, or the body's ability to resist them may have weakened considerably, rendering the removal ineffective. Although breast cancer is still much more common in more developed nations, it is also spreading quickly to less developed countries. The motivation behind this study is to use an ensemble method for the prediction of BC, since an ensemble model aims to automatically balance the strengths and weaknesses of its individual models, resulting in a better overall decision. The main objective of this paper is to predict and classify breast cancer using Adaboost ensemble techniques. The weighted entropy is computed for the target column by applying each attribute's weights, where the weights represent the likelihood of each class. Information gain increases as entropy decreases. Both individual classifiers and homogeneous ensemble classifiers, created by combining Adaboost with different single classifiers, were used in this work. To deal with class imbalance as well as noise, the synthetic minority over-sampling technique (SMOTE) was applied as part of the data mining pre-processing. The suggested approach uses a decision tree (DT) and naive Bayes (NB) with Adaboost ensemble techniques. The experimental findings showed 97.95% prediction accuracy using the Adaboost-random forest classifier.
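The relationship between weighted entropy and information gain described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `weighted_entropy` and `information_gain`, the class labels "M" and "B", and the choice to treat the weights as per-sample weights whose class-wise sums give the class likelihoods are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def weighted_entropy(labels, weights=None):
    """Shannon entropy (in bits) of the target column.

    Assumption: each sample carries a non-negative weight, and the
    likelihood of a class is its share of the total weight.
    """
    if weights is None:
        weights = [1.0] * len(labels)
    class_weight = defaultdict(float)
    for label, w in zip(labels, weights):
        class_weight[label] += w
    total = sum(class_weight.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for w in class_weight.values():
        if w > 0:
            p = w / total
            entropy -= p * math.log2(p)
    return entropy


def information_gain(labels, feature_values, weights=None):
    """Entropy of the target minus the weighted entropy after splitting
    on a categorical feature. Lower child entropy means higher gain."""
    if weights is None:
        weights = [1.0] * len(labels)
    parent = weighted_entropy(labels, weights)
    total = sum(weights)
    if total == 0:
        return 0.0
    groups = defaultdict(lambda: ([], []))
    for y, x, w in zip(labels, feature_values, weights):
        groups[x][0].append(y)
        groups[x][1].append(w)
    children = sum(
        (sum(ws) / total) * weighted_entropy(ys, ws)
        for ys, ws in groups.values()
    )
    return parent - children


# Hypothetical toy data: "M" = malignant, "B" = benign.
target = ["M", "M", "B", "B"]
feature = ["high", "high", "low", "low"]
balanced_entropy = weighted_entropy(target)
gain = information_gain(target, feature)
```

In this toy case the balanced target has the maximum entropy for two classes, and a feature that separates the classes perfectly reduces the child entropy to zero, so the gain equals the parent entropy.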

Keywords: breast cancer; ensemble methods; entropy; machine learning; precision.

Grants and funding

The National Research Foundation of Ukraine funded this research under project number 2021.01/0103.