Brain Cancer Prediction Based on Novel Interpretable Ensemble Gene Selection Algorithm and Classifier

Abdulqader M Almars; Majed Alwateer; Mohammed Qaraad; Souad Amjad; Hanaa Fathi; Ayda K Kelany; Nazar K Hussein; Mostafa Elhosseini

doi:10.3390/diagnostics11101936


Diagnostics (Basel). 2021 Oct 19;11(10):1936. doi: 10.3390/diagnostics11101936.

Authors

Abdulqader M Almars¹, Majed Alwateer¹, Mohammed Qaraad²,³, Souad Amjad², Hanaa Fathi⁴, Ayda K Kelany⁵, Nazar K Hussein⁶, Mostafa Elhosseini¹,⁷

Affiliations

¹ College of Computer Science and Engineering, Taibah University, Yanbu 46411, Saudi Arabia.
² Department of Computer Science, Faculty of Science, Abdelmalek Essaadi University, Tetouan 93000, Morocco.
³ Math and Computer Science Department, Amran University, Amran 891-6162, Yemen.
⁴ Math and Computer Science Department, Menoufia University, Menoufia 32511, Egypt.
⁵ Department of Genomic Medicine, Faculty of Science, Cairo University, Cairo 12613, Egypt.
⁶ Department of Mathematics, College of Computer Sciences and Mathematics, Tikrit University, Tikrit 34001, Iraq.
⁷ Computers Engineering and Control Systems Department, Faculty of Engineering, Mansoura University, Mansoura 35516, Egypt.

Abstract

The growth of abnormal cells in the brain causes human brain tumors. Identifying the tumor type is crucial for the patient's prognosis and treatment. Cancer microarray data typically contain few samples but many gene expression levels as features, a manifestation of the curse of dimensionality that makes microarray data difficult to classify. In most of the studies examined, cancer classification accuracy (malignant vs. benign) was evaluated without disclosing the biological information related to the classification process. A new approach is proposed to bridge the gap between cancer classification and the biological interpretation of the genes implicated in cancer. This study develops a new hybrid model for cancer classification that uses mRMRe feature selection as a key step to improve the performance of classification methods, combined with distributed hyperparameter optimization for gradient boosting ensemble methods. To evaluate the proposed method, naive Bayes (NB), random forest (RF), and support vector machine (SVM) classifiers were chosen. In terms of AUC, sensitivity, and specificity, the optimized CatBoost classifier outperformed the optimized XGBoost classifier under 5-, 6-, 8-, and 10-fold cross-validation. With an accuracy of 0.91 ± 0.12, the optimized CatBoost classifier is more accurate than the unoptimized CatBoost classifier (0.81 ± 0.24). When used within the hybrid algorithm, SVM, RF, and NB all become more accurate. Furthermore, SVM and RF achieve the same classification accuracy (0.97 ± 0.08), which is higher than that of NB (0.91 ± 0.12). The relevance of the selected genes is confirmed by the findings of related biomedical studies.
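The abstract names mRMRe as the key feature-selection step. mRMRe is an ensemble extension of minimum-redundancy maximum-relevance (mRMR) selection. What follows is only a minimal sketch of the classic single-solution greedy mRMR criterion (relevance minus mean redundancy), not the ensemble variant and not the authors' implementation. It assumes each gene's expression values are discretized at that gene's median (a hypothetical preprocessing choice) and that mutual information is estimated from counts. The function names `mutual_information`, `discretize_by_median`, and `mrmr_select`, the gene names, and the toy numbers are all illustrative.

```python
from collections import Counter
from math import log
from typing import Hashable, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X;Y) in nats for two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))), rewritten in terms of raw counts
        mi += (c / n) * log(c * n / (px[a] * py[b]))
    return mi


def discretize_by_median(values: Sequence[float]) -> list[int]:
    """Hypothetical preprocessing: 1 if a value lies above the gene's median, else 0."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    median = ordered[mid] if len(ordered) % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    return [1 if v > median else 0 for v in values]


def mrmr_select(features: dict[str, Sequence[Hashable]],
                labels: Sequence[Hashable], k: int) -> list[str]:
    """Greedy mRMR (difference form): maximize I(f; y) minus mean I(f; s) over selected s."""
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    selected: list[str] = []
    remaining = set(features)

    def score(name: str) -> float:
        if not selected:
            return relevance[name]  # first pick: relevance only
        redundancy = sum(mutual_information(features[name], features[s])
                         for s in selected) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        # Iterate names in sorted order so ties are broken deterministically by name.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy example with made-up numbers: 4 samples, 3 genes, binary class labels.
expression = {
    "GENE_A": [1.2, 0.8, 5.1, 4.9],
    "GENE_A_copy": [2.4, 1.6, 10.2, 9.8],  # GENE_A scaled by 2, so fully redundant
    "GENE_C": [0.3, 3.9, 0.5, 4.2],
}
labels = [0, 1, 1, 1]
discrete = {gene: discretize_by_median(vals) for gene, vals in expression.items()}
selected = mrmr_select(discrete, labels, k=2)
print(selected)
```

On this toy data all three genes are equally relevant to the labels on their own, but GENE_A_copy duplicates GENE_A, so the redundancy penalty yields the pair GENE_A and GENE_C (in some order) rather than two copies of the same signal. A real microarray pipeline would use a better mutual-information estimator and, for mRMRe, combine several such selections.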

Keywords: brain cancer; classification; ensemble methods; feature selection; gene expression data; gene selection; hyperparameter optimization.