Genetic hyperparameter optimization with Modified Scalable-Neighbourhood Component Analysis for breast cancer prognostication

Neural Netw. 2023 May:162:240-257. doi: 10.1016/j.neunet.2023.02.035. Epub 2023 Feb 27.

Abstract

Breast cancer is common among women and can be fatal if left untreated. Early detection is vital so that suitable treatment can stop the cancer from spreading further and save lives. Traditional detection is a time-consuming process. With the evolution of Data Mining (DM), the healthcare industry can benefit in predicting the disease, since DM allows physicians to determine the attributes that are significant for diagnosis. Although conventional techniques have used DM-based methods to identify breast cancer, their prediction rates have been limited. Moreover, conventional works have generally used parametric Softmax classifiers with a fixed set of classes, particularly when large amounts of labelled data are available during training. This becomes a problem in open-set cases, where new classes are encountered with too few instances to learn a generalized parametric classifier. The present study therefore implements a non-parametric strategy that optimizes the feature embedding rather than a parametric classifier. It uses a Deep Convolutional Neural Network (Deep CNN) and Inception V3 to learn visual features that preserve the neighbourhood structure in the semantic space according to the Neighbourhood Component Analysis (NCA) criterion. Because NCA is limited by its bottleneck, the study proposes Modified Scalable-Neighbourhood Component Analysis (MS-NCA), which relies on a non-linear objective function to perform feature fusion by optimizing the distance-learning objective. This gives MS-NCA the ability to compute inner products of features without performing an explicit mapping, which increases its scalability. Finally, Genetic Hyper-parameter Optimization (G-HPO) is proposed.
In G-HPO, each new stage of the algorithm simply extends the chromosome length, bringing additional hyperparameters into the subsequent multi-layer eXtreme Gradient Boosting (XGBoost), Naïve Bayes (NB), and Random Forest (RF) models that distinguish normal from affected breast cancer cases; in this way, optimized hyperparameter values are determined for RF, NB, and XGBoost. Analytical results confirm that this process improves the classification rate.
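The NCA-based, non-parametric classification described in the abstract can be illustrated with a small sketch. This is a minimal illustration of the standard leave-one-out NCA objective with cosine similarity and a temperature, in the spirit of scalable NCA; it is not the study's MS-NCA, whose non-linear objective and feature fusion are not specified here. The helper names `nca_class_probs` and `nca_loss`, the temperature `sigma`, and the toy 2-D embeddings (standing in for Deep CNN / Inception V3 features) are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def nca_class_probs(query, memory, labels, sigma=0.1):
    """Hypothetical helper: non-parametric class probabilities for `query`.

    Each stored embedding j gets weight exp(cos(q, f_j) / sigma), normalised over
    the memory; the probability of class c is the summed weight of neighbours
    labelled c. No classifier weights are involved.
    """
    q = l2_normalize(query)
    sims = [sum(a * b for a, b in zip(q, l2_normalize(f))) / sigma for f in memory]
    m = max(sims)  # subtract the max before exp() for numerical stability
    weights = [math.exp(s - m) for s in sims]
    total = sum(weights)
    probs = {}
    for w, y in zip(weights, labels):
        probs[y] = probs.get(y, 0.0) + w / total
    return probs


def nca_loss(embeddings, labels, sigma=0.1):
    """Leave-one-out NCA loss: mean of -log p_i, where p_i is the probability that
    sample i is assigned its own label by all *other* samples."""
    loss = 0.0
    for i, (f, y) in enumerate(zip(embeddings, labels)):
        others = embeddings[:i] + embeddings[i + 1:]
        other_labels = labels[:i] + labels[i + 1:]
        p = nca_class_probs(f, others, other_labels, sigma).get(y, 0.0)
        loss -= math.log(max(p, 1e-12))
    return loss / len(embeddings)


# Toy usage: two well-separated groups of embeddings.
memory = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["normal", "normal", "affected", "affected"]
probs = nca_class_probs([0.95, 0.05], memory, labels)
print(probs)  # "normal" receives nearly all of the probability mass
print(nca_loss(memory, labels))
```

A new class can be supported by adding labelled embeddings to `memory` without retraining any classifier weights, which is the motivation the abstract gives for a non-parametric strategy in open-set cases.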
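The abstract states that MS-NCA can compute inner products of features without performing an explicit mapping. A standard mechanism with this property is the kernel trick. The abstract does not say which kernel MS-NCA uses, so the sketch below assumes a homogeneous degree-2 polynomial kernel purely to show that k(x, z) = (x · z)^2 equals the inner product of explicitly mapped features φ(x) · φ(z). The function names are hypothetical.

```python
import math


def explicit_map(x):
    """Explicit degree-2 feature map for 2-D inputs: phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(a, b):
    """Plain inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))


def poly_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel k(x, z) = (x . z)^2, computed
    directly in input space without ever building phi(x) or phi(z)."""
    return dot(x, z) ** 2


x, z = (1.0, 2.0), (3.0, 4.0)
print(poly_kernel(x, z))                          # kernel value in input space
print(dot(explicit_map(x), explicit_map(z)))      # same value via the explicit map (up to rounding)
```

The kernel side costs a single dot product in the original d dimensions, whereas the explicit degree-2 map has d(d+1)/2 coordinates (and an RBF kernel's map is infinite-dimensional), which is why avoiding the mapping helps scalability.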
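The chromosome-extension idea behind G-HPO can be sketched with a small genetic algorithm. This is a generic GA under stated assumptions, not the study's implementation: genes are integer-coded, selection is truncation with elitism, crossover is one-point, and mutation redraws a gene uniformly from its range. The gene names and ranges (`RF_GENES`, `NB_GENES`, `XGB_GENES`), the helper functions, and `toy_fitness` are hypothetical. In the study the fitness would presumably be the classification rate of the RF, NB, and XGBoost models; here it is replaced by a toy objective so the example runs without data or model libraries.

```python
import random

# Hypothetical integer-coded search spaces; the study's actual hyperparameters
# and ranges are not given in the abstract.
RF_GENES = {"n_estimators": (10, 500), "max_depth": (2, 30)}
NB_GENES = {"log10_var_smoothing": (-12, -1)}
XGB_GENES = {"max_depth": (2, 12), "learning_rate_pct": (1, 30)}


def build_gene_space(*stages):
    """Concatenate (name, (low, high)) genes stage by stage; each stage lengthens the chromosome."""
    space = []
    for prefix, genes in stages:
        for name, bounds in genes.items():
            space.append((f"{prefix}.{name}", bounds))
    return space


SPACE = build_gene_space(("rf", RF_GENES), ("nb", NB_GENES), ("xgb", XGB_GENES))


def random_chromosome(space, rng):
    """Draw every gene uniformly from its inclusive integer range."""
    return [rng.randint(lo, hi) for _, (lo, hi) in space]


def crossover(a, b, rng):
    """One-point crossover of two parent chromosomes."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:]


def mutate(chrom, space, rng, rate=0.1):
    """Random-reset mutation: each gene is redrawn from its range with probability `rate`."""
    return [rng.randint(lo, hi) if rng.random() < rate else gene
            for gene, (_, (lo, hi)) in zip(chrom, space)]


def toy_fitness(chrom):
    """Hypothetical stand-in for a real objective such as cross-validated accuracy.
    Here: negative range-normalised distance to an arbitrary target chromosome."""
    target = [200, 10, -6, 6, 10]
    return -sum(abs(g - t) / (hi - lo) for g, t, (_, (lo, hi)) in zip(chrom, target, SPACE))


def genetic_search(space, fitness, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    population = [random_chromosome(space, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]  # truncation selection; elites survive unchanged
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite), rng), space, rng)
                    for _ in range(pop_size - len(elite))]
        population = elite + children
    return max(population, key=fitness)


best = genetic_search(SPACE, toy_fitness)
print(dict(zip((name for name, _ in SPACE), best)))
```

Adding a stage, for example another model's genes, only appends entries to `SPACE`, so crossover and mutation operate unchanged on the longer chromosome; under the stated assumptions, this mirrors the abstract's description of each new stage extending the chromosome length.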

Keywords: Breast cancer; Data Mining; Genetic-Hyper-Parameter Optimization; Modified Scalable-Neighbourhood Component Analysis.

MeSH terms

  • Algorithms
  • Bayes Theorem
  • Breast Neoplasms* / diagnosis
  • Breast Neoplasms* / genetics
  • Female
  • Humans
  • Neural Networks, Computer
  • Random Forest