Hybrid Feature-Learning-Based PSO-PCA Feature Engineering Approach for Blood Cancer Classification

Diagnostics (Basel). 2023 Aug 14;13(16):2672. doi: 10.3390/diagnostics13162672.

Abstract

Acute lymphoblastic leukemia (ALL) is a lethal blood cancer that is characterized by an abnormal increased number of immature lymphocytes in the blood or bone marrow. For effective treatment of ALL, early assessment of the disease is essential. Manual examination of stained blood smear images is current practice for initially screening ALL. This practice is time-consuming and error-prone. In order to effectively diagnose ALL, numerous deep-learning-based computer vision systems have been developed for detecting ALL in blood peripheral images (BPIs). Such systems extract a huge number of image features and use them to perform the classification task. The extracted features may contain irrelevant or redundant features that could reduce classification accuracy and increase the running time of the classifier. Feature selection is considered an effective tool to mitigate the curse of the dimensionality problem and alleviate its corresponding shortcomings. One of the most effective dimensionality-reduction tools is principal component analysis (PCA), which maps input features into an orthogonal space and extracts the features that convey the highest variability from the data. Other feature selection approaches utilize evolutionary computation (EC) to search the feature space and localize optimal features. To profit from both feature selection approaches in improving the classification performance of ALL, in this study, a new hybrid deep-learning-based feature engineering approach is proposed. The introduced approach integrates the powerful capability of PCA and particle swarm optimization (PSO) approaches in selecting informative features from BPI mages with the power of pre-trained CNNs of feature extraction. Image features are first extracted through the feature-transfer capability of the GoogleNet convolutional neural network (CNN). PCA is utilized to generate a feature set of the principal components that covers 95% of the variability in the data. In parallel, bio-inspired particle swarm optimization is used to search for the optimal image features. The PCA and PSO-derived feature sets are then integrated to develop a hybrid set of features that are then used to train a Bayesian-based optimized support vector machine (SVM) and subspace discriminant ensemble-learning (SDEL) classifiers. The obtained results show improved classification performance for the ML classifiers trained by the proposed hybrid feature set over the original PCA, PSO, and all extracted feature sets for ALL multi-class classification. The Bayesian-optimized SVM trained with the proposed hybrid PCA-PSO feature set achieves the highest classification accuracy of 97.4%. The classification performance of the proposed feature engineering approach competes with the state of the art.

Keywords: classification; deep learning; feature selection; leukemia; machine learning; optimization.