A fusion of VGG-16 and ViT models for improving bone tumor classification in computed tomography

Weimin Chen; Muhammad Ayoub; Mengyun Liao; Ruizheng Shi; Mu Zhang; Feng Su; Zhiguo Huang; Yuanzhe Li; Yi Wang; Kevin K L Wong

doi:10.1016/j.jbo.2023.100508


J Bone Oncol. 2023 Nov 2:43:100508. doi: 10.1016/j.jbo.2023.100508. eCollection 2023 Dec.

Authors

Weimin Chen¹, Muhammad Ayoub², Mengyun Liao², Ruizheng Shi³, Mu Zhang⁴, Feng Su⁴, Zhiguo Huang⁴, Yuanzhe Li⁵, Yi Wang⁵, Kevin K L Wong¹,⁶

Affiliations

¹ School of Information and Electronics, Hunan City University, Yiyang 413000, China.
² School of Computer Science and Engineering, Central South University, Changsha 410083, Hunan, China.
³ National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha 410008, Hunan, China.
⁴ Department of Emergency, Xiangya Hospital, Central South University, Changsha 410008, Hunan, China.
⁵ Department of CT/MRI, The Second Affiliated Hospital of Fujian Medical University, Quanzhou 362000, China.
⁶ Department of Mechanical Engineering, College of Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada.

Abstract

Background and objective: Bone tumors pose significant challenges in orthopedic medicine because clinical treatment differs across tumor types, which include benign, intermediate, and malignant tumors. Convolutional Neural Networks (CNNs) have become prominent models for tumor classification. However, their limited (local) receptive field hinders the capture of global structural information, which can reduce classification accuracy. To address this limitation, we propose an optimized deep learning algorithm for the precise classification of diverse bone tumors.

Materials and methods: Our dataset comprises 786 computed tomography (CT) images of bone tumors, featuring sections from two distinct bones, the tibia and the femur. The images were sourced from The Second Affiliated Hospital of Fujian Medical University and preprocessed with noise reduction techniques. We introduce a novel fusion model, VGG16-ViT, that combines the strengths of the VGG-16 network and the Vision Transformer (ViT). Specifically, we select 27 features from the third layer of VGG-16 and feed them into the ViT encoder for training. Furthermore, we evaluate the impact of secondary migration (a second round of transfer learning), using CT images from Xiangya Hospital for validation.
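The abstract gives no layer-level details, so the following is only a minimal NumPy sketch of the general "CNN features into a Transformer encoder" pattern, not the authors' VGG16-ViT implementation. It assumes an intermediate CNN feature map of shape (C, H, W), turns each spatial position into a token, prepends a [CLS] token, adds positional embeddings, and applies one pre-norm self-attention encoder block before a three-way softmax (benign, intermediate, malignant). The class name `TinyFusionClassifier`, the helper names, the sizes (256 channels, a 7×7 map, `d_model=64`), and the random weights are all hypothetical; a real model would use pretrained VGG-16 weights, multi-head attention, several encoder layers, and trained parameters.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    # Normalize each token vector to zero mean and unit variance (no learned scale/shift).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cnn_features_to_tokens(feature_map):
    # (C, H, W) feature map -> (H*W, C): one token per spatial position.
    c, h, w = feature_map.shape
    return feature_map.reshape(c, h * w).T


class TinyFusionClassifier:
    """Hypothetical CNN-feature -> Transformer-encoder classifier with random weights."""

    def __init__(self, in_channels, num_tokens, d_model=64, num_classes=3, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.02
        self.proj = rng.normal(0, s, (in_channels, d_model))    # token embedding
        self.cls = rng.normal(0, s, (1, d_model))                # [CLS] token
        self.pos = rng.normal(0, s, (num_tokens + 1, d_model))   # positional embeddings
        self.wq, self.wk, self.wv, self.wo = (
            rng.normal(0, s, (d_model, d_model)) for _ in range(4)
        )
        self.w1 = rng.normal(0, s, (d_model, 4 * d_model))
        self.w2 = rng.normal(0, s, (4 * d_model, d_model))
        self.head = rng.normal(0, s, (d_model, num_classes))

    def encoder_block(self, x):
        # Pre-norm single-head self-attention: every token attends to every other token,
        # which is the "global" context a purely convolutional stack lacks.
        h = layer_norm(x)
        q, k, v = h @ self.wq, h @ self.wk, h @ self.wv
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (T, T) attention weights
        x = x + (attn @ v) @ self.wo                      # residual connection
        # Pre-norm MLP (ReLU here for simplicity; ViT uses GELU) with residual connection.
        h = layer_norm(x)
        return x + np.maximum(h @ self.w1, 0) @ self.w2

    def __call__(self, feature_map):
        tokens = cnn_features_to_tokens(feature_map) @ self.proj    # (N, d_model)
        x = np.concatenate([self.cls, tokens], axis=0) + self.pos   # (N + 1, d_model)
        x = self.encoder_block(x)
        logits = layer_norm(x[0]) @ self.head                       # classify from [CLS]
        return softmax(logits)                                      # class probabilities


# Usage with a random stand-in for an intermediate CNN feature map (hypothetical shape).
feature_map = np.random.default_rng(1).normal(size=(256, 7, 7))
model = TinyFusionClassifier(in_channels=256, num_tokens=7 * 7)
probs = model(feature_map)
print(probs.shape)  # (3,)
```

The design choice this illustrates is the division of labor suggested in the abstract: the convolutional stage supplies local texture features, while self-attention over the resulting tokens mixes information across the whole image.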

Results: The proposed fusion model notably improves classification performance. It reduces training time while achieving a classification accuracy of 97.6%, and it improves sensitivity and specificity by 8%. Furthermore, experiments on the effect of secondary migration across the three models indicate that it can further enhance performance.
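For a three-class problem, accuracy, sensitivity, and specificity are commonly computed one-vs-rest from a confusion matrix. The sketch below assumes that convention; the abstract does not state how its sensitivity and specificity figures were computed. The function name `one_vs_rest_metrics`, the class ordering, and the confusion-matrix counts in the usage example are hypothetical and invented purely for illustration; they are not results from this study.

```python
from typing import Dict, List


def one_vs_rest_metrics(cm: List[List[int]]) -> Dict[str, object]:
    """cm[i][j] = number of cases whose true class is i and predicted class is j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    per_class = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                         # truly k, predicted as another class
        fp = sum(cm[i][k] for i in range(n)) - tp    # predicted k, truly another class
        tn = total - tp - fn - fp                    # neither truly k nor predicted k
        sens = tp / (tp + fn) if tp + fn else float("nan")   # TP / (TP + FN)
        spec = tn / (tn + fp) if tn + fp else float("nan")   # TN / (TN + FP)
        per_class.append({"sensitivity": sens, "specificity": spec})
    return {"accuracy": correct / total, "per_class": per_class}


# Hypothetical counts for illustration only (rows = true class, columns = predicted class).
classes = ["benign", "intermediate", "malignant"]
cm = [
    [8, 1, 1],
    [0, 9, 1],
    [1, 0, 9],
]
m = one_vs_rest_metrics(cm)
print(f"accuracy = {m['accuracy']:.3f}")  # accuracy = 0.867
for name, stats in zip(classes, m["per_class"]):
    print(f"{name}: sensitivity={stats['sensitivity']:.2f}, specificity={stats['specificity']:.2f}")
```

Averaging the per-class values (macro averaging) is one common way to report a single sensitivity or specificity for a multi-class model.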

Conclusion: Our novel joint VGG-16 and Vision Transformer network exhibits robust classification performance on bone tumor datasets. Combining the two models enables precise and efficient classification that accommodates the diverse characteristics of different bone tumor types. This advance holds promise for the early detection and prognosis of patients with bone tumors.

Keywords: Bone tumor diagnosis; Deep learning; Orthopedic image classification; VGG-16; ViT; Vision Transformer.