Two-stage video-based convolutional neural networks for adult spinal deformity classification

Kaixu Chen; Tomoyuki Asada; Naoto Ienaga; Kousei Miura; Kotaro Sakashita; Takahiro Sunami; Hideki Kadone; Masashi Yamazaki; Yoshihiro Kuroda

doi:10.3389/fnins.2023.1278584

Front Neurosci. 2023 Dec 11:17:1278584. doi: 10.3389/fnins.2023.1278584. eCollection 2023.

Authors

Kaixu Chen¹, Tomoyuki Asada², Naoto Ienaga³, Kousei Miura², Kotaro Sakashita², Takahiro Sunami², Hideki Kadone²,³, Masashi Yamazaki², Yoshihiro Kuroda⁴

Affiliations

¹ Degree Programs in Systems and Information Engineering, University of Tsukuba, Tsukuba, Japan.
² Department of Orthopaedic Surgery, Institute of Medicine, University of Tsukuba, Tsukuba, Japan.
³ Center for Cybernics Research, University of Tsukuba, Tsukuba, Japan.
⁴ Division of Intelligent Interaction Technologies, Institute of Systems and Information Engineering, University of Tsukuba, Tsukuba, Japan.

Abstract

Introduction: Assessment of human gait posture can be clinically effective in diagnosing gait deformities early in life. Currently, two methods, static and dynamic, are used to diagnose adult spinal deformity (ASD) and other spinal disorders. The standard static method uses full-spine lateral standing radiographs. However, this assesses the joints only in the standing position and provides no information on how the joints change when the patient walks. Careful observation of long-distance walking can provide a dynamic assessment that reveals an uncompensated posture; however, this increases the workload of medical practitioners. A three-dimensional (3D) motion system has been proposed for the dynamic method. Although such a system successfully detects dynamic posture changes, access to the required facilities is limited. Therefore, a diagnostic approach is required that is facility-independent, adds little to the clinical workflow, and does not involve patient contact.

Methods: We focused on a video-based method to classify patients with spinal disorders as having either ASD or another spinal disorder (non-ASD). To achieve this goal, we present a video-based two-stage machine-learning method. In the first stage, deep learning methods are used to locate the patient and extract the area of each frame where the patient is located. In the second stage, a 3D convolutional neural network (CNN) is used to capture spatial and temporal information (dynamic motion) from the extracted frames. Disease classification is performed by discerning posture and gait from the extracted frames. Model performance was assessed using the mean accuracy, F1 score, and area under the receiver operating characteristic curve (AUROC), with five-fold cross-validation. We also compared the final results with the observations of medical professionals.
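The data flow of the two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `detect` and `classify_clip`, the bounding-box format, the fixed clip length, and the averaging of per-clip probabilities into one per-video score are all assumptions made for illustration. In practice, `detect` would be a trained person detector and `classify_clip` a trained 3D CNN.

```python
from typing import Callable, List, Sequence, Tuple

Frame = List[List[int]]          # one grayscale frame: rows of pixel values
Box = Tuple[int, int, int, int]  # hypothetical detector output: (x1, y1, x2, y2)
Clip = List[Frame]               # consecutive frames, i.e. time x height x width


def crop(frame: Frame, box: Box) -> Frame:
    """Stage 1 output: keep only the region where the patient was located."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in frame[y1:y2]]


def make_clips(frames: Sequence[Frame], clip_len: int) -> List[Clip]:
    """Group consecutive frames into fixed-length clips; leftover frames are dropped."""
    n_clips = len(frames) // clip_len
    return [list(frames[i * clip_len:(i + 1) * clip_len]) for i in range(n_clips)]


def classify_video(
    frames: Sequence[Frame],
    detect: Callable[[Frame], Box],           # hypothetical stage-1 detector
    classify_clip: Callable[[Clip], float],   # hypothetical stage-2 model, returns P(ASD)
    clip_len: int,
) -> float:
    """Run both stages and return the mean clip-level probability (an assumption)."""
    # Stage 1: locate the patient in every frame and crop to that region.
    # A real pipeline would also resize the crops to a common size.
    cropped = [crop(frame, detect(frame)) for frame in frames]
    # Stage 2: score each clip so that motion across frames is available to the model.
    clips = make_clips(cropped, clip_len)
    if not clips:
        raise ValueError("video is shorter than one clip")
    probs = [classify_clip(clip) for clip in clips]
    return sum(probs) / len(probs)
```

Passing the detector and the classifier as plain callables keeps the sketch independent of any particular deep learning library; either stage can be swapped without touching the other.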

Results: Our experiments were conducted using a gait video dataset comprising 81 patients. The experimental results indicated that our method is effective for distinguishing ASD from other spinal disorders. The proposed method achieved a mean accuracy of 0.7553, an F1 score of 0.7063, and an AUROC of 0.7864. Additionally, ablation experiments indicated the importance of both the first stage (the detection stage) and transfer learning in the proposed method.
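For readers unfamiliar with the three reported metrics, the following sketch shows how each can be computed for a binary task and then averaged over cross-validation folds. It rests on assumptions the abstract does not settle: ASD is treated as the positive class (label 1), F1 is the binary F1 for that class, and each fold contributes equally to the mean. The function names are illustrative only.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 for the positive class (label 1): 2*TP / (2*TP + FP + FN)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_over_folds(fold_values: Sequence[float]) -> float:
    """Unweighted mean of one metric across cross-validation folds."""
    return sum(fold_values) / len(fold_values)
```

The pairwise AUROC above is quadratic in the number of samples, which is acceptable for a dataset of this size; a rank-based formulation would be preferable for large test sets.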

Discussion: The observations of two doctors were compared with the results of the proposed method. The doctors achieved mean accuracies of 0.4815 and 0.5247, with AUROC scores of 0.5185 and 0.5463, respectively. We showed that, using videos of 1 s duration, the proposed method can achieve more accurate and reliable results than the doctors' observations. All our code, models, and results are available at https://github.com/ChenKaiXuSan/Walk_Video_PyTorch. The proposed framework provides a potential video-based method for improving the clinical diagnosis of ASD and non-ASD. This framework might, in turn, help both patients and clinicians treat the disease quickly and directly, while reducing facility dependency and supporting data-driven systems.

Keywords: 3D CNN; adult spinal deformity; human action recognition; spinal disorder; video-based method.

Grants and funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This study was partially supported by the Uehara Memorial Foundation and by the JST program for the establishment of university fellowships toward the creation of science and technology innovation (Grant Number JPMJFS2105). This work was also partly supported by AMED under Grant Number JP23YM0126803.