Temporal Micro-Action Localization for Videofluoroscopic Swallowing Study

IEEE J Biomed Health Inform. 2023 Dec;27(12):5904-5913. doi: 10.1109/JBHI.2023.3313255. Epub 2023 Dec 5.

Abstract

Videofluoroscopic swallowing study (VFSS) visualizes the swallowing movement by using X-ray fluoroscopy, which is the most widely used method for dysphagia examination. To better facilitate swallowing assessment, the temporal parameter is one of the most important indicators. However, most information of that acquire is hand-crafted and elaborated, which is time-consuming and difficult to ensure objectivity and accuracy. In this article, we propose to formulate this task as a temporal action localization task and solve it using deep neural networks. However, the action of VFSS has the following characteristics such as small motion targets, small action amplitudes, large sample variances, short duration, and variations in duration. Furthermore, all existing methods often rely on daily behaviors, which makes locating and recognizing micro-actions more challenging. To address the above issues, we first collect and annotate the VFSS micro-action dataset, which includes 847 VFSS data from 71 subjects, due to the lack of benchmarks. We then introduce a coarse-to-fine mechanism to handle the short and repeated nature of micro-actions, which can significantly enhancing micro-action localization accuracy. Moreover, we propose a Variable-Size Window Generator method, which improves the model's characterization performance and addresses the issue of different action timings, leading to further improvements in localization accuracy. The results of our experiments demonstrate the superiority of our method, with significantly improved performance (46.10% vs. 37.70%).

MeSH terms

  • Deglutition Disorders* / diagnostic imaging
  • Deglutition*
  • Fluoroscopy / methods
  • Humans
  • Neural Networks, Computer
  • Time Factors