Reliability of Machine and Human Examiners for Detection of Laryngeal Penetration or Aspiration in Videofluoroscopic Swallowing Studies

Yuna Kim; Hyun-Il Kim; Geun Seok Park; Seo Young Kim; Sang-Il Choi; Seong Jae Lee

doi:10.3390/jcm10122681

J Clin Med. 2021 Jun 18;10(12):2681. doi: 10.3390/jcm10122681.

Authors

Yuna Kim¹, Hyun-Il Kim², Geun Seok Park¹, Seo Young Kim¹, Sang-Il Choi²,³, Seong Jae Lee¹,⁴

Affiliations

¹ Department of Rehabilitation Medicine, Dankook University Hospital, Cheonan 31116, Korea.
² Department of Computer Science and Engineering, Dankook University, Yongin 16890, Korea.
³ Department of Computer Engineering, Dankook University, Yongin 16890, Korea.
⁴ Department of Rehabilitation Medicine, College of Medicine, Dankook University, Cheonan 31116, Korea.

Abstract

Computer-assisted analysis is expected to improve the reliability of videofluoroscopic swallowing studies (VFSSs), but its usefulness is limited. Previously, we proposed a deep learning model that can detect laryngeal penetration or aspiration fully automatically in VFSS video images, but the evidence for its reliability was insufficient. This study aims to compare the intra- and inter-rater reliability of the computer model and human raters. The test dataset consisted of 173 video files, in each of which the presence of laryngeal penetration or aspiration was judged by the computer and three physicians in two sessions separated by a one-month interval. Intra- and inter-rater reliability were calculated using Cohen's kappa coefficient, the positive reliability ratio (PRR) and the negative reliability ratio (NRR). Intra-rater reliability was almost perfect for the computer and two experienced physicians. Inter-rater reliability was moderate to substantial between the model and each human rater and between the human raters. The average PRR and NRR between the model and the human raters were similar to those between the human raters. The results demonstrate that the deep learning model can detect laryngeal penetration or aspiration from VFSS video as reliably as human examiners.
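
As an illustration of the agreement statistic named above, the following is a minimal sketch of Cohen's kappa for two sets of binary judgments (1 = penetration or aspiration present, 0 = absent). It is not the authors' code. The function name `cohens_kappa` and the ten example ratings are hypothetical and are not taken from the study data. PRR and NRR are not sketched because the abstract does not define them.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length sequences of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")

    n = len(ratings_a)

    # Observed agreement: fraction of items on which both ratings match.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: for each label, the product of the two marginal
    # proportions, summed over all labels.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    labels = set(count_a) | set(count_b)
    expected = sum(count_a[label] * count_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; kappa is undefined,
        # and this sketch reports perfect agreement by convention.
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Hypothetical judgments for ten videos (illustrative only, not study data).
session_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
session_2 = [1, 1, 0, 0, 1, 0, 0, 0, 0, 1]

kappa = cohens_kappa(session_1, session_2)
print(f"kappa = {kappa:.3f}")
```

In this made-up example the two sessions agree on 8 of 10 videos (observed agreement 0.80), the chance agreement is 0.52, and kappa is therefore (0.80 - 0.52) / (1 - 0.52), about 0.58. The same function applies to intra-rater comparisons (one rater, two sessions) and inter-rater comparisons (two raters, one session).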

Keywords: deep learning; deglutition; dysphagia; laryngeal penetration or aspiration; machine learning; reliability; swallowing; videofluoroscopic swallowing study.
