Reliability of Machine and Human Examiners for Detection of Laryngeal Penetration or Aspiration in Videofluoroscopic Swallowing Studies

J Clin Med. 2021 Jun 18;10(12):2681. doi: 10.3390/jcm10122681.

Abstract

Computer-assisted analysis is expected to improve the reliability of videofluoroscopic swallowing studies (VFSSs), but its usefulness is limited. Previously, we proposed a deep learning model that can detect laryngeal penetration or aspiration fully automatically in VFSS video images, but the evidence for its reliability was insufficient. This study aims to compare the intra- and inter-rater reliability of the computer model and human raters. The test dataset consisted of 173 video files from which the existence of laryngeal penetration or aspiration was judged by the computer and three physicians in two sessions separated by a one-month interval. Intra- and inter-rater reliability were calculated using Cohen's kappa coefficient, the positive reliability ratio (PRR) and the negative reliability ratio (NRR). Intrarater reliability was almost perfect for the computer and two experienced physicians. Interrater reliability was moderate to substantial between the model and each human rater and between the human raters. The average PRR and NRR between the model and the human raters were similar to those between the human raters. The results demonstrate that the deep learning model can detect laryngeal penetration or aspiration from VFSS video as reliably as human examiners.

Keywords: deep learning; deglutition; dysphagia; laryngeal penetration or aspiration; machine learning; reliability; swallowing; videofluoroscopic swallowing study.