The Reliability of Expert Diagnosis of Childhood Apraxia of Speech

J Speech Lang Hear Res. 2023 Aug 29:1-18. doi: 10.1044/2023_JSLHR-22-00677. Online ahead of print.

Abstract

Purpose: The current standard for clinical diagnosis of childhood apraxia of speech (CAS) is expert clinician judgment. The psychometric properties of this standard are not well understood, yet understanding them is important for improving clinical diagnosis. The purpose of this study is to determine the extent to which experts agree on the clinical diagnosis of CAS in two cohorts of children with mixed speech sound disorders (SSDs).

Method: Speech samples of children with SSDs were drawn from previous and ongoing research: video recordings of children aged 3-8 years (n = 36) and audio recordings of children aged 8-17 years (n = 56). A total of 23 expert English-speaking clinicians were recruited internationally. Each speech sample was rated by three of these experts, who described the observed features and provided a diagnosis. Intrarater reliability was acceptable at 85% agreement.

Results: Interrater reliability among experts on the presence or absence of CAS was poor, both as a categorical diagnosis (κ = .187, 95% confidence interval [CI] [.089, .286]) and on a continuous "likelihood of CAS" scale (0-100; intraclass correlation = .183, 95% CI [.037, .347]). Reliability was similar for the video-recorded and audio-only samples. Agreement was greater on other diagnoses (such as articulation disorder) than on the diagnosis of CAS, although these also did not meet the predetermined standard. Rated likelihood of CAS was higher for children who presented with more of the American Speech-Language-Hearing Association (ASHA) consensus features of CAS.
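The abstract reports a chance-corrected agreement statistic (κ) but does not state which kappa variant was used. As an illustration only, the sketch below assumes Fleiss' kappa, a common choice when each sample is rated by a fixed number of raters drawn from a larger pool (here, three of 23). The function name and the toy ratings are hypothetical and are not the study's data or analysis code.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa (assumed variant, for illustration).

    counts[i][j] is the number of raters who placed sample i in category j.
    Every sample must have the same total number of ratings.
    """
    n_samples = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every sample must have the same number of ratings")
    n_categories = len(counts[0])

    # Observed agreement per sample: share of rater pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_samples

    # Chance agreement from the overall proportion of each category.
    total = n_samples * n_raters
    p_j = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")

    return (p_bar - p_e) / (1 - p_e)


# Hypothetical toy data: 6 samples, 3 raters each, categories [CAS, not CAS].
toy_counts = [[3, 0], [0, 3], [2, 1], [1, 2], [0, 3], [1, 2]]
print(round(fleiss_kappa(toy_counts), 3))  # 0.299
```

Kappa subtracts the agreement expected from category base rates alone, so values near 0 indicate agreement only slightly better than chance. This is why a κ of .187 is interpreted as poor even when raters sometimes concur.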

Conclusions: Expert raters applied different thresholds for diagnosing CAS. If expert clinician judgment is to be used for diagnosis of CAS or other SSDs, further standardization and calibration are needed to increase interrater reliability. Diagnosis may require operationalized checklists or reliable measures that operate along a diagnostic continuum.

Supplemental material: https://doi.org/10.23641/asha.23949105.