The reliability of simultaneous versus individual data collection during stuttering assessment

Jason H Davidow; Jun Ye; Robin L Edge

doi:10.1111/1460-6984.12860

Int J Lang Commun Disord. 2023 Jul-Aug;58(4):1251-1267. doi: 10.1111/1460-6984.12860. Epub 2023 Mar 2.

Authors

Jason H Davidow¹, Jun Ye², Robin L Edge³

Affiliations

¹ Department of Speech-Language-Hearing Sciences, Hofstra University, Hempstead, NY, USA.
² Department of Statistics, University of Akron, Akron, OH, USA.
³ Department of Communication Sciences & Disorders, Jacksonville University, Jacksonville, FL, USA.

PMID: 36861494

Abstract

Background: Speech-language pathologists often multitask in order to be efficient with their commonly large caseloads. In stuttering assessment, multitasking often involves collecting multiple measures simultaneously.

Aims: The present study sought to determine reliability when collecting multiple measures simultaneously versus individually.

Methods & procedures: Over two time periods, 50 graduate students viewed videos of four persons who stutter (PWS), counted the number of stuttered syllables and the total number of syllables uttered, and rated speech naturalness. Students were randomly assigned to one of two groups: the simultaneous group, in which all measures were gathered during one viewing, and the individual group, in which one measure was gathered per viewing. Relative and absolute intra- and inter-rater reliability values were calculated for each measure.
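
The abstract reports relative reliability as intraclass correlation coefficients (ICCs) and absolute reliability as standard errors of measurement (SEMs), but it does not say which ICC form or SEM formula was used. As a hedged illustration only, the sketch assumes the two-way random-effects, absolute-agreement, single-measurement form (Shrout and Fleiss ICC(2,1)) and the common approximation SEM = SD × √(1 − ICC). Rows are targets (for example, one video); columns are judges for inter-rater reliability, or one judge's two viewing sessions for intra-rater reliability. The function names icc_2_1 and sem_from_icc are hypothetical.

```python
import math
from statistics import fmean, stdev


def icc_2_1(scores: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    scores[i][j] is the value recorded for target i (e.g., one video) in
    column j (a judge, or a viewing session for intra-rater reliability).
    Assumes a complete n x k table with n >= 2 and k >= 2.
    """
    n, k = len(scores), len(scores[0])
    grand = fmean(v for row in scores for v in row)
    row_means = [fmean(row) for row in scores]
    col_means = [fmean(row[j] for row in scores) for j in range(k)]

    # Two-way ANOVA sums of squares (one observation per cell).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def sem_from_icc(scores: list[list[float]], icc: float) -> float:
    """SEM = SD * sqrt(1 - ICC); one common approximation, assumed here."""
    sd = stdev(v for row in scores for v in row)
    return sd * math.sqrt(max(0.0, 1.0 - icc))


if __name__ == "__main__":
    # Example table from Shrout and Fleiss (1979): 6 targets rated by 4 judges.
    sf = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
          [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
    icc = icc_2_1(sf)
    print(f"ICC(2,1) = {icc:.2f}")  # ICC(2,1) = 0.29
    print(f"SEM = {sem_from_icc(sf, icc):.2f}")
```

The Shrout and Fleiss example table reproduces their published ICC(2,1) of about 0.29. Because the absolute-agreement form counts systematic differences between columns as error, a judge who counts consistently higher in one session than the other is penalized, which matches the idea of agreement rather than mere consistency.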

Outcomes & results: The following results were notable: better intra-rater relative reliability for the number of stuttered syllables in the individual group (intraclass correlation coefficient (ICC) = 0.839) than in the simultaneous group (ICC = 0.350); a smaller intra-rater standard error of measurement (SEM), that is, better absolute reliability, for the number of stuttered syllables in the individual group (SEM = 7.40) than in the simultaneous group (SEM = 15.67); and better inter-rater absolute reliability for the total number of syllables in the individual group (SEM = 88.29) than in the simultaneous group (SEM = 125.05). Absolute reliability was unacceptable for all measures in both groups.
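
Because an SEM is expressed in the units of the measure itself (syllables here), the reported values can be read directly. As a rough reading aid, not an analysis from the study, the snippet computes how many times larger each simultaneous-group SEM is than the corresponding individual-group SEM, and the ±1.96 × SEM band often used as an approximate 95% range around a single observed score under a normal-error assumption. The helper summarize and the constant Z_95 are illustrative names.

```python
# Reported SEM values from the abstract: (individual group, simultaneous group).
REPORTED_SEM = {
    "intra-rater, stuttered syllables": (7.40, 15.67),
    "inter-rater, total syllables": (88.29, 125.05),
}

# Normal-theory multiplier for an approximate 95% band; an assumption,
# not a value reported by the study.
Z_95 = 1.96


def summarize(individual: float, simultaneous: float) -> dict[str, float]:
    """Ratio of the two SEMs and the +/- band each SEM implies."""
    return {
        "ratio": simultaneous / individual,
        "band_individual": Z_95 * individual,
        "band_simultaneous": Z_95 * simultaneous,
    }


for label, (ind, sim) in REPORTED_SEM.items():
    s = summarize(ind, sim)
    print(
        f"{label}: SEM ratio {s['ratio']:.2f}; "
        f"±{s['band_individual']:.1f} vs ±{s['band_simultaneous']:.1f} syllables"
    )
```

With the reported values, the simultaneous-group SEM is roughly 2.1 times the individual-group SEM for intra-rater stuttered-syllable counts and roughly 1.4 times for inter-rater total-syllable counts.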

Conclusions & implications: These findings show that judges are likely to be more reliable when identifying stuttered syllables in isolation than when simultaneously collecting them with total syllables spoken and naturalness data. Results are discussed in terms of narrowing the reliability gap between data collection methods for stuttered syllables, improving overall reliability of stuttering measurements, and a procedural change when implementing widely used stuttering assessment protocols.

What this paper adds:

What is already known on the subject: The reliability of stuttering judgments has been found to be unacceptable across a number of studies, including those examining the reliability of the most popular stuttering assessment tool, the Stuttering Severity Instrument, Fourth Edition (SSI-4). The SSI-4 and other assessment applications involve collecting multiple measures simultaneously. It has been suggested, but not examined, that collecting measures simultaneously, as occurs in the most popular stuttering assessment protocols, may result in substantially inferior reliability compared with collecting measures individually.

What this paper adds to existing knowledge: The present study has multiple novel findings. First, relative and absolute intra-rater reliability were substantially better when stuttered syllables data were collected individually than when the same data were collected simultaneously with total number of syllables and speech naturalness data. Second, inter-rater absolute reliability for total number of syllables was also substantially better when collected individually. Third, intra-rater and inter-rater reliability were similar whether speech naturalness ratings were given individually or while simultaneously counting stuttered and fluent syllables.

What are the potential or actual clinical implications of this work? Clinicians can be more reliable when identifying stuttered syllables individually than when they judge stuttering along with other clinical measures of stuttering. In addition, clinicians and researchers who use current popular stuttering assessment protocols that recommend simultaneous data collection, including the SSI-4, should instead consider collecting stuttering event counts individually. This procedural change will lead to more reliable data and stronger clinical decision making.

Keywords: assessment; reliability; stuttering.

Publication types

Randomized Controlled Trial

MeSH terms

Humans
Reproducibility of Results
Severity of Illness Index
Speech
Speech Production Measurement / methods
Stuttering* / diagnosis