Speaker-turn aware diarization for speech-based cognitive assessments

Front Neurosci. 2024 Jan 16:17:1351848. doi: 10.3389/fnins.2023.1351848. eCollection 2023.

Abstract

Introduction: Speaker diarization is an essential preprocessing step for diagnosing cognitive impairments from speech-based Montreal Cognitive Assessments (MoCA).

Methods: This paper proposes three enhancements to conventional speaker diarization methods for such assessments. The enhancements tackle the challenges of diarizing MoCA recordings on two fronts. First, a multi-scale channel-interdependence speaker embedding is used as the front-end speaker representation to overcome the acoustic mismatch caused by far-field microphones. Specifically, a squeeze-and-excitation (SE) unit and channel-dependent attention are added to Res2Net blocks for multi-scale feature aggregation. Second, a sequence comparison approach that takes a holistic view of the whole conversation is applied to measure the similarity of short speech segments, yielding a speaker-turn aware scoring matrix for the subsequent clustering step. Third, to further improve diarization performance, we propose incorporating a pairwise similarity measure so that the speaker-turn aware scoring matrix contains both local and global information across the segments.
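The abstract does not specify how the sequence comparison and the pairwise measure are combined, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes that the local view is the cosine similarity between two segment embeddings, that the global view is the cosine similarity between two segments' similarity profiles across the whole conversation (a swapped-in stand-in for the paper's sequence comparison), and that the two matrices are fused by a weighted sum with a hypothetical weight `alpha`. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def pairwise_scores(embeddings: Sequence[Sequence[float]]) -> Matrix:
    """Local view: similarity between each pair of segment embeddings."""
    return [[cosine(a, b) for b in embeddings] for a in embeddings]


def global_scores(local: Matrix) -> Matrix:
    """Global view (assumption): compare each segment's similarity profile
    against every other segment in the conversation."""
    return [[cosine(row_a, row_b) for row_b in local] for row_a in local]


def fused_scores(embeddings: Sequence[Sequence[float]], alpha: float = 0.5) -> Matrix:
    """Hypothetical fusion: weighted sum of local and global matrices."""
    local = pairwise_scores(embeddings)
    glob = global_scores(local)
    n = len(local)
    return [[alpha * local[i][j] + (1.0 - alpha) * glob[i][j] for j in range(n)]
            for i in range(n)]


# Toy embeddings: segments 0 and 1 resemble one speaker, 2 and 3 another.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
scores = fused_scores(embeddings)
```

The fused matrix could then be passed to any clustering step that accepts a precomputed affinity matrix; `alpha = 1.0` reduces it to the purely local pairwise scores.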

Results: Evaluations on an interactive MoCA dataset show that the proposed enhancements lead to a diarization system that outperforms the conventional x-vector/PLDA systems under language-, age-, and microphone-mismatch scenarios.

Discussion: The results also show that the proposed enhancements can help hypothesize the speaker-turn timestamps, making the diarization method amenable to datasets without timestamp information.
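Once each short segment carries a cluster label, speaker-turn timestamps can be hypothesized by merging consecutive segments assigned to the same speaker. The sketch below is an illustration under assumptions, not the procedure used in the paper: segments are assumed to be time-ordered `(start, end)` pairs in seconds, and the hypothetical `max_gap` threshold decides when a pause is long enough to start a new turn even if the speaker label is unchanged.

```python
from typing import List, Sequence, Tuple

Turn = Tuple[int, float, float]  # (speaker label, start time, end time)


def hypothesize_turns(segments: Sequence[Tuple[float, float]],
                      labels: Sequence[int],
                      max_gap: float = 0.5) -> List[Turn]:
    """Merge consecutive same-speaker segments into speaker turns.

    Assumes segments are sorted by start time; max_gap is a hypothetical
    pause threshold (seconds) above which a new turn is started.
    """
    if len(segments) != len(labels):
        raise ValueError("segments and labels must have the same length")
    turns: List[Turn] = []
    for (start, end), speaker in zip(segments, labels):
        if turns and turns[-1][0] == speaker and start - turns[-1][2] <= max_gap:
            # Same speaker continues after a short pause: extend the turn.
            turns[-1] = (speaker, turns[-1][1], end)
        else:
            # Speaker change or long pause: open a new turn.
            turns.append((speaker, start, end))
    return turns


segments = [(0.0, 1.5), (1.5, 3.0), (3.0, 4.5), (4.5, 6.0)]
labels = [0, 0, 1, 0]
turns = hypothesize_turns(segments, labels)
```

Here `turns` evaluates to `[(0, 0.0, 3.0), (1, 3.0, 4.5), (0, 4.5, 6.0)]`, i.e., one hypothesized turn change at 3.0 s and another at 4.5 s.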

Keywords: MoCA; comprehensive scoring; dementia detection; speaker diarization; speaker embedding; speaker-turn timestamps.

Grants and funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work was in part supported by Research Grants Council of Hong Kong, Theme-based Research Scheme (Ref.: T45-407/19-N); National Natural Science Foundation of China (61971289); Medical-Engineering Interdisciplinary Research Foundation of Shenzhen University (2023YG020); Shenzhen Hong Kong Institute of Brain Science-Shenzhen Fundamental Research Institutions (No. 2023SHIBS0003).