Zero-shot personalization of speech foundation models for depressed mood monitoring

Maurice Gerczuk; Andreas Triantafyllopoulos; Shahin Amiriparian; Alexander Kathan; Jonathan Bauer; Matthias Berking; Björn W Schuller

doi:10.1016/j.patter.2023.100873

Patterns (N Y). 2023 Nov 1;4(11):100873. doi: 10.1016/j.patter.2023.100873. eCollection 2023 Nov 10.

Authors

Maurice Gerczuk¹, Andreas Triantafyllopoulos¹, Shahin Amiriparian¹, Alexander Kathan¹, Jonathan Bauer², Matthias Berking², Björn W Schuller¹,³

Affiliations

¹ Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany.
² Department of Clinical Psychology and Psychotherapy, Friedrich-Alexander-Universität, Erlangen-Nürnberg, Erlangen, Germany.
³ GLAM, Imperial College, London, UK.

Abstract

The monitoring of depressed mood plays an important role as a diagnostic tool in psychotherapy. An automated analysis of speech can provide a non-invasive measurement of a patient's affective state. While speech has been shown to be a useful biomarker for depression, existing approaches mostly build population-level models that aim to predict each individual's diagnosis as a (mostly) static property. Because of inter-individual differences in symptomatology and mood regulation behaviors, these approaches are ill-suited to detect smaller temporal variations in depressed mood. We address this issue by introducing a zero-shot personalization of large speech foundation models. Compared with other personalization strategies, our work does not require labeled speech samples for enrollment. Instead, the approach makes use of adapters conditioned on subject-specific metadata. On a longitudinal dataset, we show that the method improves performance compared with a set of suitable baselines. Finally, applying our personalization strategy improves individual-level fairness.

Keywords: deep learning; depression monitoring; foundation models; hypernetworks; personalization; speech processing.
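
To make the idea of adapters conditioned on subject-specific metadata more concrete, here is a minimal sketch of a hypernetwork that generates the weights of a small residual adapter from a metadata vector. It is an illustration only and not the architecture used in the paper: the layer sizes, the `tanh` nonlinearities, the three metadata fields, and all function names (`init_hypernetwork`, `generate_adapter`, `apply_adapter`) are assumptions, and random arrays stand in for the features of a frozen speech foundation model.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_hypernetwork(meta_dim, feat_dim, bottleneck, hidden=16):
    """Random weights for a small MLP that maps metadata to adapter parameters.

    Hypothetical helper: sizes and initialisation are illustrative assumptions.
    """
    n_out = 2 * feat_dim * bottleneck  # down-projection plus up-projection
    return {
        "w1": rng.normal(0.0, 0.1, (meta_dim, hidden)),
        "w2": rng.normal(0.0, 0.1, (hidden, n_out)),
        "shape": (feat_dim, bottleneck),
    }

def generate_adapter(hyper, metadata):
    """Produce subject-specific adapter matrices from a metadata vector."""
    hidden = np.tanh(metadata @ hyper["w1"])
    flat = hidden @ hyper["w2"]
    feat_dim, bottleneck = hyper["shape"]
    split = feat_dim * bottleneck
    down = flat[:split].reshape(feat_dim, bottleneck)
    up = flat[split:].reshape(bottleneck, feat_dim)
    return down, up

def apply_adapter(features, down, up):
    """Residual bottleneck adapter: x + tanh(x @ down) @ up."""
    return features + np.tanh(features @ down) @ up

hyper = init_hypernetwork(meta_dim=3, feat_dim=8, bottleneck=2)

# Stand-in for 5 frames of 8-dimensional features from a frozen backbone.
features = rng.normal(size=(5, 8))

# Hypothetical metadata vectors for two subjects (values are made up).
meta_a = np.array([0.3, 1.0, 0.0])
meta_b = np.array([0.7, 0.0, 1.0])

out_a = apply_adapter(features, *generate_adapter(hyper, meta_a))
out_b = apply_adapter(features, *generate_adapter(hyper, meta_b))
```

Because the adapter weights are a function of metadata alone, a new subject needs no labeled speech for enrollment in this sketch: their metadata vector is passed through the hypernetwork and the resulting adapter is applied to the same frozen features. In a trained system the hypernetwork weights would be learned jointly across subjects rather than drawn at random.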