Comparing accuracy in voice-based assessments of biological speaker traits across speech types

Sci Rep. 2023 Dec 27;13(1):22989. doi: 10.1038/s41598-023-49596-y.

Abstract

Nonverbal acoustic parameters of the human voice provide cues to a vocaliser's sex, age, and body size that are relevant in human social and sexual communication, and also increasingly so for computer-based voice recognition and synthesis technologies. While studies have shown some capacity in human listeners to gauge these biological traits from unseen speakers, it remains unknown whether speech complexity improves accuracy. Here, in over 200 vocalisers and 1500 listeners of both sexes, we test whether voice-based assessments of sex, age, height and weight vary from isolated vowels and words, to sequences of vowels and words, to full sentences or paragraphs. We show that while listeners judge sex and especially age more accurately as speech complexity increases, accuracy remains high across speech types, even for a single vowel sound. In contrast, the actual heights and weights of vocalisers explain comparatively less variance in listeners' assessments of body size, which do not vary systematically by speech type. Our results thus show that while more complex speech can improve listeners' biological assessments, the gain is ecologically small, as listeners already show an impressive capacity to gauge speaker traits from extremely short bouts of standardised speech, likely owing to within-speaker stability in underlying nonverbal vocal parameters such as voice pitch. We discuss the methodological, technological, and social implications of these results.

MeSH terms

  • Body Size
  • Communication
  • Female
  • Humans
  • Male
  • Speech
  • Speech Acoustics
  • Speech Perception*
  • Voice*