Classification of indexical and segmental features of human speech using low- and high-frequency energy

Jeremy J Donai; D Dwayne Paschall; Saad Haider

doi:10.1121/10.0022414

J Acoust Soc Am. 2023 Nov 1;154(5):3201-3209. doi: 10.1121/10.0022414.

Authors

Jeremy J Donai¹, D Dwayne Paschall², Saad Haider³

Affiliations

¹ Department of Speech, Language, and Hearing Sciences, Texas Tech University Health Sciences Center, Lubbock, Texas 79430, USA.
² Predictive Market Analytics, Frisco, Texas 75035, USA.
³ Department of Electrical and Computer Engineering, Texas Tech University, Lubbock, Texas 79409, USA.

PMID: 37971213

Abstract

The high-frequency region (above 4–5 kHz) of the speech spectrum has received substantial research attention over the past decade, with a host of studies documenting useful information in this region. The purpose of the current experiment was to compare the indexical and segmental information present in the low- and high-frequency regions of speech (below and above 4 kHz) and to determine the extent to which information from these regions can be used in a machine learning framework to correctly classify indexical and segmental aspects of the speech signal. Naturally produced vowel segments from ten male and ten female talkers were used as input to a temporal dictionary ensemble classification model in unfiltered, low-pass filtered (below 4 kHz), and high-pass filtered (above 4 kHz) conditions. Classification performance in the unfiltered and low-pass filtered conditions was approximately 90% or better for vowel categorization, talker sex, and individual talker identity tasks. Classification performance for high-pass filtered signals composed of energy above 4 kHz was well above chance for the same tasks. For two of the classification tasks (talker sex and talker identity), high-pass filtering had minimal effect on classification performance, suggesting that indexical information is preserved above 4 kHz.
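
The three listening conditions described above (unfiltered, low-pass below 4 kHz, high-pass above 4 kHz) can be illustrated with a minimal sketch. The abstract does not state which filter design was used, so the brick-wall DFT mask below is only a stand-in, not the authors' method. The function names (`split_bands`, `energy`), the 16 kHz sampling rate, and the two-tone test signal are all hypothetical choices made for illustration.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform; adequate for short demo signals."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns the real part of the reconstructed signal."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def split_bands(signal, fs, cutoff_hz=4000.0):
    """Split a signal into (low_band, high_band) at cutoff_hz.

    Assumption: an ideal brick-wall mask stands in for whatever low-pass and
    high-pass filters the study actually used. A bin exactly at the cutoff is
    assigned to the high band, which is an arbitrary choice.
    """
    n = len(signal)
    spectrum = dft(signal)
    low, high = [], []
    for k, value in enumerate(spectrum):
        # Bin k and its mirror bin n - k represent the same physical frequency.
        freq = min(k, n - k) * fs / n
        if freq < cutoff_hz:
            low.append(value)
            high.append(0j)
        else:
            low.append(0j)
            high.append(value)
    return idft(low), idft(high)


def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)


# Hypothetical test signal: a 1 kHz tone plus a weaker 6 kHz tone at 16 kHz.
FS = 16000
N = 256
tone = [math.sin(2 * math.pi * 1000 * t / FS)
        + 0.5 * math.sin(2 * math.pi * 6000 * t / FS)
        for t in range(N)]

low_band, high_band = split_bands(tone, FS)
print("low-band energy:", round(energy(low_band), 3))
print("high-band energy:", round(energy(high_band), 3))
```

In this sketch the two bands sum back to the original signal, so the unfiltered condition corresponds to `tone` itself. Each band could then be passed to a time series classifier in place of the unfiltered input.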

MeSH terms

Attention
Female
Humans
Male
Speech Perception*
Speech*