Intermediate acoustic-to-semantic representations link behavioral and neural responses to natural sounds

Nat Neurosci. 2023 Apr;26(4):664-672. doi: 10.1038/s41593-023-01285-9. Epub 2023 Mar 16.

Abstract

Recognizing sounds involves the cerebral transformation of input waveforms into semantic representations. Although past research has identified the superior temporal gyrus (STG) as a crucial cortical region, the computational fingerprint of these cerebral transformations remains poorly characterized. Here, we exploited a model-comparison framework to contrast the ability of acoustic, semantic (continuous and categorical) and sound-to-event deep neural network representation models to predict perceived sound dissimilarity and 7 T functional magnetic resonance imaging responses in human auditory cortex. We confirm that spectrotemporal modulations predict early auditory cortex (Heschl's gyrus) responses, and that auditory dimensions (for example, loudness and periodicity) predict STG responses and perceived dissimilarity. Sound-to-event deep neural networks predict Heschl's gyrus responses about as well as acoustic models do but, notably, outperform all competing models at predicting both STG responses and perceived dissimilarity. Our findings indicate that the STG encodes intermediate acoustic-to-semantic sound representations that neither acoustic nor semantic models can account for. These representations are compositional in nature and relevant to behavior.
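The abstract describes comparing representation models on how well they predict perceived sound dissimilarity. A common way to frame such a comparison is representational similarity analysis (RSA): compute a model's pairwise dissimilarity matrix over the sound set and correlate it with behavioral dissimilarity judgments. The sketch below illustrates that general logic only; the paper's actual pipeline, feature spaces and sound set are not specified in the abstract, so all data here are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model embeddings for 8 sounds (e.g., activations from one
# layer of a sound-to-event DNN, or an acoustic feature vector per sound).
n_sounds, n_features = 8, 16
model_features = rng.normal(size=(n_sounds, n_features))

def pairwise_dissimilarity(X):
    """Condensed upper-triangle vector of Euclidean distances between rows."""
    iu = np.triu_indices(X.shape[0], k=1)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return d[iu]

model_rdm = pairwise_dissimilarity(model_features)

# Synthetic "behavioral" dissimilarity: the model RDM plus noise, standing in
# for averaged pairwise dissimilarity judgments collected from listeners.
behavioral_rdm = model_rdm + rng.normal(scale=0.5, size=model_rdm.shape)

def spearman(a, b):
    """Spearman rank correlation via Pearson correlation of ranks
    (no tie handling; adequate for continuous-valued dissimilarities)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return np.corrcoef(ra, rb)[0, 1]

score = spearman(model_rdm, behavioral_rdm)
print(f"model-behavior RDM correlation: {score:.2f}")
```

Under this framing, each candidate model (acoustic, semantic, DNN) yields one correlation score against the same behavioral dissimilarities, and the models can then be ranked by predictive fit, as in the comparison the abstract reports.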

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Acoustic Stimulation / methods
  • Acoustics
  • Auditory Cortex* / physiology
  • Auditory Perception / physiology
  • Brain Mapping / methods
  • Humans
  • Magnetic Resonance Imaging
  • Semantics*