Computational Models of Auditory Scene Analysis: A Review

Beáta T Szabó; Susan L Denham; István Winkler

doi:10.3389/fnins.2016.00524

Front Neurosci. 2016 Nov 15:10:524. doi: 10.3389/fnins.2016.00524. eCollection 2016.

Authors

Beáta T Szabó¹, Susan L Denham², István Winkler³

Affiliations

¹ Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary; Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary.
² School of Psychology, University of Plymouth, Plymouth, UK.
³ Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary.

Abstract

Auditory scene analysis (ASA) refers to the process(es) of parsing the complex acoustic input into auditory perceptual objects representing either physical sources or temporal sound patterns, such as melodies, which contributed to the sound waves reaching the ears. A number of new computational models accounting for some of the perceptual phenomena of ASA have been published recently. Here we provide a theoretically motivated review of these computational models, aiming to relate their guiding principles to the central issues of the theoretical framework of ASA. Specifically, we ask how they achieve the grouping and separation of sound elements and whether they implement some form of competition between alternative interpretations of the sound input. We consider the extent to which they include predictive processes, as important current theories suggest that perception is inherently predictive, and also how they have been evaluated. We conclude that current computational models of ASA are fragmentary in the sense that rather than providing general competing interpretations of ASA, they focus on assessing the utility of specific processes (or algorithms) for finding the causes of the complex acoustic signal. This leaves open the possibility for integrating complementary aspects of the models into a more comprehensive theory of ASA.

Keywords: auditory object representation; auditory scene analysis; auditory streaming; bi-/multi-stable perception; computational model; predictive processing.

Publication types

Review