Misophonia Sound Recognition Using Vision Transformer

Annu Int Conf IEEE Eng Med Biol Soc. 2023 Jul:2023:1-4. doi: 10.1109/EMBC40787.2023.10340283.

Abstract

Misophonia is a condition characterized by an abnormal emotional response to specific sounds, such as eating, breathing, and clock-ticking noises. Sound classification for misophonia is an important area of research because it can aid the development of interventions and therapies for individuals affected by the condition. In sound classification, deep learning algorithms such as Convolutional Neural Networks (CNNs) have achieved high accuracy and proved their ability in feature extraction and modeling. Recently, transformer models have surpassed CNNs as the dominant technology in the field of audio classification. In this paper, a transformer-based deep learning algorithm is proposed to automatically identify trigger sounds and to characterize these sounds using acoustic features. The experimental results demonstrate that the proposed algorithm can classify trigger sounds with high accuracy and specificity. These findings provide a foundation for future research on the development of interventions and therapies for misophonia.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Emotions*
  • Hearing Disorders / psychology
  • Humans
  • Noise
  • Sound*

Supplementary concepts

  • misophonia