Accent Recognition with Hybrid Phonetic Features

Zhan Zhang; Yuehai Wang; Jianyi Yang

doi:10.3390/s21186258

Sensors (Basel). 2021 Sep 18;21(18):6258. doi: 10.3390/s21186258.

Authors

Zhan Zhang¹, Yuehai Wang¹, Jianyi Yang¹

Affiliation

¹ Department of Information and Electronic Engineering, Zhejiang University, Hangzhou 310007, China.

Abstract

The performance of voice-controlled systems is often degraded by accented speech. To make these systems more robust, frontend accent recognition (AR) technologies have received increased attention in recent years. Because accent is a high-level abstract feature that is closely tied to language knowledge, AR is more challenging than other, language-agnostic audio classification tasks. In this paper, we use an auxiliary automatic speech recognition (ASR) task to extract language-related phonetic features. Furthermore, we propose a hybrid structure that incorporates the embeddings of both a fixed acoustic model and a trainable acoustic model, making the language-related acoustic feature more robust. We conduct several experiments on the AESRC dataset. The results show that our approach achieves an 8.02% relative improvement over the Transformer baseline, demonstrating the merits of the proposed method.
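
For readers unfamiliar with the term, a relative improvement is measured against the baseline score, not as an absolute difference in percentage points. The sketch below illustrates the calculation, assuming a higher-is-better metric such as accuracy. The function name `relative_improvement` and the numbers in the usage lines are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the relative improvement of `improved` over `baseline`, in percent.

    Assumes a higher-is-better metric (an assumption; the abstract does not
    name the metric) and a non-zero baseline.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (improved - baseline) / baseline


# Hypothetical scores for illustration only (not results from the paper):
# a baseline of 50.0 and an improved score of 54.0 differ by 4.0 absolute
# points, which is a larger figure when expressed relative to the baseline.
gain = relative_improvement(50.0, 54.0)
print(f"relative improvement: {gain:.2f}%")
```

An absolute gain therefore depends on the baseline it is applied to: the same relative figure corresponds to fewer absolute points when the baseline score is lower.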

Keywords: accent recognition; accented English speech recognition; audio classification.

MeSH terms

Language
Phonetics*
Recognition, Psychology
Speech
Speech Perception*