Automatic Speech Recognition Method Based on Deep Learning Approaches for Uzbek Language

Abdinabi Mukhamadiyev; Ilyos Khujayarov; Oybek Djuraev; Jinsoo Cho

doi:10.3390/s22103683

Sensors (Basel). 2022 May 12;22(10):3683. doi: 10.3390/s22103683.

Authors

Abdinabi Mukhamadiyev¹, Ilyos Khujayarov², Oybek Djuraev³, Jinsoo Cho¹

Affiliations

¹ Department of Computer Engineering, Gachon University, Sujeong-gu, Seongnam-si 13120, Korea.
² Department of Information Technologies, Samarkand Branch of Tashkent University of Information Technologies Named after Muhammad al-Khwarizmi, Tashkent 140100, Uzbekistan.
³ Department of Hardware and Software of Control Systems in Telecommunication, Tashkent University of Information Technologies Named after Muhammad al-Khwarizmi, Tashkent 100084, Uzbekistan.

Abstract

Communication has been an important aspect of human life, civilization, and globalization for thousands of years. Speech recognition has applications in biometric analysis, education, security, healthcare, and smart cities, among others. Most studies have concentrated on English, Spanish, Japanese, or Chinese, leaving low-resource languages such as Uzbek largely unexamined. In this paper, we propose an end-to-end Deep Neural Network-Hidden Markov Model speech recognition model and a hybrid Connectionist Temporal Classification (CTC)-attention network for the Uzbek language and its dialects. The proposed approach reduces training time and improves speech recognition accuracy by using the CTC objective function in attention model training. We evaluated the linguistic and lay-native speaker performances on an Uzbek language dataset that was collected as part of this study. Experimental results show that the proposed model achieved a word error rate of 14.3% when trained on 207 h of Uzbek recordings.

Keywords: CTC-attention; Uzbek language; convolutional neural network; deep learning; end-to-end speech recognition; hidden Markov model; transformers.
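
The abstract reports its headline result as a word error rate (WER). As a minimal sketch of how that metric is conventionally computed, and not the authors' evaluation code, the Python below counts word-level substitutions, deletions, and insertions with a Levenshtein alignment and divides by the number of reference words. The function name word_error_rate, the whitespace tokenization, and the example sentences are assumptions made for illustration only; the paper's own text normalization is not described in this record.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative only: tokenization is a plain whitespace split, which is
    an assumption and may differ from the normalization used in the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts, not taken from the Uzbek dataset.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions are counted against a fixed reference length, WER can exceed 100% when the hypothesis is much longer than the reference.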

MeSH terms

Deep Learning*
Humans
Language
Linguistics
Speech
Speech Perception*

Grants and funding

GRRC-Gachon2020(B02)/This work was supported by the GRRC program of Gyeonggi province. [GRRC-Gachon2020(B02), AI-based Medical Information Analysis]