Dataset for the recognition of Kurdish sound dialects

Data Brief. 2024 Feb 22;53:110231. doi: 10.1016/j.dib.2024.110231. eCollection 2024 Apr.

Abstract

Dialect recognition is a significant topic in the field of speech analysis. The performance of speech recognition systems is adversely affected by speaker characteristics such as age, gender, and dialect. To handle variation in dialect, a dialect recognition system (DRS) can be incorporated into a speech recognition system, which can then be configured to use the appropriate speech recognition model for the identified dialect. Currently, datasets suitable for developing automatic dialect recognition systems for the Kurdish language are lacking. The proposed dataset is assessed using experimental data gathered by personnel of the Computer Science Department at the University of Halabja. Because the Kurdish language has three main dialects, Northern Kurdish (Badini variant), Central Kurdish (Sorani variant), and Hawrami, all three are included in the dataset.
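
The model-selection idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the dialect labels, the stub recognizers, and the `transcribe` and `identify_dialect` names are all hypothetical placeholders, assuming only that a dialect identifier returns one of the three dialect names and that one recognizer exists per dialect.

```python
from typing import Callable, Dict

# Hypothetical labels for the three dialects named in the abstract.
DIALECTS = ("badini", "sorani", "hawrami")

# A recognizer maps raw audio bytes to a transcript string.
Recognizer = Callable[[bytes], str]


def make_stub_recognizer(dialect: str) -> Recognizer:
    """Build a placeholder recognizer; a real system would load a trained model."""
    def recognize(audio: bytes) -> str:
        return f"<{dialect} transcript of {len(audio)} bytes>"
    return recognize


# One (stub) speech recognition model per dialect.
RECOGNIZERS: Dict[str, Recognizer] = {d: make_stub_recognizer(d) for d in DIALECTS}


def transcribe(audio: bytes, identify_dialect: Callable[[bytes], str]) -> str:
    """Identify the dialect first, then route the audio to the matching model."""
    dialect = identify_dialect(audio)
    if dialect not in RECOGNIZERS:
        raise ValueError(f"unknown dialect: {dialect!r}")
    return RECOGNIZERS[dialect](audio)
```

Here `identify_dialect` stands in for the DRS; any classifier trained on a dialect dataset could be plugged in, provided it returns one of the labels in `DIALECTS`.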

Keywords: Badini; Dialect recognition; Hawrami; Kurdish dialect; Sorani.