Development of Hausa dataset a baseline for speech recognition

Umar Adam Ibrahim; Moussa Mahamat Boukar; Muhammed Aliyu Suleiman

doi:10.1016/j.dib.2022.107820

Data Brief. 2022 Jan 10;40:107820. doi: 10.1016/j.dib.2022.107820. eCollection 2022 Feb.

Authors

Umar Adam Ibrahim¹, Moussa Mahamat Boukar¹, Muhammed Aliyu Suleiman¹

Affiliation

¹ Faculty of Natural and Applied Sciences, Computer Science Department, Nile University of Nigeria, Abuja, Nigeria.

Abstract

The Hausa language read-speech dataset was created by recording native Hausa speakers. The recordings took place at the Nile University of Nigeria audio studio and radio broadcasting studio. The recorded speech was segmented into unigrams and bigrams. The dataset contains 47 hours of recorded speech audio. It can be used for automatic speech recognition, speech synthesis, text-to-speech, and speech-to-text applications.

Keywords: Automatic speech; Corpus; Hausa corpus; NLP; Text-to-speech.
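
The unigram and bigram segmentation mentioned in the abstract can be illustrated at the transcript level. The following is a minimal sketch, assuming whitespace-separated transcripts; the function names `ngrams` and `segment_transcript` and the sample phrase are hypothetical and are not taken from the dataset or its tooling, and the sketch does not reproduce how the authors segmented the audio itself.

```python
def ngrams(tokens, n):
    """Return all contiguous n-token tuples from a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def segment_transcript(text):
    """Split a transcript on whitespace and list its unigrams and bigrams.

    Assumption: tokens are separated by whitespace and case is not
    significant, so the text is lowercased before splitting.
    """
    tokens = text.lower().split()
    return {
        "unigrams": ngrams(tokens, 1),
        "bigrams": ngrams(tokens, 2),
    }


if __name__ == "__main__":
    # Illustrative phrase only; not drawn from the published dataset.
    result = segment_transcript("Ina son ruwa")
    print(result["unigrams"])
    print(result["bigrams"])
```

In a speech corpus, each unit produced this way would typically be paired with its corresponding audio segment; that alignment step is outside the scope of this sketch.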