DWT features performance analysis for automatic speech recognition of Urdu

Hazrat Ali; Nasir Ahmad; Xianwei Zhou; Khalid Iqbal; Sahibzada Muhammad Ali

doi:10.1186/2193-1801-3-204

Springerplus. 2014 Apr 27;3:204. doi: 10.1186/2193-1801-3-204. eCollection 2014.

Authors

Hazrat Ali¹, Nasir Ahmad², Xianwei Zhou³, Khalid Iqbal³, Sahibzada Muhammad Ali⁴

Affiliations

¹ Machine Learning Group, Department of Computing, City University London, Northampton Square, EC1V 0HB London, UK ; School of Computer and Communication Engineering, University of Science and Technology Beijing, 100083 Beijing, China.
² Department of Computer Systems Engineering, University of Engineering and Technology Peshawar, 25120 Peshawar, Pakistan.
³ School of Computer and Communication Engineering, University of Science and Technology Beijing, 100083 Beijing, China.
⁴ Department of Electrical and Computer Engineering, North Dakota State University, Fargo, ND 58108-6050 USA.

Abstract

This paper presents work on automatic speech recognition of the Urdu language, based on a comparative analysis of Discrete Wavelet Transform (DWT) based features and Mel-Frequency Cepstral Coefficients (MFCC). The features were extracted for one hundred isolated Urdu words, each uttered by ten different speakers. The words were selected from the most frequently used words of Urdu. A balanced corpus approach was used to cover a variety of ages and dialects. After feature extraction, classification was performed using Linear Discriminant Analysis. The confusion matrix obtained for the DWT features was then compared with the one obtained for MFCC-based speech recognition. The framework was trained and tested on speech data recorded under controlled environments. The experimental results are useful in determining the optimum features for the speech recognition task.

Keywords: Automatic speech recognition; Discrete wavelet transforms; Linear discriminant analysis; Mel-frequency cepstral coefficients; Urdu isolated words recognition.
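
The two ends of the pipeline described in the abstract, DWT feature extraction and the confusion matrix used for comparison, can be sketched as follows. This is a minimal illustration, not the authors' implementation. The abstract does not name the wavelet family, the decomposition depth, or the statistics computed per sub-band, so the Haar wavelet, the three levels, and the sub-band energies used here are assumptions, and all function names are hypothetical. The Linear Discriminant Analysis classifier is omitted.

```python
import math


def haar_dwt_step(signal):
    """One level of the orthonormal Haar DWT.

    Returns (approximation, detail) coefficient lists, each half the
    length of the input. Haar is an assumed choice of wavelet.
    """
    scale = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * scale for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * scale for i in pairs]
    return approx, detail


def dwt_energy_features(signal, levels=3):
    """Energy of each detail band, followed by the final approximation band.

    Sub-band energy is an assumed feature; the signal length should be
    divisible by 2**levels so that no sample is dropped.
    """
    features = []
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_dwt_step(current)
        features.append(sum(d * d for d in detail))
    features.append(sum(a * a for a in current))
    return features


def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Rows index the true class, columns index the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[true][predicted] += 1
    return matrix
```

In a comparison like the one in the abstract, one such matrix would be built from the classifier's predictions on the DWT features and another from its predictions on the MFCC features. Per-word accuracy is read off the diagonal, and the off-diagonal entries show which words are confused with each other.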