Incorporating Noise Robustness in Speech Command Recognition by Noise Augmentation of Training Data

Ayesha Pervaiz; Fawad Hussain; Huma Israr; Muhammad Ali Tahir; Fawad Riasat Raja; Naveed Khan Baloch; Farruh Ishmanov; Yousaf Bin Zikria

doi:10.3390/s20082326

Sensors (Basel). 2020 Apr 19;20(8):2326. doi: 10.3390/s20082326.

Authors

Ayesha Pervaiz¹, Fawad Hussain¹, Huma Israr², Muhammad Ali Tahir², Fawad Riasat Raja³, Naveed Khan Baloch¹, Farruh Ishmanov⁴, Yousaf Bin Zikria⁵

Affiliations

¹ Department of Computer Engineering, University of Engineering and Technology, Taxila 47050, Pakistan.
² School of Electrical Engineering and Computer Science (SEECS), National University of Sciences and Technology (NUST), Islamabad H-12, Pakistan.
³ Machine Intelligence and Pattern Analysis Laboratory, Griffith University, Nathan, QLD 4111, Australia.
⁴ Department of Electronics and Communication Engineering, Kwangwoon University, Seoul 447-1, Korea.
⁵ Department of Information and Communication Engineering, Yeungnam University, Gyeongsan 38541, Korea.

Abstract

New devices and technology, machine learning techniques, and freely available large speech corpora have enabled rapid and accurate speech recognition. Over the last two decades, researchers and organizations have experimented extensively with new techniques and their applications in speech processing systems. Speech command based applications are found in robotics, IoT, ubiquitous computing, and various human-computer interfaces. Several researchers have worked on improving the efficiency of speech command based systems using the speech command dataset. However, none of them addressed noise in that dataset. Noise is one of the major challenges for any speech recognition system: real-time noise is highly varied and unavoidable, and it degrades the performance of speech recognition systems, particularly those that have not been trained to handle it. We thoroughly analyse the latest trends in speech recognition and evaluate the speech command dataset with different machine learning based and deep learning based techniques. We propose a novel technique that achieves noise robustness by augmenting the training data with noise. The proposed technique is tested on clean and noisy data, as well as on locally generated data, and achieves much better results than existing state-of-the-art techniques, thus setting a new benchmark.
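The abstract does not describe how noise is added to the training data, so the following is only a minimal sketch of one common form of noise augmentation: mixing a noise recording into a clean utterance at a chosen signal-to-noise ratio (SNR). The function names `mix_at_snr` and `measured_snr_db`, the use of plain Python lists of samples, and the SNR-based scaling are all assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the requested SNR in dB.

    Hypothetical helper: both inputs are sequences of float samples.
    """
    speech = list(speech)
    noise = list(noise)
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Repeat or truncate the noise so it covers the whole utterance.
    repeats = -(-len(speech) // len(noise))  # ceiling division
    noise = (noise * repeats)[:len(speech)]
    power_speech = sum(s * s for s in speech) / len(speech)
    power_noise = sum(n * n for n in noise) / len(noise)
    if power_noise == 0.0:
        return speech  # silent noise: nothing to add
    # Choose gain so that power_speech / (gain^2 * power_noise) = 10^(snr_db / 10).
    gain = math.sqrt(power_speech / (power_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, mixture):
    """Compute the SNR in dB of a mixture, given the clean speech it was built from."""
    residual = [m - s for s, m in zip(speech, mixture)]
    power_speech = sum(s * s for s in speech) / len(speech)
    power_residual = sum(r * r for r in residual) / len(residual)
    return 10 * math.log10(power_speech / power_residual)


# Synthetic stand-ins for a clean utterance and a noise clip (assumed data).
rng = random.Random(0)
clean = [math.sin(0.1 * i) for i in range(1000)]
babble = [rng.gauss(0.0, 1.0) for _ in range(300)]

# One noisy copy per SNR level; a training set could include these alongside the clean copy.
augmented = {snr: mix_at_snr(clean, babble, snr) for snr in (20, 10, 0)}
```

In a sketch like this, the set of SNR levels and the choice of noise clips control how varied the augmented training data is; the values above are arbitrary.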

Keywords: acoustic modelling; automatic speech recognition; data science; deep learning; deep neural networks; kaldi; language modelling; speech command set; voice recognition; word error rate.

MeSH terms

Deep Learning
Humans
Machine Learning
Neural Networks, Computer
Noise*
Speech Perception / physiology
Speech Recognition Software*