Audio-Visual Automatic Speech Recognition Towards Education for Disabilities

Saswati Debnath; Pinki Roy; Suyel Namasudra; Ruben Gonzalez Crespo

doi:10.1007/s10803-022-05654-4

J Autism Dev Disord. 2023 Sep;53(9):3581-3594. doi: 10.1007/s10803-022-05654-4. Epub 2022 Jul 12.

Authors

Saswati Debnath¹, Pinki Roy², Suyel Namasudra³ ⁴, Ruben Gonzalez Crespo⁵

Affiliations

¹ Department of Computer Science and Engineering, Alliance University, Bangalore, Karnataka, India.
² Department of Computer Science and Engineering, National Institute of Technology, Silchar, Assam, India.
³ Department of Computer Science and Engineering, National Institute of Technology Patna, Patna, Bihar, India. suyelnamasudra@gmail.com.
⁴ Universidad Internacional de La Rioja, Logroño, Spain. suyelnamasudra@gmail.com.
⁵ Universidad Internacional de La Rioja, Logroño, Spain.

PMID: 35819585
DOI: 10.1007/s10803-022-05654-4

Abstract

Education is a fundamental right that enriches everyone's life. However, physically challenged people are often debarred from the general and advanced education system. An Audio-Visual Automatic Speech Recognition (AV-ASR) based system can help improve the education of physically challenged people by providing hands-free computing: they can communicate with the learning system through AV-ASR. However, it is challenging to track the lips correctly for the visual modality. Thus, this paper uses appearance-based visual features together with co-occurrence statistical measures for visual speech recognition. Local Binary Pattern-Three Orthogonal Planes (LBP-TOP) and the Grey-Level Co-occurrence Matrix (GLCM) are proposed for extracting visual speech information. The experimental results show that the proposed system achieves 76.60% accuracy for visual speech recognition and 96.00% accuracy for audio speech recognition.
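
As a rough illustration of the co-occurrence statistics named in the abstract, the sketch below builds a GLCM for a small quantised grey-level patch and derives a few common texture measures from it. It is not the authors' implementation: the function names, the one-pixel horizontal offset, the symmetric counting, and the choice of contrast, energy, and homogeneity are all assumptions made for the example, and the abstract does not say which GLCM settings or features were used.

```python
def glcm(image, levels, dx=1, dy=0, symmetric=True):
    """Count co-occurring grey levels for one pixel offset (dx, dy).

    `image` is a list of rows of integers in the range [0, levels).
    Hypothetical helper for illustration only.
    """
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            # Skip pairs whose neighbour falls outside the patch.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[a][b] += 1
                if symmetric:
                    # Also count the pair in the reverse direction.
                    counts[b][a] += 1
    return counts


def normalise(counts):
    """Turn raw counts into joint probabilities that sum to 1."""
    total = sum(sum(row) for row in counts)
    if total == 0:
        return [[0.0 for _ in row] for row in counts]
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Contrast, energy, and homogeneity of a normalised GLCM."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(p[i][j] * (i - j) ** 2 for i, j in cells),
        "energy": sum(p[i][j] ** 2 for i, j in cells),
        "homogeneity": sum(p[i][j] / (1 + (i - j) ** 2) for i, j in cells),
    }


# A 4x4 patch with four grey levels, standing in for a quantised lip region.
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
features = glcm_features(normalise(glcm(patch, levels=4)))
```

In a visual speech pipeline of the kind the abstract outlines, such measures would presumably be computed on the mouth region of each frame and combined with LBP-TOP descriptors before classification; that arrangement is an inference from the abstract, not a detail it states.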

Keywords: AV-ASR; Clustering algorithm; GLCM; LBP-TOP; MFCC; Supervised learning.

Publication types

Retracted Publication

MeSH terms

Autism Spectrum Disorder*
Disabled Persons*
Humans
Speech
Speech Perception*