Audio-Visual Automatic Speech Recognition Towards Education for Disabilities

J Autism Dev Disord. 2023 Sep;53(9):3581-3594. doi: 10.1007/s10803-022-05654-4. Epub 2022 Jul 12.

Abstract

Education is a fundamental right that enriches everyone's life. However, physically challenged people are often debarred from the general and advanced education system. An Audio-Visual Automatic Speech Recognition (AV-ASR) based system can improve the education of physically challenged people by providing hands-free computing. They can communicate with the learning system through AV-ASR. However, it is challenging to track the lips accurately for the visual modality. Thus, this paper adopts appearance-based visual features along with co-occurrence statistical measures for visual speech recognition. Local Binary Pattern-Three Orthogonal Planes (LBP-TOP) and Grey-Level Co-occurrence Matrix (GLCM) features are proposed to capture visual speech information. The experimental results show that the proposed system achieves 76.60% accuracy for visual speech recognition and 96.00% accuracy for audio speech recognition.
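
To illustrate the co-occurrence statistics mentioned in the abstract, the following is a minimal sketch of how a GLCM and two common texture measures derived from it (contrast and energy) can be computed. It is not the authors' implementation: the abstract does not state the pixel offsets, number of grey levels, or which GLCM statistics were used, so the function names (`glcm`, `contrast`, `energy`), the horizontal offset, and the 3x3 toy patch are all assumptions chosen for illustration.

```python
import numpy as np

def glcm(image, levels, dx=1, dy=0):
    """Normalised grey-level co-occurrence matrix for one pixel offset.

    Entry (i, j) is the fraction of pixel pairs in which a pixel with
    grey level i has a neighbour with grey level j at offset (dy, dx).
    The offset and number of levels are illustrative assumptions.
    """
    image = np.asarray(image)
    rows, cols = image.shape
    counts = np.zeros((levels, levels), dtype=np.float64)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            # Skip pairs whose neighbour falls outside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r, c], image[r2, c2]] += 1
    total = counts.sum()
    return counts / total if total else counts

def contrast(p):
    """Sum of p(i, j) * (i - j)^2: large when neighbours differ strongly."""
    i, j = np.indices(p.shape)
    return float(np.sum(p * (i - j) ** 2))

def energy(p):
    """Sum of squared entries: large when few grey-level pairs dominate."""
    return float(np.sum(p ** 2))

# Hypothetical 3x3 patch with 3 grey levels (not data from the paper).
toy_patch = [[0, 0, 1],
             [1, 2, 2],
             [0, 1, 2]]
p = glcm(toy_patch, levels=3)
print(p)
print(contrast(p), energy(p))
```

In a visual speech pipeline, statistics of this kind would typically be computed over the mouth region of each frame and combined with other descriptors such as LBP-TOP before classification; the exact combination used in the paper is not described in the abstract.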

Keywords: AV-ASR; Clustering algorithm; GLCM; LBP-TOP; MFCC; Supervised learning.

Publication types

  • Retracted Publication

MeSH terms

  • Autism Spectrum Disorder*
  • Disabled Persons*
  • Humans
  • Speech
  • Speech Perception*