Pushing the limits of remote RF sensing by reading lips under the face mask

Hira Hameed; Muhammad Usman; Ahsen Tahir; Amir Hussain; Hasan Abbas; Tie Jun Cui; Muhammad Ali Imran; Qammer H Abbasi

doi:10.1038/s41467-022-32231-1


Nat Commun. 2022 Sep 7;13(1):5168. doi: 10.1038/s41467-022-32231-1.

Authors

Hira Hameed¹, Muhammad Usman¹,², Ahsen Tahir¹,³, Amir Hussain⁴, Hasan Abbas¹, Tie Jun Cui⁵, Muhammad Ali Imran¹, Qammer H Abbasi⁶

Affiliations

¹ University of Glasgow, James Watt School of Engineering, Glasgow, G12 8QQ, UK.
² School of Computing, Engineering and Built Environment, Glasgow Caledonian University, Glasgow, G4 0BA, UK.
³ Department of Electrical Engineering, University of Engineering and Technology, Lahore, Pakistan.
⁴ School of Computing, Edinburgh Napier University, Scotland, UK.
⁵ State Key Laboratory of Millimetre Waves, Southeast University, Nanjing, China.
⁶ University of Glasgow, James Watt School of Engineering, Glasgow, G12 8QQ, UK. qammer.abbasi@glasgow.ac.uk.

Abstract

Lip-reading has become an important research challenge in recent years. The goal is to recognise speech from lip movements. Most lip-reading technologies developed so far are camera-based and require video recording of the target. However, these technologies have well-known limitations, including occlusion and dependence on ambient lighting, and they raise serious privacy concerns. Furthermore, vision-based technologies are not useful for multi-modal hearing aids in the coronavirus (COVID-19) environment, where face masks have become the norm. This paper aims to overcome the fundamental limitations of camera-based systems by proposing a radio frequency (RF) based lip-reading framework that can read lips under face masks. The framework employs Wi-Fi and radar technologies as enablers of RF-sensing-based lip-reading. A dataset comprising the vowels A, E, I, O, U and empty (static/closed lips) is collected using both technologies, with the subject wearing a face mask. The collected data is used to train machine learning (ML) and deep learning (DL) models. A high classification accuracy of 95% is achieved on the Wi-Fi data using neural network (NN) models. Moreover, similar accuracy is achieved by the VGG16 deep learning model on the collected radar-based dataset.
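To make the six-class task concrete, the sketch below shows the general shape of the classification step: labelled feature vectors per class, a classifier, and accuracy as the fraction of correctly labelled samples. It is a minimal illustration under stated assumptions, not the authors' method. The classifier is a simple nearest-centroid rule swapped in for the NN and VGG16 models the paper uses, and the two-dimensional "features" are hypothetical toy values rather than real Wi-Fi or radar measurements. The helper names `centroids`, `predict` and `accuracy` are illustrative only.

```python
import math
from statistics import fmean

# The six classes named in the abstract: five vowels plus "empty" (static/closed lips).
CLASSES = ["A", "E", "I", "O", "U", "empty"]


def centroids(samples):
    """Mean feature vector per class; samples is a list of (features, label) pairs."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: [fmean(column) for column in zip(*vectors)]
        for label, vectors in by_label.items()
    }


def predict(cents, features):
    """Assign the class whose centroid is closest in Euclidean distance.

    Nearest-centroid is a stand-in here; the paper reports NN and VGG16 models.
    """
    return min(cents, key=lambda label: math.dist(cents[label], features))


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy 2-D features: two training samples per class, one class per x position.
train = []
for i, label in enumerate(CLASSES):
    train.append(([float(i), 0.0], label))
    train.append(([float(i), 1.0], label))

cents = centroids(train)
test = [([i + 0.1, 0.5], label) for i, label in enumerate(CLASSES)]
y_true = [label for _, label in test]
y_pred = [predict(cents, features) for features, _ in test]
print(accuracy(y_true, y_pred))  # 1.0 on this cleanly separated toy data
```

On real RF data the features would come from the sensing hardware and the classes would overlap, which is why the paper relies on learned NN and deep models rather than a fixed distance rule.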

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

COVID-19* / prevention & control
Humans
Lipreading
Masks*
Neural Networks, Computer
Personal Protective Equipment