Research on efficient feature extraction: Improving YOLOv5 backbone for facial expression detection in live streaming scenes

Front Comput Neurosci. 2022 Aug 10;16:980063. doi: 10.3389/fncom.2022.980063. eCollection 2022.

Abstract

Facial expressions, whether simple or complex, convey emotional cues that can affect others. The rich visual input delivered by marketing anchors' facial expressions can strengthen consumers' sense of identification and influence their purchase decisions, especially in live streaming marketing. This paper proposes an efficient feature extraction network based on the YOLOv5 model for detecting anchors' facial expressions. First, a two-step cascade classifier and recycler is built to filter out invalid video frames and construct a facial expression dataset of anchors. Second, GhostNet modules and coordinate attention are fused into the YOLOv5 backbone to reduce latency and improve accuracy. On the self-built dataset, YOLOv5 modified with the proposed efficient feature extraction structure outperforms the original YOLOv5 in both speed and accuracy.
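To make the backbone modification concrete, the sketch below shows, in PyTorch, a GhostNet-style convolution followed by a coordinate-attention block arranged as a single drop-in unit for a YOLOv5-style backbone. It is a minimal illustration of the two components named in the abstract, not the authors' implementation; all class names, channel counts, and hyperparameters (e.g., `GhostCABlock`, `reduction=32`, the 5x5 depthwise kernel) are assumptions for the example.

```python
# Minimal sketch (assumed names and settings, not the paper's code) of the two
# components fused into the YOLOv5 backbone: a GhostNet-style convolution and
# a coordinate-attention block.
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Half the output channels come from a regular conv ("intrinsic" maps);
    the other half are generated by a cheap depthwise conv ("ghost" maps)."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        c_hidden = c_out // 2
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_hidden, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_hidden), nn.SiLU())
        self.cheap = nn.Sequential(  # depthwise: groups == channels
            nn.Conv2d(c_hidden, c_hidden, 5, 1, 2, groups=c_hidden, bias=False),
            nn.BatchNorm2d(c_hidden), nn.SiLU())

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

class CoordAtt(nn.Module):
    """Coordinate attention: pool along H and W separately so the attention
    weights retain positional information in both directions."""
    def __init__(self, c, reduction=32):
        super().__init__()
        c_mid = max(8, c // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # (B, C, 1, W)
        self.conv1 = nn.Conv2d(c, c_mid, 1)
        self.bn1 = nn.BatchNorm2d(c_mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(c_mid, c, 1)
        self.conv_w = nn.Conv2d(c_mid, c, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        x_h = self.pool_h(x)                           # (B, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)       # (B, C, W, 1)
        y = self.act(self.bn1(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                      # (B, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (B, C, 1, W)
        return x * a_h * a_w

class GhostCABlock(nn.Module):
    """One backbone unit: Ghost convolution followed by coordinate attention,
    intended as a drop-in replacement for a standard YOLOv5 Conv block."""
    def __init__(self, c_in, c_out, k=3, s=2):
        super().__init__()
        self.ghost = GhostConv(c_in, c_out, k, s)
        self.ca = CoordAtt(c_out)

    def forward(self, x):
        return self.ca(self.ghost(x))

if __name__ == "__main__":
    x = torch.randn(1, 64, 80, 80)
    print(GhostCABlock(64, 128)(x).shape)  # torch.Size([1, 128, 40, 40])
```

The intended trade-off is that the depthwise "cheap" branch halves the number of full convolutions per block, while the coordinate attention adds only two 1x1 convolutions, so accuracy can be recovered without giving back the latency savings.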

Keywords: attention mechanism; cascade classifier; live streaming; model optimization; object detection.