Research on efficient feature extraction: Improving YOLOv5 backbone for facial expression detection in live streaming scenes

Front Comput Neurosci. 2022 Aug 10;16:980063. doi: 10.3389/fncom.2022.980063. eCollection 2022.

Abstract

Facial expressions, whether simple or complex, convey emotional cues that can affect others. The rich visual input delivered by marketing anchors' facial expressions can strengthen consumers' sense of identification and influence their purchase decisions, especially in live streaming marketing. This paper proposes an efficient feature extraction network based on the YOLOv5 model for detecting anchors' facial expressions. First, a two-step cascade classifier and recycler is built to filter out invalid video frames and construct a facial expression dataset of anchors. Second, GhostNet modules and coordinate attention are fused into the YOLOv5 backbone to reduce latency and improve accuracy. On the self-built dataset, YOLOv5 modified with the proposed efficient feature extraction structure outperforms the original YOLOv5 in both speed and accuracy.
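To make the backbone modification concrete, the sketch below shows, in PyTorch, a GhostNet-style convolution followed by a coordinate-attention block arranged as a single drop-in unit for a YOLOv5-style backbone. It is a minimal illustration of the two components named in the abstract, not the authors' implementation; all class names, channel counts, and hyperparameters (e.g., `GhostCABlock`, `reduction=32`, the 5x5 depthwise kernel) are assumptions for the example.

```python
# Minimal sketch (assumed names and settings, not the paper's code) of the two
# components fused into the YOLOv5 backbone: a GhostNet-style convolution and
# a coordinate-attention block.
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Half the output channels come from a regular conv ("intrinsic" maps);
    the other half are generated by a cheap depthwise conv ("ghost" maps)."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        c_hidden = c_out // 2
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_hidden, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_hidden), nn.SiLU())
        self.cheap = nn.Sequential(  # depthwise: groups == channels
            nn.Conv2d(c_hidden, c_hidden, 5, 1, 2, groups=c_hidden, bias=False),
            nn.BatchNorm2d(c_hidden), nn.SiLU())

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

class CoordAtt(nn.Module):
    """Coordinate attention: pool along H and W separately so the attention
    weights retain positional information in both directions."""
    def __init__(self, c, reduction=32):
        super().__init__()
        c_mid = max(8, c // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # (B, C, 1, W)
        self.conv1 = nn.Conv2d(c, c_mid, 1)
        self.bn1 = nn.BatchNorm2d(c_mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(c_mid, c, 1)
        self.conv_w = nn.Conv2d(c_mid, c, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        x_h = self.pool_h(x)                           # (B, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)       # (B, C, W, 1)
        y = self.act(self.bn1(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                      # (B, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (B, C, 1, W)
        return x * a_h * a_w

class GhostCABlock(nn.Module):
    """One backbone unit: Ghost convolution followed by coordinate attention,
    intended as a drop-in replacement for a standard YOLOv5 Conv block."""
    def __init__(self, c_in, c_out, k=3, s=2):
        super().__init__()
        self.ghost = GhostConv(c_in, c_out, k, s)
        self.ca = CoordAtt(c_out)

    def forward(self, x):
        return self.ca(self.ghost(x))

if __name__ == "__main__":
    x = torch.randn(1, 64, 80, 80)
    print(GhostCABlock(64, 128)(x).shape)  # torch.Size([1, 128, 40, 40])
```

The intended trade-off is that the depthwise "cheap" branch halves the number of full convolutions per block, while the coordinate attention adds only two 1x1 convolutions, so accuracy can be recovered without giving back the latency savings.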

Keywords: attention mechanism; cascade classifier; live streaming; model optimization; object detection.