HGR-Net: Hierarchical Graph Reasoning Network for Arbitrary Shape Scene Text Detection

IEEE Trans Image Process. 2023;32:4142-4155. doi: 10.1109/TIP.2023.3294822. Epub 2023 Jul 20.

Abstract

As a prerequisite for scene text reading, scene text detection is a challenging task because of the diversity and variability of text in natural scenes. Most existing methods either adopt bottom-up sub-text component extraction or focus on top-down text contour regression. Taking a hybrid perspective, we explore hierarchical instance-level and component-level text representations for arbitrarily shaped scene text detection. In this work, we propose a novel Hierarchical Graph Reasoning Network (HGR-Net), which consists of a Text Feature Extraction Network (TFEN) and a Text Relation Learner Network (TRLN). TFEN adaptively learns multi-grained text candidates from shared convolutional feature maps, including instance-level text contours and component-level quadrangles. In TRLN, an inter-text graph is constructed to explore position-aware global contextual information between text instances, and an intra-text graph is designed to estimate geometric attributes for establishing component-level linkages. We then bridge the instance and component levels through cross-feed interaction, which enables hierarchical relational reasoning by learning complementary graph embeddings across levels. Experiments on three publicly available benchmarks (SCUT-CTW1500, Total-Text, and ICDAR15) demonstrate that HGR-Net achieves state-of-the-art performance on arbitrary-orientation and arbitrary-shape scene text detection.
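The abstract does not state how the inter-text graph is built or how node features are updated. As a rough illustration of the general idea of position-aware graph reasoning over text candidates, the sketch below links each candidate polygon to its nearest neighbours by centroid distance and performs one round of mean aggregation. The k-nearest-centroid rule, the mean aggregation, and the names `centroid`, `build_knn_graph`, and `aggregate` are all assumptions made for illustration; they are not taken from the paper and should not be read as the authors' method.

```python
import math


def centroid(polygon):
    """Centre of a polygon given as a list of (x, y) vertices (vertex mean)."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def build_knn_graph(polygons, k=2):
    """Adjacency lists linking each node to its k nearest centroids.

    Assumption: spatial proximity stands in for the paper's unspecified
    position-aware graph construction.
    """
    centres = [centroid(p) for p in polygons]
    neighbours = []
    for i, ci in enumerate(centres):
        others = [j for j in range(len(centres)) if j != i]
        others.sort(key=lambda j: math.dist(ci, centres[j]))
        neighbours.append(others[:k])
    return neighbours


def aggregate(features, neighbours):
    """One round of mean aggregation over each node and its neighbours.

    Assumption: a plain average replaces any learned graph layer.
    """
    updated = []
    for i, nbrs in enumerate(neighbours):
        group = [features[i]] + [features[j] for j in nbrs]
        dim = len(features[i])
        updated.append([sum(f[d] for f in group) / len(group) for d in range(dim)])
    return updated


# Three hypothetical text candidates: two close together, one far away.
polygons = [
    [(0, 0), (2, 0), (2, 1), (0, 1)],
    [(3, 0), (5, 0), (5, 1), (3, 1)],
    [(20, 0), (22, 0), (22, 1), (20, 1)],
]
features = [[1.0, 0.0], [0.0, 1.0], [4.0, 4.0]]

graph = build_knn_graph(polygons, k=1)
updated = aggregate(features, graph)
print(graph)
print(updated)
```

In this toy setting the two nearby candidates exchange information with each other, while the distant one only receives context from its closest neighbour. A learned model would replace the fixed average with trainable graph layers.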