CenterNet++ for Object Detection

IEEE Trans Pattern Anal Mach Intell. 2024 May;46(5):3509-3521. doi: 10.1109/TPAMI.2023.3342120. Epub 2024 Apr 3.

Abstract

There are two mainstream approaches for object detection: top-down and bottom-up. The state-of-the-art approaches are mainly top-down methods. In this paper, we demonstrate that bottom-up approaches show competitive performance compared with top-down approaches and have higher recall rates. Our approach, named CenterNet, detects each object as a triplet of keypoints (top-left and bottom-right corners and the center keypoint). We first group the corners according to some designed cues and confirm the object locations based on the center keypoints. The corner keypoints allow the approach to detect objects of various scales and shapes and the center keypoint reduces the confusion introduced by a large number of false-positive proposals. Our approach is an anchor-free detector because it does not need to define explicit anchor boxes. We adapt our approach to backbones with different structures, including 'hourglass'-like networks and 'pyramid'-like networks, which detect objects in single-resolution and multi-resolution feature maps, respectively. On the MS-COCO dataset, CenterNet with Res2Net-101 and Swin-Transformer achieve average precisions (APs) of 53.7% and 57.1%, respectively, outperforming all existing bottom-up detectors and achieving state-of-the-art performance. We also design a real-time CenterNet model, which achieves a good trade-off between accuracy and speed, with an AP of 43.6% at 30.5 frames per second (FPS).