Object-Oriented and Visual-Based Localization in Urban Environments

Sensors (Basel). 2024 Mar 21;24(6):2014. doi: 10.3390/s24062014.

Abstract

In visual-based localization, prior research falls short of addressing the challenges posed by Internet of Things (IoT) devices with limited computational resources. The dominant state-of-the-art models rely on separate feature extractors and descriptors, without considering the constraints of small hardware, inconsistent image scales, or the presence of multiple objects. We introduce "OOPose", a real-time object-oriented pose estimation framework that leverages dense features from off-the-shelf object detection neural networks. It balances pixel-matching accuracy against processing speed, enhancing overall performance: when input images share a comparable set of features, matching accuracy improves substantially, while smaller images are processed faster at some cost in accuracy. OOPose therefore resizes both the original library images and the cropped query object images to a width of 416 pixels, which yields a 2.4-fold improvement in pose accuracy and an 8.6-fold increase in processing speed. Moreover, OOPose eliminates the traditional sparse keypoint extraction and description stages by capitalizing on dense network backbone features of the detected query objects and the corresponding object library images, producing results that are 1.3 times more accurate and three times more stable than real-time sparse ORB matching. Beyond these enhancements, we demonstrate the feasibility of OOPose on an autonomous mobile robot that self-localizes with a single camera at 10 FPS on a single CPU, proving its cost-effectiveness and real-world applicability for small embedded devices, opening potential markets, and offering end-users distinct advantages.

Keywords: Internet of Things; object detection; pose estimation; visual-based localization.
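
The abstract does not include an implementation, but its two core ideas (resizing both library and cropped query images to a fixed 416-pixel width, and matching dense detector-backbone features instead of sparse ORB keypoints) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the ResNet-18 backbone, the cosine-similarity mutual-nearest-neighbor matcher, and all function names are stand-ins for OOPose's actual detector backbone and matching pipeline.

```python
# Illustrative sketch only -- not the OOPose implementation.
# Resizes library and query images to a fixed 416 px width, extracts
# dense backbone features, and matches them by cosine similarity.
import cv2
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T

WIDTH = 416  # fixed width applied to both library and query images

def resize_to_width(img: np.ndarray, width: int = WIDTH) -> np.ndarray:
    """Rescale an image so its width is `width`, preserving aspect ratio."""
    h, w = img.shape[:2]
    return cv2.resize(img, (width, int(h * width / w)))

# Stand-in backbone: ResNet-18 truncated before global pooling (assumption;
# OOPose instead reuses the object detector's own backbone features).
backbone = torch.nn.Sequential(
    *list(models.resnet18(weights="DEFAULT").children())[:-2]
).eval()
to_tensor = T.Compose([
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def dense_features(img_bgr: np.ndarray) -> torch.Tensor:
    """Return an (N, C) matrix of L2-normalized per-cell backbone features."""
    rgb = cv2.cvtColor(resize_to_width(img_bgr), cv2.COLOR_BGR2RGB)
    fmap = backbone(to_tensor(rgb).unsqueeze(0)).squeeze(0)  # C x h x w
    feats = fmap.flatten(1).T                                # (h*w) x C
    return torch.nn.functional.normalize(feats, dim=1)

@torch.no_grad()
def match(query: np.ndarray, library: np.ndarray, thresh: float = 0.8):
    """Mutual-nearest-neighbor matches between two dense feature grids."""
    fq, fl = dense_features(query), dense_features(library)
    sim = fq @ fl.T                       # cosine similarity, Nq x Nl
    nn_q = sim.argmax(dim=1)              # best library cell per query cell
    nn_l = sim.argmax(dim=0)              # best query cell per library cell
    idx = torch.arange(len(fq))
    keep = (nn_l[nn_q] == idx) & (sim[idx, nn_q] > thresh)
    return idx[keep], nn_q[keep]          # matched feature-cell index pairs
```

Matched cell indices could then be mapped back to pixel coordinates via the backbone's stride (32 for this ResNet-18 stand-in) and passed to a standard pose solver; the actual OOPose pipeline is described in the paper itself.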