Visual Attention and Color Cues for 6D Pose Estimation on Occluded Scenarios Using RGB-D Data

Joel Vidal; Chyi-Yeu Lin; Robert Martí

doi:10.3390/s21238090

Sensors (Basel). 2021 Dec 3;21(23):8090. doi: 10.3390/s21238090.

Authors

Joel Vidal¹,², Chyi-Yeu Lin²,³,⁴, Robert Martí¹

Affiliations

¹ Computer Vision and Robotics Institute, University of Girona, 17003 Girona, Spain.
² Department of Mechanical Engineering, National Taiwan University of Science and Technology, Taipei 106, Taiwan.
³ Taiwan Building Technology Center, National Taiwan University of Science and Technology, Taipei 106, Taiwan.
⁴ Center for Cyber-Physical System Innovation, National Taiwan University of Science and Technology, Taipei 106, Taiwan.

Abstract

Recently, 6D pose estimation methods have shown robust performance on highly cluttered scenes and under different illumination conditions. However, occlusions are still challenging, with recognition rates decreasing to less than 10% for half-visible objects in some datasets. In this paper, we propose to use top-down visual attention and color cues to boost the performance of a state-of-the-art method on occluded scenarios. More specifically, color information is employed to detect potential points in the scene, improve feature matching, and compute more precise fitting scores. The proposed method is evaluated on the Linemod occluded (LM-O), TUD light (TUD-L), Tejani (IC-MI) and Doumanoglou (IC-BIN) datasets, as part of the SiSo BOP benchmark, which includes challenging highly occluded cases, illumination-changing scenarios, and multiple instances. The method is analyzed and discussed for different parameters, color spaces and metrics. The presented results show the validity of the proposed approach and its robustness against illumination changes and multiple-instance scenarios, especially boosting the performance on relatively highly occluded cases. The proposed solution provides an absolute improvement of up to 30% for levels of occlusion between 40% and 50%, outperforming other approaches with a best overall recall of 71% for LM-O, 92% for TUD-L, 99.3% for IC-MI and 97.5% for IC-BIN.
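
As a rough illustration of how color can act as a top-down attention cue for detecting potential points in a scene, the following minimal sketch keeps only the scene points whose hue is close to a hue found on the object model. It is not the authors' implementation: the function names, the choice of HSV hue as the color metric, and the thresholds `max_hue_dist` and `min_sat` are all assumptions made for this example.

```python
import colorsys


def hue_distance(h1, h2):
    """Circular distance between two hues given in [0, 1)."""
    d = abs(h1 - h2) % 1.0
    return min(d, 1.0 - d)


def to_hsv(rgb):
    """Convert an 8-bit (r, g, b) triple to HSV with components in [0, 1]."""
    r, g, b = rgb
    return colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)


def select_candidate_points(scene, model_colors, max_hue_dist=0.05, min_sat=0.2):
    """Return the 3D points of `scene` whose color matches any model color.

    scene        -- list of ((x, y, z), (r, g, b)) tuples (hypothetical layout)
    model_colors -- list of (r, g, b) colors sampled from the object model
    max_hue_dist -- assumed hue tolerance (circular, in [0, 0.5])
    min_sat      -- assumed saturation floor; hue is unreliable for gray pixels
    """
    # Only saturated model colors carry a meaningful hue.
    model_hues = [h for h, s, _ in map(to_hsv, model_colors) if s >= min_sat]
    kept = []
    for xyz, rgb in scene:
        h, s, _ = to_hsv(rgb)
        if s < min_sat:
            continue  # skip near-gray scene points
        if any(hue_distance(h, mh) <= max_hue_dist for mh in model_hues):
            kept.append(xyz)
    return kept


scene = [
    ((0.0, 0.0, 1.0), (250, 10, 10)),    # red, matches the model
    ((0.1, 0.0, 1.0), (10, 10, 250)),    # blue, rejected
    ((0.2, 0.0, 1.0), (128, 128, 128)),  # gray, rejected (no reliable hue)
]
candidates = select_candidate_points(scene, model_colors=[(255, 0, 0)])
```

Comparing hue only, rather than full RGB values, is one simple way to reduce sensitivity to brightness changes; the color spaces and metrics actually evaluated in the paper may differ.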

Keywords: 3D object recognition; 6D pose estimation; RGB-D data; computer vision; model-based vision; scene understanding.

MeSH terms

Cues*
Lighting*
Recognition, Psychology