Object Pose Estimation Using Edge Images Synthesized from Shape Information

Atsunori Moteki; Hideo Saito

doi:10.3390/s22249610

Sensors (Basel). 2022 Dec 8;22(24):9610. doi: 10.3390/s22249610.

Authors

Atsunori Moteki¹ ², Hideo Saito¹

Affiliations

¹ Graduate School of Science and Technology, Keio University, Yokohama 223-8522, Japan.
² Fujitsu Limited, Kawasaki 211-8588, Japan.

Abstract

This paper presents a method for estimating the six-degrees-of-freedom (6DoF) pose of texture-less objects from a monocular image by using edge information. Deep learning-based pose estimation requires a large dataset of images paired with the ground-truth poses of the objects. To reduce the cost of collecting such a dataset, we focus on methods that train on a dataset generated by computer graphics (CG). This simulation-based approach prepares a thousand images by rendering the computer-aided design (CAD) data of the object and uses them to train a deep learning model. At inference time, a monocular RGB image is fed to the model, which estimates the object's pose. A representative simulation-based method, Pose Interpreter Networks, takes silhouette images as input, so that a common feature (the contour) can be extracted from both RGB and CG images. However, its estimation of the rotation parameters is less accurate. To overcome this problem, we propose training the deep learning model on edge information extracted from the object's ridgelines. Because the edge distribution changes substantially with the pose, the estimation of rotation parameters becomes more robust. In an experiment with simulation data, we quantitatively demonstrated improved accuracy over the previous method (under a certain condition, the error rate decreased by 22.9% for translation and 43.4% for rotation). Moreover, in an experiment with physical data, we identified the issues of this method and proposed an effective solution based on fine-tuning (under a certain condition, the error rate decreased by 20.1% for translation and 57.7% for rotation).
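
The abstract reports separate error decreases for translation and rotation but does not define the metrics here. As a minimal sketch, assuming translation error is the Euclidean distance between estimated and ground-truth positions and rotation error is the geodesic angle between the two rotation matrices, a relative decrease could be computed as follows. The function names and the choice of metrics are illustrative assumptions, not the authors' evaluation code.

```python
import math

def rotation_error_deg(R_est, R_gt):
    """Geodesic angle in degrees between two 3x3 rotation matrices (assumed metric)."""
    # trace(R_est^T @ R_gt) equals the sum of element-wise products.
    trace = sum(R_est[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))

def translation_error(t_est, t_gt):
    """Euclidean distance between estimated and ground-truth positions (assumed metric)."""
    return math.dist(t_est, t_gt)

def relative_decrease_percent(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error."""
    return 100.0 * (baseline_error - new_error) / baseline_error

def rot_z(deg):
    """Hypothetical helper: rotation about the z-axis by `deg` degrees."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

if __name__ == "__main__":
    # Made-up poses for illustration only; these are not values from the paper.
    print(rotation_error_deg(rot_z(0), rot_z(30)))
    print(translation_error((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))
    print(relative_decrease_percent(10.0, 8.0))
```

Under these assumptions, a figure such as "rotation 43.4%" would correspond to `relative_decrease_percent` applied to the mean rotation error of the baseline and of the proposed method for one test condition.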

Keywords: deep learning; edge; fine-tuning; monocular RGB image; pose estimation; ridgeline.

MeSH terms

Computer Simulation*

Grants and funding

This research received no external funding.