A Recurrent Deep Network for Estimating the Pose of Real Indoor Images from Synthetic Image Sequences

Debaditya Acharya; Sesa Singha Roy; Kourosh Khoshelham; Stephan Winter

doi:10.3390/s20195492

Sensors (Basel). 2020 Sep 25;20(19):5492. doi: 10.3390/s20195492.

Authors

Debaditya Acharya¹,², Sesa Singha Roy³, Kourosh Khoshelham¹, Stephan Winter¹

Affiliations

¹ Department of Infrastructure Engineering, The University of Melbourne, Parkville, Victoria 3010, Australia.
² Department of Manufacturing, Materials and Mechatronics, RMIT University, Carlton, Victoria 3053, Australia.
³ Institute for Sustainable Industries and Livable Cities, Victoria University, Werribee, Victoria 3030, Australia.

Abstract

Recently, deep convolutional neural networks (CNNs) have become popular for indoor visual localisation, where the networks learn to regress the camera pose directly from images. However, these approaches require a prior 3D image-based reconstruction of the indoor space to determine camera poses, which is challenging for large indoor spaces. Synthetic images derived from 3D indoor models have been used to eliminate the need for 3D reconstruction. A limitation of this approach is its low accuracy, which results from estimating the pose of each image frame independently. In this article, a visual localisation approach is proposed that exploits the spatio-temporal information in synthetic image sequences to improve localisation accuracy. A deep Bayesian recurrent CNN is fine-tuned on synthetic image sequences obtained from a building information model (BIM) to regress the pose of real image sequences. The experimental results indicate that the proposed approach estimates a smoother trajectory with smaller inter-frame error than existing methods. The achievable accuracy of the proposed approach is 1.6 m, an improvement of approximately thirty per cent over existing approaches. A Keras implementation is available in our GitHub repository.
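The abstract reports accuracy as a distance in metres and compares trajectories by inter-frame error, but it does not define these metrics here. The following Python sketch shows common ways such errors are computed for camera pose regression. It assumes each pose is a 3D position plus a quaternion (w, x, y, z), as in PoseNet-style regressors. All function names are hypothetical, and the inter-frame measure in particular is an assumed definition (the translational part of a relative pose error between consecutive frames), not necessarily the one used by the authors.

```python
import math
from statistics import median


def position_error(p_est, p_true):
    """Euclidean distance (metres) between estimated and true camera positions."""
    return math.dist(p_est, p_true)


def orientation_error_deg(q_est, q_true):
    """Rotation angle (degrees) between two orientations given as quaternions (w, x, y, z)."""
    # Normalise so the dot product equals the cosine of half the relative rotation angle.
    n_est = math.sqrt(sum(c * c for c in q_est))
    n_true = math.sqrt(sum(c * c for c in q_true))
    dot = abs(sum(a * b for a, b in zip(q_est, q_true))) / (n_est * n_true)
    # abs() handles the q / -q ambiguity; min() guards against rounding slightly above 1.
    dot = min(1.0, dot)
    return math.degrees(2.0 * math.acos(dot))


def sequence_errors(est_poses, true_poses):
    """Median position and orientation error over a sequence of (position, quaternion) pairs."""
    pos = [position_error(pe, pt) for (pe, _), (pt, _) in zip(est_poses, true_poses)]
    ori = [orientation_error_deg(qe, qt) for (_, qe), (_, qt) in zip(est_poses, true_poses)]
    return median(pos), median(ori)


def inter_frame_error(est_positions, true_positions):
    """Hypothetical smoothness measure: mean error of frame-to-frame displacements.

    Requires at least two frames. Compares each estimated step p[t+1] - p[t]
    with the corresponding ground-truth step.
    """
    errs = []
    for t in range(len(est_positions) - 1):
        d_est = [b - a for a, b in zip(est_positions[t], est_positions[t + 1])]
        d_true = [b - a for a, b in zip(true_positions[t], true_positions[t + 1])]
        errs.append(math.dist(d_est, d_true))
    return sum(errs) / len(errs)
```

Averaging per-frame errors (here via the median) captures absolute accuracy, while the displacement-based measure penalises jitter between consecutive estimates even when each frame is individually close to the ground truth; this is one way a sequence-aware model can be shown to produce a smoother trajectory.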

Keywords: 3D building models; camera pose regression; indoor localisation; long short-term memory.