Multi-Objective Location and Mapping Based on Deep Learning and Visual SLAM

Sensors (Basel). 2022 Oct 6;22(19):7576. doi: 10.3390/s22197576.

Abstract

Simultaneous localization and mapping (SLAM) can be used to localize a robot and build maps of unknown environments, but the resulting maps often suffer from poor readability and interactivity, making it difficult to distinguish the primary from the secondary information they contain. For intelligent robots to interact in meaningful ways with their environment, they must understand both the geometric and the semantic properties of the surrounding scene. The proposed method not only reduces the absolute positional error (APE) and improves the positioning performance of the system, but also constructs an object-oriented dense semantic point cloud map and outputs a point cloud model of each object, so that individual objects in the indoor scene can be reconstructed. In our experiments, eight object categories are detected and semantically mapped using COCO-pretrained weights, and in principle most objects in a real scene can be reconstructed in the same way. Experiments show that the number of points in the map point cloud is significantly reduced. The average positioning error of the eight object categories on the Technical University of Munich (TUM) datasets remains small, the camera's absolute positional error is further reduced once semantic constraints are introduced, and the positioning performance of the system improves. At the same time, the algorithm segments the point cloud models of objects in the environment with high accuracy.
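The abstract does not spell out the mapping pipeline; the sketch below only illustrates one core step of building an object-oriented point cloud map, namely back-projecting the depth pixels covered by a detector's instance mask into a per-object point cloud with a pinhole camera model. It is a minimal illustration, assuming TUM-style depth images and a COCO-class binary mask; the function name, parameters, and intrinsics are illustrative assumptions, not the authors' code.

    import numpy as np

    def mask_to_object_cloud(depth, mask, fx, fy, cx, cy, depth_scale=5000.0):
        # Back-project depth pixels inside one instance mask into a 3D point
        # cloud in the camera frame (pinhole model).
        #   depth : HxW depth image (TUM RGB-D stores depth_scale units per metre)
        #   mask  : HxW boolean instance mask from the object detector
        #   fx, fy, cx, cy : camera intrinsics
        v, u = np.nonzero(mask)                           # pixel rows/cols inside the mask
        z = depth[v, u].astype(np.float32) / depth_scale  # metric depth
        u, v, z = u[z > 0], v[z > 0], z[z > 0]            # discard pixels with no depth reading
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        return np.stack([x, y, z], axis=1)                # N x 3 points, camera coordinates

    # Usage with TUM freiburg1 intrinsics (fx=517.3, fy=516.5, cx=318.6, cy=255.3):
    # cloud_cam = mask_to_object_cloud(depth_img, chair_mask, 517.3, 516.5, 318.6, 255.3)
    # cloud_world = (T_wc[:3, :3] @ cloud_cam.T).T + T_wc[:3, 3]  # express in world frame via camera pose

Per-object clouds produced this way can then be accumulated over keyframes and fused into the semantic map, which is consistent with (though not guaranteed to match) the object-oriented mapping described above.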

Keywords: deep learning; multi-objective location; semantic mapping; target tracking; visual SLAM.

MeSH terms

  • Algorithms
  • Deep Learning*
  • Imaging, Three-Dimensional* / methods

Grants and funding

This research received no external funding.