Integrating Sparse Learning-Based Feature Detectors into Simultaneous Localization and Mapping - A Benchmark Study

Sensors (Basel). 2023 Feb 18;23(4):2286. doi: 10.3390/s23042286.

Abstract

Simultaneous localization and mapping (SLAM) is one of the cornerstones of autonomous navigation systems in robotics and the automotive industry. Visual SLAM (V-SLAM), which relies on image features, such as keypoints and descriptors, to estimate the pose transformation between consecutive frames, is a highly efficient and effective approach for gathering environmental information. With the rise of representation learning, feature detectors based on deep neural networks (DNNs) have emerged as an alternative to handcrafted solutions. This work examines the integration of sparse learned features into a state-of-the-art SLAM framework and benchmarks handcrafted and learning-based approaches through in-depth experiments. Specifically, we replace the ORB detector and BRIEF descriptor of the ORB-SLAM3 pipeline with those provided by SuperPoint, a DNN model that jointly computes keypoints and descriptors. Experiments on three publicly available datasets from different application domains were conducted to evaluate the pose estimation performance and resource usage of both solutions.
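One practical consequence of the swap described above is that matching changes metric: BRIEF produces binary descriptors compared with Hamming distance, whereas SuperPoint produces L2-normalized float descriptors compared with Euclidean (or cosine) distance. The sketch below illustrates this difference using random stand-in arrays; the shapes (256-bit binary strings, 256-D float vectors) reflect the two descriptor types, but the data are purely hypothetical, not output from ORB-SLAM3 or a SuperPoint network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: real descriptors would come from the BRIEF
# extractor or a SuperPoint forward pass, not from a random generator.
brief_a = rng.integers(0, 2, size=(5, 256), dtype=np.uint8)  # binary, 256 bits
brief_b = rng.integers(0, 2, size=(8, 256), dtype=np.uint8)

sp_a = rng.standard_normal((5, 256)).astype(np.float32)      # float, 256-D
sp_a /= np.linalg.norm(sp_a, axis=1, keepdims=True)          # unit-normalized
sp_b = rng.standard_normal((8, 256)).astype(np.float32)
sp_b /= np.linalg.norm(sp_b, axis=1, keepdims=True)

def match_hamming(da, db):
    """Nearest neighbour under Hamming distance (binary descriptors)."""
    dist = (da[:, None, :] != db[None, :, :]).sum(axis=2)
    return dist.argmin(axis=1)

def match_l2(da, db):
    """Nearest neighbour under Euclidean distance (float descriptors)."""
    dist = np.linalg.norm(da[:, None, :] - db[None, :, :], axis=2)
    return dist.argmin(axis=1)

matches_brief = match_hamming(brief_a, brief_b)  # one match index per query
matches_sp = match_l2(sp_a, sp_b)
print(matches_brief.shape, matches_sp.shape)
```

Integrating a learned detector into an existing pipeline therefore touches more than the extractor: any component that assumes binary descriptors (brute-force Hamming matchers, bag-of-words vocabularies) must be adapted to the float representation.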

Keywords: deep learning; learning-based feature detectors; simultaneous localization and mapping (SLAM); vision-based pose estimation.

Grants and funding

Work supported in part by the University of Perugia, Fondi di Ricerca di Ateneo 2021, Project “AIDMIX—Artificial Intelligence for Decision Making: Methods for Interpretability and eXplainability”.