Looking Beyond the Simple Scenarios: Combining Learners and Optimizers in 3D Temporal Tracking

IEEE Trans Vis Comput Graph. 2017 Nov;23(11):2399-2409. doi: 10.1109/TVCG.2017.2734539. Epub 2017 Aug 10.

Abstract

3D object temporal trackers estimate the 3D rotation and 3D translation of a rigid object by propagating the transformation from one frame to the next. To confront this task, algorithms either learn the transformation between two consecutive frames or optimize an energy function to align the object to the scene. The motivation behind our approach stems from a consideration on the nature of learners and optimizers. Throughout the evaluation of different types of objects and working conditions, we observe their complementary nature - on one hand, learners are more robust when undergoing challenging scenarios, while optimizers are prone to tracking failures due to the entrapment at local minima; on the other, optimizers can converge to a better accuracy and minimize jitter. Therefore, we propose to bridge the gap between learners and optimizers to attain a robust and accurate RGB-D temporal tracker that runs at approximately 2 ms per frame using one CPU core. Our work is highly suitable for Augmented Reality (AR), Mixed Reality (MR) and Virtual Reality (VR) applications due to its robustness, accuracy, efficiency and low latency. Aiming at stepping beyond the simple scenarios used by current systems, often constrained by having a single object in the absence of clutter, averting to touch the object to prevent close-range partial occlusion or selecting brightly colored objects to easily segment them individually, we demonstrate the capacity to handle challenging cases under clutter, partial occlusion and varying lighting conditions.