Deep Scene Flow Learning: From 2D Images to 3D Point Clouds

IEEE Trans Pattern Anal Mach Intell. 2024 Jan;46(1):185-208. doi: 10.1109/TPAMI.2023.3319448. Epub 2023 Dec 5.

Abstract

Scene flow describes the 3D motion in a scene. It can be modeled as a single task or as a composite of the auxiliary tasks of depth, camera motion, and optical flow estimation. The emergence of deep learning in recent years has opened the way for new methodologies that address these tasks, either separately or jointly, to reconstruct the scene flow. These methods take as input a sequence of images, either synthesized or captured by a camera, and must cope with varying conditions in the images, such as image quality, to estimate the motion as accurately as possible. Nowadays, images have been superseded by point clouds, which provide 3D information, thereby expediting and enhancing motion estimation. In this paper, we examine scene flow estimation in the deep learning era in depth. We provide a comprehensive overview of the important topics regarding both image-based and point-cloud-based methods. In addition, we cover the methodologies for each category, highlighting the network architectures. Furthermore, we compare these methods in terms of performance and efficiency. Finally, we conclude this survey with insights and discussions on open issues and future research directions.