An Embodied Multi-Sensor Fusion Approach to Visual Motion Estimation Using Unsupervised Deep Networks

Sensors (Basel). 2018 May 4;18(5):1427. doi: 10.3390/s18051427.

Abstract

To improve vision-aided state estimation on size, weight, and power (SWaP)-constrained robotic platforms, we describe our unsupervised, deep convolutional-deconvolutional sensor fusion network, Multi-Hypothesis DeepEfference (MHDE). MHDE learns to intelligently combine noisy, heterogeneous sensor data to predict several probable hypotheses for the dense, pixel-level correspondence between a source image and an unseen target image. We show how our multi-hypothesis formulation provides increased robustness against dynamic, heteroscedastic sensor and motion noise while computing hypothesis image mappings and predictions at 76–357 Hz, depending on the number of hypotheses generated. MHDE fuses noisy, heterogeneous sensory inputs through two parallel, interconnected architectural pathways and n (1–20 in this work) hypothesis-generating sub-pathways to produce n global correspondence estimates between a source and a target image. We evaluated MHDE on the KITTI Odometry dataset, benchmarked it against the vision-only DeepMatching and Deformable Spatial Pyramids algorithms, and demonstrated a significant runtime decrease and a performance increase over the next-best-performing method.

Keywords: deep learning; optical flow; sensor fusion.
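
To make the two-pathway, multi-hypothesis design concrete, the following is a minimal PyTorch sketch of the kind of architecture the abstract describes: a convolutional image pathway and a motion pathway are fused, and n parallel deconvolutional sub-pathways each regress one dense correspondence hypothesis. All layer sizes, module names, and the 6-DoF motion input are illustrative assumptions, not the published MHDE architecture.

import torch
import torch.nn as nn

class MHDESketch(nn.Module):
    """Sketch of a multi-hypothesis conv-deconv fusion network.

    Assumed structure: an image encoder pathway, a motion encoder
    pathway, and num_hypotheses deconvolutional heads that each
    produce a dense 2-channel pixel-correspondence field.
    """

    def __init__(self, num_hypotheses=4):
        super().__init__()
        # Image pathway: convolutional encoder over the source image.
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2, padding=2),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=5, stride=2, padding=2),
            nn.ReLU(inplace=True),
        )
        # Motion pathway: embeds a heterogeneous sensor reading
        # (here an assumed 6-DoF motion vector, e.g. from IMU/odometry).
        self.motion_encoder = nn.Sequential(
            nn.Linear(6, 64),
            nn.ReLU(inplace=True),
            nn.Linear(64, 64),
        )
        # One deconvolutional sub-pathway per hypothesis.
        self.hypothesis_heads = nn.ModuleList(
            nn.Sequential(
                nn.ConvTranspose2d(128, 32, kernel_size=4, stride=2, padding=1),
                nn.ReLU(inplace=True),
                nn.ConvTranspose2d(32, 2, kernel_size=4, stride=2, padding=1),
            )
            for _ in range(num_hypotheses)
        )

    def forward(self, source_image, motion):
        feats = self.image_encoder(source_image)   # (B, 64, H/4, W/4)
        m = self.motion_encoder(motion)            # (B, 64)
        # Broadcast the motion embedding over the spatial grid and fuse
        # the two pathways by channel concatenation.
        m = m[:, :, None, None].expand(-1, -1, feats.size(2), feats.size(3))
        fused = torch.cat([feats, m], dim=1)       # (B, 128, H/4, W/4)
        # n parallel sub-pathways -> n dense correspondence hypotheses.
        return [head(fused) for head in self.hypothesis_heads]

if __name__ == "__main__":
    net = MHDESketch(num_hypotheses=4)
    src = torch.randn(1, 3, 64, 64)   # source image
    imu = torch.randn(1, 6)           # assumed 6-DoF motion reading
    hypotheses = net(src, imu)
    print(len(hypotheses), hypotheses[0].shape)  # 4, torch.Size([1, 2, 64, 64])

In this sketch the hypothesis heads share one fused encoding, so each additional hypothesis adds only its own decoding cost, which is consistent with throughput scaling with the number of hypotheses. The training objective is omitted; one common unsupervised choice for multi-hypothesis networks (not necessarily the paper's loss) is to warp the source image with each hypothesis and penalize only the lowest photometric error against the target.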