Object class segmentation of RGB-D video using recurrent convolutional neural networks

Mircea Serban Pavel; Hannes Schulz; Sven Behnke

doi:10.1016/j.neunet.2017.01.003


Neural Netw. 2017 Apr:88:105-113. doi: 10.1016/j.neunet.2017.01.003. Epub 2017 Jan 30.

Authors

Mircea Serban Pavel¹, Hannes Schulz², Sven Behnke³

Affiliations

¹ Universität Bonn, Computer Science Institute VI, Friedrich-Ebert-Allee 144, 53113 Bonn, Germany. Electronic address: pavel@cs.uni-bonn.de.
² Universität Bonn, Computer Science Institute VI, Friedrich-Ebert-Allee 144, 53113 Bonn, Germany. Electronic address: schulzh@ais.uni-bonn.de.
³ Universität Bonn, Computer Science Institute VI, Friedrich-Ebert-Allee 144, 53113 Bonn, Germany. Electronic address: behnke@cs.uni-bonn.de.

PMID: 28232260

Abstract

Object class segmentation is a computer vision task that requires labeling each pixel of an image with the class of the object it belongs to. Deep convolutional neural networks (CNNs) can learn and exploit the local spatial correlations this task requires. However, they are restricted by their small, fixed-size filters, which limits their ability to learn long-range dependencies. Recurrent neural networks (RNNs), on the other hand, do not suffer from this restriction. Because they operate iteratively, they can model long-range dependencies by propagating activity from step to step. This property is especially useful when labeling video sequences, where both spatial and temporal long-range dependencies occur. In this work, we present a novel RNN architecture for object class segmentation and investigate several ways to train it. We evaluate our models on the challenging NYU Depth v2 dataset for object class segmentation and obtain competitive results.
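To make the argument about long-range dependencies concrete, the sketch below shows a generic recurrent convolution in Python with NumPy. It is not the authors' architecture. A single 3x3 kernel only mixes a pixel with its immediate neighbours, but applying it recurrently to a hidden state spreads activity one pixel further per step and carries it from one video frame to the next. The function names (`conv3x3_same`, `recurrent_conv`, `support_radius`), the update h_t = relu(w_in * x_t + w_rec * h_{t-1}), and the kernel values are illustrative assumptions.

```python
import numpy as np


def conv3x3_same(x, k):
    """Zero-padded 'same' 2D cross-correlation of one feature map with a 3x3 kernel."""
    h, w = x.shape
    padded = np.pad(x, 1)  # one pixel of zeros on every side
    out = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += k[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def recurrent_conv(frames, w_in, w_rec):
    """Hypothetical recurrent conv layer: h_t = relu(w_in * x_t + w_rec * h_{t-1}).

    Illustrative only; this is not the architecture from the paper.
    """
    h = np.zeros(frames[0].shape, dtype=float)
    states = []
    for x in frames:
        h = np.maximum(0.0, conv3x3_same(x, w_in) + conv3x3_same(h, w_rec))
        states.append(h)
    return states


def support_radius(h, center):
    """Chebyshev distance from `center` to the farthest pixel with nonzero activity."""
    ys, xs = np.nonzero(h > 0)
    return int(max(np.abs(ys - center[0]).max(), np.abs(xs - center[1]).max()))


# Toy video: 5 frames of 15x15, with a single active pixel in the first frame only.
T, size, c = 5, 15, 7
frames = [np.zeros((size, size)) for _ in range(T)]
frames[0][c, c] = 1.0

w_in = np.zeros((3, 3))
w_in[1, 1] = 1.0               # input kernel: pass the pixel through unchanged
w_rec = np.full((3, 3), 0.5)   # recurrent kernel: spread activity to the 8-neighbourhood

states = recurrent_conv(frames, w_in, w_rec)
print([support_radius(s, (c, c)) for s in states])  # [0, 1, 2, 3, 4]
```

A single feed-forward 3x3 layer reaches only radius 1 around a pixel. The recurrent state reaches radius t after t steps, and the activity seen in the first frame is still present in later frames. This is the sense in which recurrence relaxes the fixed-filter restriction, in both space and time.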

Keywords: Computer vision; Object class segmentation; Recurrent neural networks.

MeSH terms

Artificial Intelligence*
Humans
Neural Networks, Computer*
Pattern Recognition, Automated / methods*
Video Recording / methods*