Part-Based Obstacle Detection Using a Multiple Output Neural Network

Razvan Itu; Radu Danescu

doi:10.3390/s22124312

Sensors (Basel). 2022 Jun 7;22(12):4312. doi: 10.3390/s22124312.

Authors

Razvan Itu¹, Radu Danescu¹

Affiliation

¹ Computer Science Department, Technical University of Cluj-Napoca, St. Memorandumului 28, 400114 Cluj-Napoca, Romania.

Abstract

Detecting the objects surrounding a moving vehicle is essential for autonomous driving and for any kind of advanced driving assistance system; such a system can also be used to analyze the surrounding traffic as the vehicle moves. The most popular object detection techniques are based on image processing and, in recent years, have increasingly relied on artificial intelligence. Monocular vision systems are increasingly popular for driving assistance because they do not require complex calibration and setup. The lack of three-dimensional data is compensated for by efficient and accurate classification of the input image pixels. Detected objects are usually represented as cuboids in 3D space or as rectangles in image space. Recently, instance segmentation techniques based on complex convolutional neural networks (CNNs) have been developed that can identify the freeform set of pixels forming an individual object. This paper presents an alternative to these instance segmentation networks that combines much simpler semantic segmentation networks with lightweight geometric post-processing to achieve instance segmentation results. The semantic segmentation network produces four semantic labels that identify the quarters of individual objects: top left, top right, bottom left, and bottom right. The labeled pixels are grouped into connected regions based on their proximity and their position with respect to the whole object. Each quarter is used to generate a complete object hypothesis, which is then scored according to object pixel fitness. The individual homogeneous regions extracted from the labeled pixels are then assigned to the best-fitting rectangles, yielding a complete, freeform identification of the pixels of individual objects. The accuracy is similar to that of instance segmentation-based methods, but with fewer trainable parameters and therefore a reduced demand for computational resources.
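
The post-processing idea described above (quarter labels, connected regions, full-object hypotheses, fitness scoring) can be illustrated with a minimal sketch. This is not the authors' implementation: the label encoding, the rule that doubles a quarter's bounding box to obtain a full rectangle, the fitness measure (fraction of rectangle pixels carrying the expected quarter label), and all function names are assumptions made for illustration only. The final step of assigning regions to rectangles is omitted.

```python
from collections import deque

# Hypothetical label encoding; the abstract does not specify one.
BG, TL, TR, BL, BR = 0, 1, 2, 3, 4


def connected_regions(labels):
    """Group same-label, 4-connected pixels; return [(label, [(row, col), ...]), ...]."""
    h, w = len(labels), len(labels[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for r in range(h):
        for c in range(w):
            if labels[r][c] == BG or seen[r][c]:
                continue
            lab = labels[r][c]
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:  # breadth-first flood fill over one region
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and labels[ny][nx] == lab):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append((lab, pixels))
    return regions


def hypothesis_from_quarter(label, pixels):
    """Assumed rule: double the quarter's bounding box away from its own corner.

    Returns an inclusive rectangle (top, left, bottom, right).
    """
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    top, bottom, left, right = min(ys), max(ys), min(xs), max(xs)
    height, width = bottom - top + 1, right - left + 1
    if label in (TL, TR):
        bottom += height  # a top quarter implies the object extends downward
    else:
        top -= height
    if label in (TL, BL):
        right += width    # a left quarter implies the object extends rightward
    else:
        left -= width
    return top, left, bottom, right


def score_hypothesis(labels, rect):
    """Assumed fitness: fraction of rectangle pixels with the expected quarter label."""
    top, left, bottom, right = rect
    h, w = len(labels), len(labels[0])
    mid_y = top + (bottom - top + 1) // 2   # first row of the bottom half
    mid_x = left + (right - left + 1) // 2  # first column of the right half
    hits = 0
    for y in range(max(top, 0), min(bottom, h - 1) + 1):
        for x in range(max(left, 0), min(right, w - 1) + 1):
            if y < mid_y:
                expected = TL if x < mid_x else TR
            else:
                expected = BL if x < mid_x else BR
            hits += labels[y][x] == expected
    # Pixels of the rectangle that fall outside the image count as misses.
    return hits / ((bottom - top + 1) * (right - left + 1))


def best_hypothesis(labels):
    """Return (rect, score) for the best-scoring hypothesis, or None if nothing is labeled."""
    rects = [hypothesis_from_quarter(lab, px) for lab, px in connected_regions(labels)]
    if not rects:
        return None
    return max(((r, score_hypothesis(labels, r)) for r in rects), key=lambda t: t[1])
```

Generating a hypothesis from each quarter independently means that an object with one occluded or misclassified quarter can still be proposed from the remaining quarters; the score then reflects how much of the proposed rectangle is actually supported by labeled pixels.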

Keywords: CNNs; driver assistance; instance segmentation; monocular vision; obstacle detection; semantic segmentation; vanishing point.

MeSH terms

Artificial Intelligence*
Image Processing, Computer-Assisted / methods
Neural Networks, Computer*
Semantics

Grants and funding

This work was partly supported by grants from the Ministry of Research and Innovation, CNCS-UEFISCDI, project numbers PN-III-P4-ID-PCE2020-1700 and PN-III-P1-1.1-PD-2021-0247, and partly by the project “Entrepreneurial competencies and excellence research in doctoral and postdoctoral programs - ANTREDOC”, co-funded by the European Social Fund, financing agreement no. 56437/24.07.2019.