Region-Based Convolutional Networks for Accurate Object Detection and Segmentation

Ross Girshick; Jeff Donahue; Trevor Darrell; Jitendra Malik

doi:10.1109/TPAMI.2015.2437384

IEEE Trans Pattern Anal Mach Intell. 2016 Jan;38(1):142-58. doi: 10.1109/TPAMI.2015.2437384.

PMID: 26656583

Abstract

Object detection performance, as measured on the canonical PASCAL VOC Challenge datasets, plateaued in the final years of the competition. The best-performing methods were complex ensemble systems that typically combined multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 50 percent relative to the previous best result on VOC 2012, achieving a mAP of 62.4 percent. Our approach combines two ideas: (1) one can apply high-capacity convolutional networks (CNNs) to bottom-up region proposals in order to localize and segment objects, and (2) when labeled training data are scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, boosts performance significantly. Since we combine region proposals with CNNs, we call the resulting model an R-CNN or Region-based Convolutional Network. Source code for the complete system is available at http://www.cs.berkeley.edu/~rbg/rcnn.
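The first idea, applying a CNN to each bottom-up region proposal, can be illustrated with a minimal test-time sketch. It is not the authors' released code. The proposal generator, CNN feature extractor, and per-class scorers are passed in as plain callables, and the names `detect`, `propose`, `extract_features`, `classifiers`, and `nms_thresh` are all hypothetical. The sketch also assumes per-class greedy non-maximum suppression, as in the published R-CNN system, but uses an illustrative fixed overlap threshold of 0.3 rather than a tuned value.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed axis-aligned
Detection = Tuple[Box, float]            # (box, class score)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(dets: List[Detection], thresh: float) -> List[Detection]:
    """Greedy NMS: visit boxes by descending score and drop any box whose
    IoU with an already-kept box exceeds `thresh`."""
    kept: List[Detection] = []
    for box, score in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(box, k[0]) <= thresh for k in kept):
            kept.append((box, score))
    return kept


def detect(
    image: Any,
    propose: Callable[[Any], Sequence[Box]],            # hypothetical proposal generator
    extract_features: Callable[[Any, Box], Any],        # hypothetical per-region CNN features
    classifiers: Dict[str, Callable[[Any], float]],     # hypothetical per-class scorers
    nms_thresh: float = 0.3,                            # illustrative, not a tuned value
) -> Dict[str, List[Detection]]:
    """R-CNN-style test-time loop (sketch): propose regions, compute one
    feature vector per region, score it for every class, then run NMS per class."""
    boxes = list(propose(image))
    feats = [extract_features(image, b) for b in boxes]  # features computed once, shared by all classes
    results: Dict[str, List[Detection]] = {}
    for cls, score_fn in classifiers.items():
        scored = [(b, score_fn(f)) for b, f in zip(boxes, feats)]
        results[cls] = nms(scored, nms_thresh)
    return results
```

As a toy usage example, three proposals whose "features" are precomputed scores yield two detections for a single class. The two heavily overlapping boxes (IoU 0.81) collapse to the higher-scoring one. Computing features once per region and reusing them across all class scorers is what keeps this design scalable as the number of classes grows.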

Publication types

Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.