Jointly Learning Deep Features, Deformable Parts, Occlusion and Classification for Pedestrian Detection

IEEE Trans Pattern Anal Mach Intell. 2018 Aug;40(8):1874-1887. doi: 10.1109/TPAMI.2017.2738645. Epub 2017 Aug 11.

Abstract

Feature extraction, deformation handling, occlusion handling, and classification are four important components in pedestrian detection. Existing methods learn or design these components either individually or sequentially, and the interaction among them has not yet been well explored. This paper proposes that the components should be learned jointly so that they can reinforce one another. We formulate the four components as a single joint deep learning framework and propose a new deep network architecture (code available at www.ee.cuhk.edu.hk/wlouyang/projects/ouyangWiccv13Joint/index.html). By establishing automatic, mutual interaction among the components, the deep model achieves an average miss rate of 8.57 percent/11.71 percent on the Caltech benchmark dataset with the new/original annotations.
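
The abstract reports an average miss rate without saying how it is computed. A minimal sketch follows, assuming the figure is the log-average miss rate commonly used with the Caltech benchmark: the geometric mean of miss rates sampled at nine false-positives-per-image (FPPI) values evenly spaced in log space over [0.01, 1]. The function name, the input format, and the choice to count a miss rate of 1.0 where the curve has no point at or below a reference FPPI are all illustrative assumptions, not details taken from the paper.

```python
import bisect
import math


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of miss rates sampled at log-spaced FPPI references.

    fppi and miss_rate are parallel lists describing a detector's curve,
    with fppi sorted in ascending order (hypothetical input format).
    """
    if not fppi or len(fppi) != len(miss_rate):
        raise ValueError("fppi and miss_rate must be non-empty and equal length")

    # Reference FPPI values: 10^-2 ... 10^0, evenly spaced in log space.
    refs = [10 ** (-2 + 2 * i / (num_points - 1)) for i in range(num_points)]

    log_sum = 0.0
    for ref in refs:
        # Index of the last curve point whose FPPI does not exceed ref.
        idx = bisect.bisect_right(fppi, ref) - 1
        # Assumption: if the curve never reaches this FPPI, count a full miss.
        sample = miss_rate[idx] if idx >= 0 else 1.0
        # Clamp to keep the logarithm finite when the miss rate is zero.
        log_sum += math.log(max(sample, 1e-10))

    return math.exp(log_sum / num_points)
```

Under these assumptions, a detector whose curve is flat at a miss rate of 0.25 across the whole FPPI range scores 0.25, and lower values indicate better detection.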

Publication types

  • Research Support, Non-U.S. Gov't