APANet: Auto-Path Aggregation for Future Instance Segmentation Prediction

Jian-Fang Hu; Jiangxin Sun; Zihang Lin; Jian-Huang Lai; Wenjun Zeng; Wei-Shi Zheng

doi:10.1109/TPAMI.2021.3058679

IEEE Trans Pattern Anal Mach Intell. 2022 Jul;44(7):3386-3403. doi: 10.1109/TPAMI.2021.3058679. Epub 2022 Jun 3.

PMID: 33571087

Abstract

Despite the remarkable progress achieved in conventional instance segmentation, predicting instance segmentation results for unobserved future frames remains challenging because the future data cannot be observed. Existing methods mainly address this challenge by forecasting the features of future frames. However, these methods always treat features of multiple levels (e.g., coarse-to-fine pyramid features) independently and do not exploit them collaboratively, which results in inaccurate predictions for future frames. Moreover, this weakness can partially hinder the self-adaptation of a future segmentation prediction model to different input samples. To solve this problem, we propose an adaptive aggregation approach called the Auto-Path Aggregation Network (APANet), in which the spatio-temporal contextual information contained in the features of each individual level is selectively aggregated using the developed "auto-path". The "auto-path" connects each pair of features extracted at different pyramid levels for task-specific hierarchical contextual information aggregation, which enables selective and adaptive aggregation of pyramid features in accordance with different videos/frames. APANet can be further optimized jointly with a Feature Pyramid Network (FPN) feature encoder and the Mask R-CNN head as a feature decoder, forming a joint learning system for future instance segmentation prediction. We experimentally show that the proposed method achieves state-of-the-art performance for future instance segmentation prediction on three video-based instance segmentation benchmarks.
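The abstract does not give the auto-path equations, so the following is only a minimal sketch, in plain Python, of one plausible form of pairwise, input-dependent aggregation over pyramid levels. It assumes every level has already been resampled to a common flattened feature vector, and it scores each source-to-target path with a learnable logit plus a dot-product similarity term, then normalizes the scores with a softmax over source levels. The names `auto_path_aggregate` and `path_logits` are hypothetical, and this softmax gating is a stand-in chosen for illustration, not APANet's published formulation.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def auto_path_aggregate(levels: List[Vector], path_logits: List[List[float]]) -> List[Vector]:
    """Hypothetical pairwise aggregation over pyramid levels (illustrative only).

    levels[i]: feature of pyramid level i, assumed already resampled to a shared size.
    path_logits[i][j]: learnable logit for the path from source level j to target level i.
    """
    n = len(levels)
    dim = len(levels[0])
    if any(len(f) != dim for f in levels):
        raise ValueError("all levels must share one feature size")

    out: List[Vector] = []
    for i in range(n):
        scores = []
        for j in range(n):
            # Input-dependent term: mean elementwise product of target and source,
            # so the path weights change from frame to frame.
            sim = sum(a * b for a, b in zip(levels[i], levels[j])) / dim
            scores.append(path_logits[i][j] + sim)
        # Per-target weights over all source levels (including itself), summing to 1.
        weights = softmax(scores)
        agg = [sum(weights[j] * levels[j][k] for j in range(n)) for k in range(dim)]
        out.append(agg)
    return out


if __name__ == "__main__":
    pyramid = [[1.0, 2.0], [3.0, 4.0], [0.5, 0.5]]
    logits = [[0.0] * 3 for _ in range(3)]
    for row in auto_path_aggregate(pyramid, logits):
        print([round(v, 3) for v in row])
```

Setting a path logit to a large negative value effectively closes that path, which is one way to read "selective" aggregation, while the similarity term makes the mixing weights depend on the current input rather than being fixed after training.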

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Image Processing, Computer-Assisted* / methods
Learning
Neural Networks, Computer*