Anytime Recognition with Routing Convolutional Networks

IEEE Trans Pattern Anal Mach Intell. 2021 Jun;43(6):1875-1886. doi: 10.1109/TPAMI.2019.2959322. Epub 2021 May 11.

Abstract

Achieving an automatic trade-off between accuracy and efficiency within a single deep neural network is highly desirable in time-sensitive computer vision applications. To achieve anytime prediction, existing methods embed fixed exits into the network and use the same exit, the latest one reachable within the time budget, for all samples (referred to as the "latest-all" strategy). However, we observe that, for testing samples of varying difficulty, the latest exit within a time budget does not always give a more accurate prediction than an earlier exit, which makes the "latest-all" strategy sub-optimal. Motivated by this, we propose to improve anytime prediction accuracy by allowing each sample to adaptively select its own optimal exit within a given time budget. Specifically, we propose a new Routing Convolutional Network (RCN) that, for any given time budget, adaptively selects the optimal layer as the exit for each testing sample. To learn an optimal policy for sample routing, a Q-network is embedded into the RCN at each exit, considering both potential information gain and time cost. To further improve anytime prediction accuracy, the exits and the Q-networks are optimized alternately so that each benefits the other in the cost-sensitive environment. Beyond whole-image classification, RCN can also be adapted to dense prediction tasks, e.g., scene parsing, to achieve pixel-level anytime prediction. Extensive experimental results on the CIFAR-10, CIFAR-100, and ImageNet classification benchmarks and the Cityscapes scene parsing benchmark demonstrate the efficacy of the proposed RCN for anytime recognition.
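
The contrast between the "latest-all" strategy and per-sample routing can be illustrated with a small sketch. The abstract does not specify the Q-network's inputs or action space, so this sketch assumes that each exit yields two scores, one for stopping and one for continuing, and that routing moves on only while continuing scores higher and the next exit still fits the time budget. All names (ExitOutput, q_stop, q_continue, latest_all, adaptive_route) and all numeric values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExitOutput:
    cost: float        # cumulative time cost to reach this exit (hypothetical units)
    prediction: int    # class label predicted at this exit
    q_stop: float      # assumed Q-value for stopping at this exit
    q_continue: float  # assumed Q-value for moving on to the next exit


def latest_all(exits: List[ExitOutput], budget: float) -> int:
    """Baseline: index of the latest exit whose cumulative cost fits the budget."""
    affordable = [i for i, e in enumerate(exits) if e.cost <= budget]
    if not affordable:
        raise ValueError("budget is smaller than the cost of the first exit")
    return affordable[-1]


def adaptive_route(exits: List[ExitOutput], budget: float) -> int:
    """Per-sample routing: visit exits in order and stop as soon as the
    Q-values favour stopping or the next exit would exceed the budget."""
    if not exits or exits[0].cost > budget:
        raise ValueError("budget is smaller than the cost of the first exit")
    for i, e in enumerate(exits):
        is_last = i == len(exits) - 1
        if is_last or exits[i + 1].cost > budget:
            return i  # nothing further is reachable within the budget
        if e.q_stop >= e.q_continue:
            return i  # the (assumed) policy prefers to exit here
    return len(exits) - 1  # not reached; the loop always returns


# Hypothetical outputs of three exits for one test sample.
sample_exits = [
    ExitOutput(cost=1.0, prediction=3, q_stop=0.2, q_continue=0.6),
    ExitOutput(cost=2.0, prediction=5, q_stop=0.7, q_continue=0.4),
    ExitOutput(cost=3.0, prediction=4, q_stop=0.9, q_continue=0.0),
]

baseline_exit = latest_all(sample_exits, budget=3.0)
routed_exit = adaptive_route(sample_exits, budget=3.0)
print("latest-all exit:", baseline_exit, "| routed exit:", routed_exit)
```

With a budget of 3.0, the baseline always uses the last exit, while the routed sample stops at the second exit because its assumed stop score exceeds its continue score there. With a tighter budget both strategies fall back to the latest affordable exit, so under this simplification routing never spends more time than "latest-all".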