Cross-Part Learning for Fine-Grained Image Classification

IEEE Trans Image Process. 2022;31:748-758. doi: 10.1109/TIP.2021.3135477. Epub 2021 Dec 28.

Abstract

Recent techniques have achieved remarkable improvements in fine-grained visual classification (FGVC) by mining subtle yet distinctive features. While prior works directly combine discriminative features extracted from different parts, we argue that the potential interactions between parts and their individual contributions to category prediction should also be taken into account, so that the most informative parts contribute more to the sub-category decision. To this end, we present a Cross-Part Convolutional Neural Network (CP-CNN), trained in a weakly supervised manner, to explore cross-learning among multi-regional features. Specifically, a context transformer encourages joint feature learning across different parts under the guidance of a navigator: the part with the highest confidence serves as the navigator and delivers its distinguishing characteristics to the lower-confidence parts, while their complementary information is retained. To locate discriminative yet subtle parts precisely, a part proposal generator (PPG) is designed with feature enhancement blocks, which effectively alleviates the complex scale variations caused by viewpoint diversity. Extensive experiments on three benchmark datasets demonstrate that our proposed method consistently outperforms existing state-of-the-art methods.
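The sketch below illustrates one plausible reading of the navigator-guided cross-part interaction described above: part features are pooled into vectors, the highest-confidence part is chosen as the navigator, and the remaining parts attend to it via cross-attention with a residual connection so their complementary information is preserved. This is a minimal illustrative sketch, not the authors' implementation; the module name `CrossPartBlock`, the pooled-feature input format, and the use of multi-head attention are assumptions.

```python
import torch
import torch.nn as nn


class CrossPartBlock(nn.Module):
    """Hypothetical sketch of navigator-guided cross-part interaction.

    Each image yields K part features of shape (B, K, D) with per-part
    confidence scores of shape (B, K). The highest-confidence part acts as
    the navigator; all parts attend to it via cross-attention, and a
    residual connection retains their complementary information.
    """

    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, parts: torch.Tensor, conf: torch.Tensor) -> torch.Tensor:
        # parts: (B, K, D) pooled part features; conf: (B, K) confidences.
        nav_idx = conf.argmax(dim=1)                            # (B,)
        nav = parts[torch.arange(parts.size(0)), nav_idx]       # (B, D)
        nav = nav.unsqueeze(1)                                  # (B, 1, D)
        # Every part queries the navigator; the residual keeps each
        # part's own (complementary) information.
        delivered, _ = self.attn(query=parts, key=nav, value=nav)
        return self.norm(parts + delivered)


# Example usage: 4 proposed parts per image, 512-dim pooled features.
block = CrossPartBlock(dim=512, num_heads=8)
feats = torch.randn(2, 4, 512)      # part features from a backbone + PPG
scores = torch.rand(2, 4)           # per-part confidence scores
fused = block(feats, scores)        # (2, 4, 512) cross-learned features
```

Under these assumptions, the attention path lets the navigator broadcast its distinguishing characteristics, while the residual-plus-normalization path keeps lower-confidence parts from being overwritten, which matches the abstract's stated goal of retaining complementary information.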