Training Convolutional Neural Networks with Multi-Size Images and Triplet Loss for Remote Sensing Scene Classification

Sensors (Basel). 2020 Feb 21;20(4):1188. doi: 10.3390/s20041188.

Abstract

Many remote sensing scene classification algorithms improve their classification accuracy through additional modules, which increase the parameters and computing overhead of the model at the inference stage. In this paper, we explore how to improve the classification accuracy of the model without adding modules at the inference stage. First, we propose a network training strategy of training with multi-size images. Then, we introduce more supervision information through triplet loss and design a dedicated branch for it. In addition, dropout is introduced between the feature extractor and the classifier to avoid over-fitting. These modules work only at the training stage and do not increase the model parameters at the inference stage. We use ResNet18 as the baseline and add the three modules to it. We perform experiments on three datasets: AID, NWPU-RESISC45, and OPTIMAL. Experimental results show that our model combined with the three modules is more competitive than many existing classification algorithms. In addition, ablation experiments on OPTIMAL show that dropout, triplet loss, and training with multi-size images improve the overall accuracy of the model on the test set by 0.53%, 0.38%, and 0.7%, respectively; the combination of the three modules improves the overall accuracy by 1.61%. Thus, the three modules can improve the classification accuracy of the model without increasing model parameters at the inference stage; training with multi-size images brings a greater accuracy gain than the other two modules, but combining all three is better still.
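The abstract itself gives no implementation details, but the two training-only ideas it names can be sketched in a few lines. The snippet below is a minimal, framework-free illustration (not the authors' code): a hinge-style triplet loss over feature vectors, and a helper that samples a different input resolution per training step for multi-size training. The margin value and the candidate image sizes are assumptions for illustration only.

```python
import math
import random

def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: pull same-class features together and
    push different-class features at least `margin` apart.
    The loss is zero once d(a, n) >= d(a, p) + margin."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)

def pick_train_size(sizes=(192, 224, 256)):
    """Multi-size training: choose one input resolution per step/epoch.
    The candidate sizes here are illustrative, not from the paper."""
    return random.choice(sizes)
```

Because both the triplet-loss branch and the size sampling act only on the training loop, the network used at inference is unchanged, which is the paper's central point.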

Keywords: dropout; remote sensing scene classification; training with multi-size images; triplet loss.