Comparison of Random Forest, k-Nearest Neighbor, and Support Vector Machine Classifiers for Land Cover Classification Using Sentinel-2 Imagery

Phan Thanh Noi; Martin Kappas

doi:10.3390/s18010018

Sensors (Basel). 2017 Dec 22;18(1):18. doi: 10.3390/s18010018.

Authors

Phan Thanh Noi¹ ², Martin Kappas³

Affiliations

¹ Cartography, GIS and Remote Sensing Department, Institute of Geography, University of Göttingen, Goldschmidt Street 5, 37077 Göttingen, Germany. phan.thanh.noi@gmail.com.
² Cartography and Geodesy Department, Land Management Faculty, Vietnam National University of Agriculture, Hanoi 100000, Vietnam. phan.thanh.noi@gmail.com.
³ Cartography, GIS and Remote Sensing Department, Institute of Geography, University of Göttingen, Goldschmidt Street 5, 37077 Göttingen, Germany. mkappas@gwdg.de.

Abstract

In previous classification studies, three non-parametric classifiers, Random Forest (RF), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM), were reported to be among the classifiers that produce the highest accuracies. However, only a few studies have compared the performance of these classifiers with different training sample sizes on the same remote sensing images, particularly images from the Sentinel-2 Multispectral Imager (MSI). In this study, we examined and compared the performance of the RF, kNN, and SVM classifiers for land use/cover classification using Sentinel-2 image data. An area of 30 km × 30 km within the Red River Delta of Vietnam with six land use/cover types was classified using 14 different training sample sizes, both balanced and imbalanced, ranging from 50 to over 1250 pixels/class. All classification results showed a high overall accuracy (OA), ranging from 90% to 95%. Across the three classifiers and 14 sub-datasets, SVM produced the highest OA and was the least sensitive to training sample size, followed by RF and then kNN. With respect to sample size, all three classifiers reached a similarly high OA (over 93.85%) when the training sample size was large enough, i.e., greater than 750 pixels/class, or approximately 0.25% of the total study area. This high accuracy was achieved with both imbalanced and balanced datasets.
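As a rough illustration of the evaluation described in the abstract, the following sketch computes overall accuracy (OA) for a classifier trained on several training sample sizes. It is not the authors' implementation: it uses a minimal pure-Python kNN only, leaves out RF and SVM, and runs on synthetic two-band "pixels" rather than Sentinel-2 data. The function names, class centers, noise level, and sample sizes are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Overall accuracy (OA): fraction of test pixels whose predicted
    class matches the reference class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def knn_predict(train_x, train_y, query, k=5):
    """Classify one pixel by majority vote among its k nearest training pixels
    (Euclidean distance in feature space)."""
    nearest = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def make_pixels(n_per_class, rng):
    """Hypothetical two-band 'pixels' for three synthetic classes.
    The class centers and spread are invented for illustration only."""
    centers = {0: (0.2, 0.2), 1: (0.5, 0.6), 2: (0.8, 0.3)}
    xs, ys = [], []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            xs.append((rng.gauss(cx, 0.12), rng.gauss(cy, 0.12)))
            ys.append(label)
    return xs, ys


def oa_by_training_size(sizes, n_test_per_class=100, k=5, seed=0):
    """Train on each balanced sample size (pixels/class) and report OA
    on one fixed, independently drawn test set."""
    rng = random.Random(seed)
    test_x, test_y = make_pixels(n_test_per_class, rng)
    results = {}
    for n in sizes:
        train_x, train_y = make_pixels(n, rng)
        pred = [knn_predict(train_x, train_y, q, k) for q in test_x]
        results[n] = overall_accuracy(test_y, pred)
    return results


if __name__ == "__main__":
    for n, oa in oa_by_training_size([10, 50, 250]).items():
        print(f"{n:>4} pixels/class: OA = {oa:.3f}")
```

Keeping the test set fixed while only the training set changes means that differences in OA reflect the training sample size rather than a different evaluation sample. An imbalanced variant could be sketched by drawing a different number of training pixels for each class.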

Keywords: Random Forest (RF); Sentinel-2; Support Vector Machine (SVM); classification algorithms; k-Nearest Neighbor (kNN); training sample size.