A novel optimized repeatedly random undersampling for selecting negative samples: A case study in an SVM-based forest fire susceptibility assessment

J Environ Manage. 2020 Oct 1:271:111014. doi: 10.1016/j.jenvman.2020.111014. Epub 2020 Jul 2.

Abstract

Negative sample selection is a key issue in studies that use machine learning approaches to spatially assess natural hazards. Recently, Repeatedly Random Undersampling (RRU) was proposed to address the randomness problem faced in Single Random Sampling. However, RRU cannot guarantee that the classifier it generates has the best classification performance among those produced during the repeated random sampling process. To address this weakness, this study proposes an optimized RRU, which follows the idea of RRU but changes its rule in order to find the best classifier. The selected classifier, the actual most accurate classifier (MAC), was then employed to compute the probability of hazard occurrence. Support Vector Machine (SVM) was selected as the analysis method, and a Genetic Algorithm was employed to compute the parameters of the SVM. Forest fire susceptibility was assessed in Huichang County, China, because of its forest values and frequent fire events. The results indicated that, compared with RRU, the optimized RRU can find an actual MAC that has the best classification performance among the possible MACs; in addition, the fire susceptibility map generated by the actual MAC conforms to objective facts. The generated fire susceptibility map can provide useful decision support for local government in reducing forest fire risks. Moreover, the proposed sampling method, the optimized RRU, presents an enhanced approach for selecting negative samples, which makes the results of forest fire susceptibility assessment more reliable and accurate.
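
The abstract does not spell out the selection rule of the optimized RRU, so the following is only a minimal sketch of the general idea as described above: draw negative samples at random several times, train one classifier per draw, and keep the classifier with the best held-out accuracy. Everything named here is an assumption made for illustration: the function names, the 70/30 split, the number of rounds, and above all the nearest-centroid classifier, which stands in for the GA-tuned SVM used in the study so that the example runs with the standard library alone.

```python
import random


def train_centroid(rows, labels):
    """Stand-in classifier: nearest class centroid (NOT the paper's GA-tuned SVM)."""
    dims = range(len(rows[0]))
    centroids = {}
    for label in set(labels):
        members = [r for r, t in zip(rows, labels) if t == label]
        centroids[label] = [sum(m[d] for m in members) / len(members) for d in dims]

    def predict(row):
        def sq_dist(label):
            return sum((a - b) ** 2 for a, b in zip(row, centroids[label]))
        return min(centroids, key=sq_dist)

    return predict


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(r) == t for r, t in zip(rows, labels)) / len(labels)


def split(rows, rng, train_frac=0.7):
    """Shuffle a copy of rows and cut it into train and test parts (assumed 70/30)."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def select_best_classifier(positives, candidates, n_rounds=20, seed=0):
    """Hypothetical selection loop: one random negative draw per round, keep the best model.

    positives  -- feature rows for recorded events (label 1)
    candidates -- feature rows from which negatives (label 0) are drawn
    Assumes at least 4 positives so that both train and test parts are non-empty.
    """
    rng = random.Random(seed)
    scores, best_score, best_model = [], -1.0, None
    for _ in range(n_rounds):
        # Undersample: as many negatives as there are positives.
        negatives = rng.sample(candidates, len(positives))
        pos_train, pos_test = split(positives, rng)
        neg_train, neg_test = split(negatives, rng)
        model = train_centroid(
            pos_train + neg_train,
            [1] * len(pos_train) + [0] * len(neg_train),
        )
        score = accuracy(
            model,
            pos_test + neg_test,
            [1] * len(pos_test) + [0] * len(neg_test),
        )
        scores.append(score)
        if score > best_score:  # keep the most accurate classifier seen so far
            best_score, best_model = score, model
    return best_model, best_score, scores
```

Called as `select_best_classifier(positives, candidates)`, the function returns the retained model, its held-out accuracy, and the per-round accuracies, so the spread across random draws can be inspected. Replacing `train_centroid` with an SVM trainer would bring the sketch closer to the setup named in the abstract.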

Keywords: Actual MAC; Forest fire susceptibility; Machine learning; Optimized RRU.

MeSH terms

  • Algorithms
  • China
  • Machine Learning
  • Support Vector Machine*
  • Wildfires*