Predicting the deforestation probability using the binary logistic regression, random forest, ensemble rotational forest, REPTree: A case study at the Gumani River Basin, India

Sci Total Environ. 2020 Aug 15:730:139197. doi: 10.1016/j.scitotenv.2020.139197. Epub 2020 May 4.

Abstract

Rapid population growth and its corresponding effects like the expansion of human settlement, increasing agricultural land, and industry lead to the loss of forest area in most parts of the world especially in such highly populated nations like India. Forest canopy density (FCD) is a useful measure to assess the forest cover change in its own as numerous works of forest change have been done using only FCD with the help of remote sensing and GIS. The coupling of binary logistic regression (BLR), random forest (RF), ensemble of rotational forest and reduced error pruning trees (RTF-REPTree) with FCD makes it more convenient to find out the deforestation probability. Advanced vegetation index (AVI), bare soil index (BSI), shadow index (SI), and scaled vegetation density (VD) derived from Landsat imageries are the main input parameters to identify the FCD. After preparing the FCDs of 1990, 2000, 2010 and 2017 the deforestation map of the study area was prepared and considered as dependent parameter for deforestation probability modelling. On the other hand, twelve deforestation determining factors were used to delineate the deforestation probability with the help of BLR, RF and RTF-REPTree models. These deforestation probability models were validated through area under curve (AUC), receiver operating characteristics (ROC), efficiency, true skill statistics (TSS) and Kappa co-efficient. The validation result shows that all the models like BLR (AUC = 0.874), RF (AUC = 0.886) and RTF-REPTree (AUC = 0.919) have good capability of assessing the deforestation probability but among them, RTF-REPTree has the highest accuracy level. The result also shows that low canopy density area i.e. not under the dense forest cover has increased by 9.26% from 1990 to 2017. Besides, nearly 30% of the forested land is under high to very high deforestation probable zone, which needs to be protected with immediate measures.

Keywords: Deforestation; Ensemble model; Forest canopy density; Machine learning algorithms; Probabilistic model.