Machine learning approaches in GIS-based ecological modeling of the sand fly Phlebotomus papatasi, a vector of zoonotic cutaneous leishmaniasis in Golestan province, Iran

Acta Trop. 2018 Dec:188:187-194. doi: 10.1016/j.actatropica.2018.09.004. Epub 2018 Sep 7.

Abstract

The distribution and abundance of Phlebotomus papatasi, the primary vector of zoonotic cutaneous leishmaniasis in most arid and semi-arid countries, are a major public health challenge. This study compares several approaches to modeling the spatial distribution of the species in an endemic region of the disease in Golestan province, northeastern Iran. The intent is to assist decision makers in targeting interventions. We developed a geo-database of Phlebotominae sand flies collected from different parts of the study region using sticky paper traps coated with castor oil. Ph. papatasi was present at 44 of the 142 sampling sites. We also gathered and prepared data on related environmental factors in a GIS framework, including topography, weather variables, distance to main rivers, and remotely sensed data such as the normalized difference vegetation index (NDVI) and land surface temperature (LST). The applicability of three classifiers, standard logistic regression, random forest and support vector machine (SVM), was compared for predicting presence/absence of the vector. Predictive performance was assessed on an independent dataset using the area under the ROC curve (AUC) and the Kappa statistic. All three models successfully predicted the presence/absence of the vector; however, the SVM classifier (Accuracy = 0.906, AUC = 0.974, Kappa = 0.876) outperformed the other classifiers in predictive accuracy. It was also the most sensitive (85%) and the most specific (93%) model. Sensitivity analysis of the most accurate model (i.e. SVM) revealed that slope, nighttime LST in October and mean temperature of the wettest quarter were among the most important predictors. The findings suggest that machine learning techniques, especially the SVM classifier, when coupled with GIS and remote sensing data, can be a useful and cost-effective way of identifying habitat suitability for the species.
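For readers unfamiliar with the accuracy measures named above, the following is a minimal sketch of how accuracy, sensitivity, specificity, Cohen's Kappa and AUC can be computed for a presence/absence classifier. It is not the authors' code. The function names, the 0.5 threshold and the ten toy observations are all assumptions made for illustration, and the numbers are not data from the study.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and Cohen's Kappa for 0/1 labels.

    Assumes both classes occur in y_true (otherwise a denominator is zero).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n = tp + tn + fp + fn

    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # true positive rate (presence sites found)
    specificity = tn / (tn + fp)   # true negative rate (absence sites found)

    # Agreement expected by chance, from the row and column totals.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_chance) / (1 - p_chance)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random presence site outscores a random
    absence site, with ties counted as one half (Mann-Whitney formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical held-out sites: 1 = vector present, 0 = absent.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
# Hypothetical predicted probabilities of presence from some classifier.
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.05]
# Assumed decision threshold of 0.5 to turn scores into presence/absence.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = binary_metrics(y_true, y_pred)
auc_value = auc(y_true, scores)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"auc: {auc_value:.3f}")
```

Accuracy, sensitivity, specificity and Kappa depend on the chosen threshold, whereas AUC is computed from the ranking of the scores alone, which is one reason the two kinds of measure are usually reported together.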

Keywords: Accuracy assessment; Ecological modeling; GIS; Support vector machine; Zoonotic cutaneous leishmaniasis.

MeSH terms

  • Animals
  • Area Under Curve
  • Ecosystem
  • Environment
  • Geographic Information Systems
  • Insect Vectors
  • Iran / epidemiology
  • Leishmaniasis, Cutaneous / epidemiology
  • Leishmaniasis, Cutaneous / transmission*
  • Machine Learning*
  • Phlebotomus*