Climate data clustering effects on arid and semi-arid rainfed wheat yield: a comparison of artificial intelligence and K-means approaches

Int J Biometeorol. 2019 Jul;63(7):861-872. doi: 10.1007/s00484-019-01699-w. Epub 2019 May 22.

Abstract

Clustering algorithms are critical data mining techniques used to analyze a wide range of data. This study compares the utility of ant colony optimization (ACO), genetic algorithm (GA), and K-means methods for clustering climatic variables that affect the yield of rainfed wheat in northeast Iran from 1984 to 2010 (27 years). The variables were sunshine hours, wind speed, relative humidity, precipitation, maximum temperature, minimum temperature, and the number of wet days. Seven climatic factors with higher correlations with detrended rainfed wheat yield were selected based on the significance of the Pearson correlation coefficient (P value < 0.1). Three variables (i.e., sunshine hours, wind, and average relative humidity) were excluded from clustering. In the next step, based on the Pearson correlation (P value < 0.05) between yield and the seven climate attributes, the fitness function, and the silhouette index, only the four attributes with higher correlation within their cluster were selected for reclustering. Because these four climate attributes were strongly associated with yield, we used four-dimensional clustering to describe the common characteristics of low-, medium-, and high-yielding years; this four-dimensional clustering is the main contribution of the study. The silhouette index showed that the best number of clusters for each station was three. In the last step, reclustering was performed with the best-performing method. The results showed that GA was the best method.
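The silhouette-based choice of the number of clusters described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain K-means only (no ACO or GA), a deterministic farthest-point initialization chosen here for reproducibility, and entirely synthetic data. The 27 "years" of four standardized attributes and all function names (`kmeans`, `silhouette_index`, `farthest_point_init`) are assumptions made for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def farthest_point_init(points, k):
    """Deterministic start: each new centroid is the point farthest from those chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(euclidean(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, max_iter=100):
    """Plain Lloyd's K-means; returns (labels, centroids)."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

def silhouette_index(points, labels):
    """Mean silhouette coefficient over all points (singleton clusters score 0)."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    scores = []
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1 or len(clusters) < 2:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(euclidean(p, points[j]) for j in idx) / len(idx)
                for lab, idx in clusters.items() if lab != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Synthetic stand-in for 27 years x 4 standardized climate attributes
# (hypothetical values, NOT data from the study): three groups of nine years.
years = [
    tuple(3.0 * (g - 1) + 0.05 * ((m * (d + 1)) % 9 - 4) for d in range(4))
    for g in range(3) for m in range(9)
]

# Score each candidate number of clusters and keep the one with the highest silhouette.
scores = {k: silhouette_index(years, kmeans(years, k)[0]) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(best_k, {k: round(s, 3) for k, s in scores.items()})
```

On this synthetic data the silhouette index peaks at three clusters, mirroring the kind of low-, medium-, and high-yield grouping the abstract describes. With real station data, the attributes would need to be standardized first so that no single variable dominates the distance.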

Keywords: Ant colony; Attribute; Fitness function; Genetic algorithm; Rainfed wheat; Silhouette index.

MeSH terms

  • Artificial Intelligence*
  • Cluster Analysis
  • Iran
  • Temperature
  • Triticum*