Effect of classification procedure on the performance of numerically defined ecological regions

Environ Manage. 2010 May;45(5):939-52. doi: 10.1007/s00267-010-9465-7. Epub 2010 Mar 19.

Abstract

Ecological regionalizations define geographic regions exhibiting relative homogeneity in ecological (i.e., environmental and biotic) characteristics. Multivariate clustering methods have been used to define ecological regions based on subjectively chosen environmental variables. We developed and tested three procedures for defining ecological regions based on spatial modeling of a multivariate target pattern that is represented by compositional dissimilarities between locations (e.g., taxonomic dissimilarities). The procedures use a "training dataset" representing the target pattern and model this pattern as a function of environmental variables. The model is then extrapolated to the entire domain of interest. Environmental data for our analysis were drawn from a 400 m grid covering all of Switzerland and consisted of 12 variables describing climate, topography and lithology. Our target pattern comprised the land cover composition of each grid cell, derived from interpretation of aerial photographs. For Regionalization 1, we used conventional cluster analysis of the environmental variables to define 60 hierarchically organized levels comprising from 5 to 300 regions. Regionalization 1 provided a base case for comparison with the model-based regionalizations. Regionalizations 2, 3 and 4 also comprised 60 hierarchically organized levels and were derived by modeling land cover composition for 4000 randomly selected "training" cells. Regionalization 2 was based on cluster analysis of environmental variables that were transformed based on a Generalized Dissimilarity Model (GDM). Regionalizations 3 and 4 were defined by clustering the training cells based on their land cover composition, followed by predictive modeling of the distribution of the land cover clusters using Classification and Regression Tree (CART) and Random Forest (RF) models. Independent test data (i.e., data not used to train the models) were used to test the discrimination of land cover composition at all hierarchical levels of the regionalizations using the classification strength (CS) statistic. CS for all the model-based regionalizations was significantly higher than for Regionalization 1. Regionalizations 3 and 4 performed significantly better than Regionalization 2 at finer hierarchical levels (many regions), and Regionalization 4 performed significantly better than Regionalization 3 at coarse levels of detail (few regions). Compositional modeling can significantly increase the performance of numerically defined ecological regionalizations. CART- and RF-based models appear to produce stronger regionalizations because the discriminating variables are able to change at each hierarchical level.
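Each regionalization is described as a set of hierarchically organized levels, from 5 to 300 regions. One way to obtain strictly nested levels is to run a single agglomerative clustering and record the partition at each requested number of clusters. Because agglomeration only ever merges clusters, every coarse region is then a union of finer regions. The sketch below is illustrative only. It assumes average linkage on Euclidean distance, and the function names are hypothetical. The abstract does not state the authors' linkage or distance measure. The naive search is far too slow for a national 400 m grid and is meant only to show the nesting property.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two environmental-variable vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hierarchical_levels(points, levels):
    """Average-linkage agglomerative clustering (an assumed choice) that records
    the partition whenever the number of clusters equals a value in `levels`.
    Returns {k: list of region labels, one per point}."""
    clusters = {i: [i] for i in range(len(points))}  # cluster id -> member indices
    wanted = set(levels)
    snapshots = {}

    def record():
        # Relabel the current clusters 0..k-1 and store one label per point.
        labels = [0] * len(points)
        for lab, cid in enumerate(sorted(clusters)):
            for m in clusters[cid]:
                labels[m] = lab
        snapshots[len(clusters)] = labels

    if len(clusters) in wanted:
        record()
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest mean inter-point distance.
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                d = sum(euclidean(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # merge cluster b into cluster a
        if len(clusters) in wanted:
            record()
    return snapshots


# Toy example: five cells described by two hypothetical environmental variables.
pts = [(0, 0), (0, 1), (10, 0), (10, 1), (25, 0)]
levels = hierarchical_levels(pts, [2, 3, 5])
print(levels[3])  # [0, 0, 1, 1, 2]
print(levels[2])  # [0, 0, 0, 0, 1]
```

Any two cells that share a region at the finer level also share a region at every coarser level. This is the hierarchical organization the regionalizations rely on.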
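The regionalizations are compared using the classification strength (CS) statistic on independent test cells. The sketch below assumes one common definition: CS = W - B. Here W is the mean within-region similarity, weighted by region size. B is the mean similarity over all pairs of cells that fall in different regions. As the similarity between land-cover composition vectors it uses 1 - Bray-Curtis dissimilarity. The similarity measure, the handling of single-cell regions and the function names are all assumptions for illustration, not the authors' code.

```python
from itertools import combinations


def bray_curtis_similarity(a, b):
    """1 - Bray-Curtis dissimilarity between two composition vectors (assumed measure)."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return 1.0 - num / den if den > 0 else 1.0


def classification_strength(samples, labels, similarity=bray_curtis_similarity):
    """CS = W - B: size-weighted mean within-region similarity minus
    mean between-region similarity (assumed formulation)."""
    within = {}   # region label -> within-region pairwise similarities
    between = []  # similarities of pairs lying in different regions
    for i, j in combinations(range(len(samples)), 2):
        s = similarity(samples[i], samples[j])
        if labels[i] == labels[j]:
            within.setdefault(labels[i], []).append(s)
        else:
            between.append(s)
    if not within or not between:
        raise ValueError("need >= 2 regions and at least one region with >= 2 cells")
    sizes = {}
    for lab in labels:
        sizes[lab] = sizes.get(lab, 0) + 1
    # Single-cell regions have no within-region pairs; weights are renormalized
    # over the regions that do (an assumed convention).
    weighted = [(sizes[k], sum(v) / len(v)) for k, v in within.items()]
    w_bar = sum(n * m for n, m in weighted) / sum(n for n, _ in weighted)
    b_bar = sum(between) / len(between)
    return w_bar - b_bar


# Hypothetical land-cover proportions (forest, grassland, urban) for four test cells.
cells = [[0.8, 0.2, 0.0], [0.7, 0.3, 0.0], [0.1, 0.1, 0.8], [0.0, 0.2, 0.8]]
print(classification_strength(cells, ["A", "A", "B", "B"]))  # ≈ 0.7
print(classification_strength(cells, ["A", "B", "A", "B"]))  # ≈ -0.35
```

A regionalization that groups compositionally similar cells scores higher than one that mixes them. This is the sense in which the model-based regionalizations showed significantly higher CS.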

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Climate
  • Cluster Analysis
  • Conservation of Natural Resources / methods*
  • Conservation of Natural Resources / statistics & numerical data
  • Ecological and Environmental Phenomena
  • Ecology / classification*
  • Ecology / statistics & numerical data
  • Geography
  • Models, Theoretical*
  • Switzerland