Pseudoabsence generation strategies for species distribution models

PLoS One. 2012;7(8):e44486. doi: 10.1371/journal.pone.0044486. Epub 2012 Aug 31.

Abstract

Background: Species distribution models require selection of species, study extent and spatial unit, statistical methods, variables, and assessment metrics. If absence data are not available, another important consideration is pseudoabsence generation. Different strategies for pseudoabsence generation can produce varying spatial representation of species.

Methodology: We considered model outcomes from four different strategies for generating pseudoabsences. We generated pseudoabsences randomly by 1) selection from the entire study extent, 2) a two-step process of selection first from the entire study extent, followed by selection of pseudoabsences from areas with predicted probability <25%, 3) selection from plots surveyed without detection of species presence, and 4) a two-step process of selection first from plots surveyed without detection of species presence, followed by selection of pseudoabsences from areas with predicted probability <25%. We used Random Forests as our statistical method and sixteen predictor variables to model tree species with at least 150 records from Forest Inventory and Analysis surveys in the Laurentian Mixed Forest province of Minnesota.
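The four strategies reduce to two selection routines applied to two candidate pools (the entire study extent, or plots surveyed without detection). The following is a minimal sketch of that structure, not the authors' implementation. The function names, the `fit_model` callable (a hypothetical stand-in for fitting Random Forests on the presences plus first-round pseudoabsences), and the toy one-variable gradient are all assumptions introduced for illustration. The sketch also assumes that the second draw comes from the same candidate pool as the first, which the abstract does not specify.

```python
import random


def random_pseudoabsences(candidates, n, rng):
    """One-step selection: draw up to n pseudoabsences at random from a pool.

    The pool is the entire study extent (strategy 1) or the plots surveyed
    without detection of the species (strategy 3).
    """
    return rng.sample(candidates, min(n, len(candidates)))


def two_step_pseudoabsences(candidates, n, fit_model, rng, threshold=0.25):
    """Two-step selection (strategies 2 and 4).

    fit_model is a hypothetical callable: given the first-round
    pseudoabsences, it returns a function that maps a site to a predicted
    probability of presence.
    """
    first_round = random_pseudoabsences(candidates, n, rng)
    predict = fit_model(first_round)
    # Assumption: the second draw uses the same candidate pool as the first.
    low_probability = [site for site in candidates if predict(site) < threshold]
    return random_pseudoabsences(low_probability, n, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    sites = [i / 100 for i in range(100)]  # toy one-variable environmental gradient
    presence_value = 0.8  # hypothetical: the species was recorded near this value

    def toy_fit(pseudoabsences):
        """Toy model: probability rises with distance from the mean pseudoabsence."""
        mean_absence = sum(pseudoabsences) / len(pseudoabsences)

        def predict(site):
            d_abs = abs(site - mean_absence)
            d_pres = abs(site - presence_value)
            total = d_abs + d_pres
            return 0.5 if total == 0 else d_abs / total

        return predict

    print(sorted(random_pseudoabsences(sites, 30, rng)))
    print(sorted(two_step_pseudoabsences(sites, 30, toy_fit, rng)))
```

Passing the model in as a callable keeps the selection logic independent of the statistical method, so the same routines would apply whether the first-round model is Random Forests or something else.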

Conclusions: Pseudoabsence generation strategy completely affected the area predicted as present for species distribution models and may be one of the most influential determinants of models. All the pseudoabsence strategies produced mean AUC values of at least 0.87. More important than accuracy metrics, the two-step strategies over-predicted species presence, due to too much environmental distance between the pseudoabsences and recorded presences, whereas models based on random pseudoabsences under-predicted species presence, due to too little environmental distance between the pseudoabsences and recorded presences. Models using pseudoabsences from surveyed plots produced a balance between areas with high and low predicted probabilities and the strongest relationship between density and area with predicted probabilities ≥75%. Because of imperfect accuracy assessment, the best assessment currently may be evaluation of whether the species has been sufficiently but not excessively predicted to occur.
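The two kinds of assessment contrasted here can be illustrated with a short sketch: a rank-based AUC, and the share of the study area that falls below 25% or at or above 75% predicted probability. This is an illustrative sketch, not the authors' code. The function names are hypothetical, and the area calculation assumes that every prediction refers to a cell of equal area.

```python
def auc(presence_scores, absence_scores):
    """Rank-based AUC: the chance that a randomly chosen presence scores
    higher than a randomly chosen (pseudo)absence, counting ties as half."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


def area_fractions(probabilities, low=0.25, high=0.75):
    """Share of equal-area cells with predicted probability < low and >= high."""
    n = len(probabilities)
    low_fraction = sum(p < low for p in probabilities) / n
    high_fraction = sum(p >= high for p in probabilities) / n
    return low_fraction, high_fraction
```

Two models can reach similar AUC values while assigning very different shares of the study area to the ≥75% class, which is why the predicted-area summary is reported alongside AUC.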

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Area Under Curve
  • Data Collection
  • Geography
  • Minnesota
  • Models, Biological*
  • Species Specificity
  • Trees / physiology*

Grants and funding

Funding was provided by the National Fire Plan of the USDA Forest Service, Northern Research Station. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.