Species distribution models for invasive Eurasian watermilfoil highlight the importance of data quality and limitations of discrimination accuracy metrics

Ecol Evol. 2021 Aug 13;11(18):12567-12582. doi: 10.1002/ece3.8002. eCollection 2021 Sep.

Abstract

Aim: Availability of uniformly collected presence, absence, and abundance data remains a key challenge in species distribution modeling (SDM). For invasive species, abundance and impacts are highly variable across landscapes, and quality occurrence and abundance data are critical for predicting locations at high risk for invasion and impacts, respectively. We leverage a large aquatic vegetation dataset comprising point-level survey data that includes information on the invasive plant Myriophyllum spicatum (Eurasian watermilfoil) to: (a) develop SDMs to predict invasion and impact from environmental variables based on presence-absence, presence-only, and abundance data, and (b) compare evaluation metrics based on functional and discrimination accuracy for presence-absence and presence-only SDMs.

Location: Minnesota, USA.

Methods: Eurasian watermilfoil presence-absence and abundance data were gathered from 468 surveyed lakes, and 801 unsurveyed lakes were used as pseudoabsences for presence-only models. A Random Forest algorithm was used to model the distribution and abundance of Eurasian watermilfoil as a function of lake-specific predictors, both with and without a spatial autocovariate. Occurrence-based SDMs were evaluated using conventional discrimination accuracy metrics and functional accuracy metrics that assess the correlation between predicted suitability and observed abundance.
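
The abstract does not state how the spatial autocovariate was computed. As an illustration only, the sketch below uses one common formulation: for each lake, the inverse-distance-weighted mean of the response at neighbouring lakes within a fixed radius. The function name, the coordinates, the radius, and the presence values are all hypothetical and are not taken from the study.

```python
import math


def spatial_autocovariate(coords, response, radius):
    """Inverse-distance-weighted mean of the response at neighbouring sites.

    Assumption: this is one common autocovariate formulation, not
    necessarily the one used in the study. Sites with no neighbour
    inside `radius` receive 0.0.
    """
    values = []
    for i, (xi, yi) in enumerate(coords):
        weighted_sum = 0.0
        weight_total = 0.0
        for j, (xj, yj) in enumerate(coords):
            if i == j:
                continue  # a site never contributes to its own autocovariate
            distance = math.hypot(xi - xj, yi - yj)
            if 0 < distance <= radius:
                weight = 1.0 / distance
                weighted_sum += weight * response[j]
                weight_total += weight
        values.append(weighted_sum / weight_total if weight_total else 0.0)
    return values


# Hypothetical lake coordinates (arbitrary units) and presence (1) / absence (0).
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
presence = [1, 0, 1, 0]

autocov = spatial_autocovariate(coords, presence, radius=5.0)
```

The resulting values would be appended to the lake-specific predictors as one extra column, so that a model fitted "with" the autocovariate can be compared against one fitted "without" it.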

Results: Water temperature degree days and maximum lake depth were two leading predictors influencing both invasion risk and abundance, but they were less important for predicting abundance than other water quality measures. Road density was a strong predictor of Eurasian watermilfoil invasion risk but not abundance. Model evaluations highlighted significant differences: Presence-absence models had high functional accuracy despite low discrimination accuracy, whereas presence-only models showed the opposite pattern.
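
To make the contrast between the two kinds of accuracy concrete, the following sketch computes a discrimination metric (AUC) and a functional metric (a rank correlation between predicted suitability and observed abundance) on a small made-up example. The choice of Spearman's coefficient, the restriction of the correlation to invaded lakes, and every number below are assumptions for illustration; the abstract does not specify these details.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney rank-sum identity."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: three invaded lakes followed by three (pseudo)absences.
observed = [1, 1, 1, 0, 0, 0]
suitability = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
# Hypothetical observed abundance at the three invaded lakes only.
abundance = [0.2, 0.5, 0.6]

auc_value = auc(observed, suitability)         # discrimination accuracy
rho = spearman(suitability[:3], abundance)     # functional accuracy
```

In this toy case the model separates presences from absences perfectly, yet predicted suitability ranks the invaded lakes in the reverse order of their abundance, which is the kind of mismatch that a discrimination metric alone cannot reveal.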

Main conclusion: Complementing presence-absence data with abundance information offers a richer understanding of invasive Eurasian watermilfoil's ecological niche and enables evaluation of the model's functional accuracy. Conventional discrimination accuracy measures were misleading when models were developed using pseudoabsences. We thus caution against the overuse of presence-only models and suggest directing more effort toward systematic monitoring programs that yield high-quality data.

Keywords: abundance–suitability relationship; discrimination accuracy; functional accuracy; invasion risk; pseudoabsences; random forest; spatial autocovariate; water temperature.