How to make use of unlabeled observations in species distribution modeling using point process models

Emy Guilbault; Ian Renner; Michael Mahony; Eric Beh

doi:10.1002/ece3.7411

Ecol Evol. 2021 Apr 1;11(10):5220-5243. doi: 10.1002/ece3.7411. eCollection 2021 May.

Authors

Emy Guilbault¹, Ian Renner¹, Michael Mahony², Eric Beh¹

Affiliations

¹ Faculty of Science, School of Mathematical and Physical Sciences, The University of Newcastle, Callaghan, NSW, Australia.
² Faculty of Science, School of Environmental and Life Sciences, The University of Newcastle, Callaghan, NSW, Australia.

Abstract

Species distribution modeling, which allows users to predict the spatial distribution of species with the use of environmental covariates, has become increasingly popular, with many software platforms providing tools to fit such models. However, the species observations used can vary in quality and can carry incomplete information, such as uncertain or unknown species identity. In this paper, we develop two algorithms to classify observations with unknown species identities which simultaneously predict several species distributions using spatial point processes. Through simulations, we compare the performance of these algorithms using seven different initializations to the performance of models fitted using only the observations with known species identity. We show that performance varies with differences in correlation among species distributions, species abundance, and the proportion of observations with unknown species identities. Additionally, some of the methods developed here outperformed the models that did not use the misspecified data. We applied the best-performing methods to a dataset of three frog species (Mixophyes). These models represent a helpful and promising tool for opportunistic surveys where misidentification is possible or for the distribution of species newly separated in their taxonomy.
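The abstract does not describe the two algorithms in detail, so the following is only an illustrative sketch of one ingredient that mixture-style classification of unlabeled points commonly relies on, not the authors' implementation. It assumes each species has a log-linear Poisson intensity, and it uses the standard result that a point at location s in a superposition of independent Poisson processes belongs to species k with probability λ_k(s) / Σ_j λ_j(s). The function name `membership_weights`, the covariate matrix `X`, and the coefficient values in `betas` are all hypothetical.

```python
import numpy as np

def membership_weights(X, betas):
    """Membership probabilities for unlabeled points (assumed log-linear intensities).

    X:     (n_points, n_covariates) design matrix, including an intercept column.
    betas: (n_species, n_covariates) hypothetical intensity coefficients.
    Returns an (n_points, n_species) matrix whose rows sum to 1.
    """
    log_lam = X @ betas.T                          # log intensity per point and species
    log_lam = log_lam - log_lam.max(axis=1, keepdims=True)  # stabilise the exponentials
    lam = np.exp(log_lam)
    return lam / lam.sum(axis=1, keepdims=True)    # lambda_k(s) / sum_j lambda_j(s)

# Illustrative values only: three unlabeled points, one covariate, three species.
X = np.array([[1.0, 0.2],
              [1.0, 1.5],
              [1.0, -0.7]])
betas = np.array([[0.0, 1.0],
                  [0.5, -1.0],
                  [-0.2, 0.3]])

weights = membership_weights(X, betas)   # soft assignment of each point to each species
hard_labels = weights.argmax(axis=1)     # hard assignment to the most probable species
```

In an EM-type scheme these weights would be recomputed after each refit of the species intensities; whether the weights are kept soft or converted to hard labels is one of the design choices such algorithms can differ on.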

Keywords: EM algorithm; classification; ecological statistics; machine learning; misidentification; mixture modeling; presence‐only data.

Associated data

Dryad/10.5061/dryad.vx0k6djqw