Enhancing georeferenced biodiversity inventories: automated information extraction from literature records reveal the gaps

PeerJ. 2022 Aug 18:10:e13921. doi: 10.7717/peerj.13921. eCollection 2022.

Abstract

We use natural language processing (NLP) in an automated procedure to retrieve location data for cheilostome bryozoan species, yielding text-mined occurrences (TMO). We compare these results with data combined from two major public databases (DB): the Ocean Biodiversity Information System (OBIS) and the Global Biodiversity Information Facility (GBIF). Using DB and TMO data separately and in combination, we present latitudinal species richness curves using standard estimators (Chao2 and the Jackknife) and range-through approaches. Our combined DB and TMO species richness curves quantitatively document, for the first time, a bimodal global latitudinal diversity gradient for extant cheilostomes, with peaks in the temperate zones. A total of 79% of the georeferenced species we retrieved from TMO (N = 1,408) and DB (N = 4,549) are non-overlapping. Despite clear indications that global location data compiled for cheilostomes should be improved with concerted effort, our study supports the view that many marine latitudinal species richness patterns deviate from the canonical latitudinal diversity gradient (LDG). Moreover, combining online biodiversity databases with automated information retrieval from the published literature is a promising avenue for expanding taxon-location datasets.
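The richness estimators named above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the input layout (one set of species per sampling unit, and (species, latitude) pairs), and the 10-degree band width are assumptions made for illustration. The sketch uses the bias-corrected form of Chao2 and the first-order jackknife; the abstract does not say which variants the study applied.

```python
import math
from collections import Counter


def incidence_counts(samples):
    """Count, for each species, the number of sampling units it occurs in."""
    counts = Counter()
    for unit in samples:
        counts.update(set(unit))
    return counts


def chao2(samples):
    """Bias-corrected Chao2 estimate from incidence (presence/absence) data."""
    m = len(samples)
    if m == 0:
        return 0.0
    counts = incidence_counts(samples)
    s_obs = len(counts)
    q1 = sum(1 for c in counts.values() if c == 1)  # species found in exactly one unit
    q2 = sum(1 for c in counts.values() if c == 2)  # species found in exactly two units
    return s_obs + ((m - 1) / m) * q1 * (q1 - 1) / (2 * (q2 + 1))


def jackknife1(samples):
    """First-order jackknife estimate from incidence data."""
    m = len(samples)
    if m == 0:
        return 0.0
    counts = incidence_counts(samples)
    s_obs = len(counts)
    q1 = sum(1 for c in counts.values() if c == 1)
    return s_obs + q1 * (m - 1) / m


def range_through_richness(occurrences, band_width=10):
    """Richness per latitudinal band, assuming each species is present in
    every band between its southernmost and northernmost record.
    Keys are the lower latitude edge of each band."""
    ranges = {}
    for species, lat in occurrences:
        band = math.floor(lat / band_width)
        lo, hi = ranges.get(species, (band, band))
        ranges[species] = (min(lo, band), max(hi, band))
    richness = Counter()
    for lo, hi in ranges.values():
        for band in range(lo, hi + 1):
            richness[band * band_width] += 1
    return dict(richness)


# Toy data (hypothetical species labels, not from the study).
samples = [{"a", "b"}, {"a", "c"}, {"a", "b", "d"}]
occurrences = [("a", -25.0), ("a", 15.0), ("b", 5.0)]
```

With the toy data, `chao2(samples)` and `jackknife1(samples)` both exceed the observed count of four species because two species occur in only one sampling unit. `range_through_richness(occurrences)` counts species "a" in every band from -30 to 10 even though it was recorded in only two of them, which is the interpolation the range-through approach makes.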

Keywords: Bimodality; Bryozoa; Geographic distribution; Latitudinal diversity gradient (LDG); Marine invertebrates; Natural language processing (NLP); Public data repositories; Species richness; Text-mining.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Biodiversity*
  • Bryozoa*
  • Information Storage and Retrieval

Grants and funding

This project is supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 724324 to Lee Hsiang Liow). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.