A geospatial source selector for federated GeoSPARQL querying

Open Res Eur. 2022 Oct 6:2:48. doi: 10.12688/openreseurope.14605.2. eCollection 2022.

Abstract

Background: Geospatial linked data brings into the scope of the Semantic Web and its technologies, a wealth of datasets that combine semantically-rich descriptions of resources with their geo-location. There are, however, various Semantic Web technologies where technical work is needed in order to achieve the full integration of geospatial data, and federated query processing is one of these technologies. Methods: In this paper, we explore the idea of annotating data sources with a bounding polygon that summarizes the spatial extent of the resources in each data source, and of using such a summary as an (additional) source selection criterion in order to reduce the set of sources that will be tested as potentially holding relevant data. We present our source selection method, and we discuss its correctness and implementation. Results: We evaluate the proposed source selection using three different types of summaries with different degrees of accuracy, against not using geospatial summaries. We use datasets and queries from a practical use case that combines crop-type data with water availability data for food security. The experimental results suggest that more complex summaries lead to slower source selection times, but also to more precise exclusion of unneeded sources. Moreover, we observe the source selection runtime is (partially or fully) recovered by shorter planning and execution runtimes. As a result, the federated sources are not burdened by pointless querying from the federation engine. Conclusions: The evaluation draws on data and queries from the agroenvironmental domain and shows that our source selection method substantially improves the effectiveness of federated GeoSPARQL query processing.

Keywords: Federated and distributed query processing; GeoSPARQL query processing; Linked geospatial data; Source selection.

Grants and funding

This research was financially supported by the European Union’s Horizon 2020 research and innovation programme under the grant agreement No 825258 (From Copernicus Big Data to Extreme Earth Analytics [ExtremeEarth]).