Locational privacy-preserving distance computations with intersecting sets of randomly labeled grid points

Int J Health Geogr. 2021 Mar 20;20(1):14. doi: 10.1186/s12942-021-00268-y.

Abstract

Background: We introduce and study a recently proposed method for privacy-preserving distance computations that has so far received little attention in the scientific literature. The method, which is based on intersecting sets of randomly labeled grid points and is henceforth denoted as ISGP, allows the calculation of approximate distances between masked spatial data. Coordinates are replaced by sets of hash values. The method allows the computation of distances between locations L when the locations at different points in time t are not known simultaneously. The distance between [Formula: see text] and [Formula: see text] could be computed even when [Formula: see text] does not exist at [Formula: see text] and [Formula: see text] has been deleted at [Formula: see text]. An example would be patients from a medical data set and locations of later hospitalizations. ISGP is a new tool for privacy-preserving data handling of geo-referenced data sets in general. Furthermore, this technique can be used to include geographical identifiers as additional information for privacy-preserving record-linkage. To show that the technique can be implemented in most high-level programming languages with a few lines of code, a complete implementation within the statistical programming language R is given. The properties of the method are explored using simulations based on large-scale real-world data of hospitals ([Formula: see text]) and residential locations ([Formula: see text]). The method has already been used in a real-world application.
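The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the general idea, not the authors' R implementation. It rests on several assumptions that are not taken from the source: a square grid with a fixed spacing, HMAC-SHA256 with a shared secret as a stand-in for the random grid-point labels, the same radius for every location, and a distance estimate obtained by numerically inverting the overlap area of two equal circles, A(d) = 2r² arccos(d / 2r) − (d / 2) √(4r² − d²). All function and parameter names (`grid_labels`, `lens_area`, `estimate_distance`, `radius`, `spacing`, `secret`) are hypothetical.

```python
import hashlib
import hmac
import math


def grid_labels(x, y, radius, spacing, secret):
    """Return the set of labels of all grid points within `radius` of (x, y).

    Assumption: labels are HMAC-SHA256 digests of the grid indices, which
    stand in for the random labels mentioned in the abstract.
    """
    labels = set()
    i_min = math.ceil((x - radius) / spacing)
    i_max = math.floor((x + radius) / spacing)
    j_min = math.ceil((y - radius) / spacing)
    j_max = math.floor((y + radius) / spacing)
    for i in range(i_min, i_max + 1):
        for j in range(j_min, j_max + 1):
            if (i * spacing - x) ** 2 + (j * spacing - y) ** 2 <= radius ** 2:
                msg = f"{i}:{j}".encode()
                labels.add(hmac.new(secret, msg, hashlib.sha256).hexdigest())
    return labels


def lens_area(d, radius):
    """Overlap area of two circles of equal radius whose centres are d apart."""
    if d >= 2 * radius:
        return 0.0
    return (2 * radius ** 2 * math.acos(d / (2 * radius))
            - (d / 2) * math.sqrt(4 * radius ** 2 - d ** 2))


def estimate_distance(labels_a, labels_b, radius, spacing):
    """Estimate the distance between two masked locations from their label sets.

    Returns None when the sets are disjoint, because the distance is then
    at least about 2 * radius and cannot be resolved.
    """
    common = len(labels_a & labels_b)
    if common == 0:
        return None
    # Each shared grid point represents roughly one grid cell of overlap area.
    target = common * spacing ** 2
    lo, hi = 0.0, 2 * radius
    # Bisection: lens_area decreases monotonically in d on [0, 2 * radius].
    for _ in range(60):
        mid = (lo + hi) / 2
        if lens_area(mid, radius) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    secret = b"shared-secret"  # hypothetical key known to both data holders
    a = grid_labels(1234.5, 2345.6, radius=10_000, spacing=100, secret=secret)
    b = grid_labels(4234.5, 6345.6, radius=10_000, spacing=100, secret=secret)
    # The true distance between the two example points is 5000 units.
    print(estimate_distance(a, b, radius=10_000, spacing=100))
```

In this sketch only the label sets are exchanged, so two parties could compare locations recorded at different times without sharing coordinates. The radius bounds the largest distance that can be resolved, and the grid spacing trades accuracy against the size of the stored sets.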

Results: ISGP yields very accurate results. Our simulation study showed that, with appropriately chosen parameters, 99% accuracy in the approximated distances is achieved.

Conclusion: We discussed a new method for privacy-preserving distance computations in microdata. The method is highly accurate and fast, has a low computational burden, and does not require excessive storage.

Keywords: Geo-masking; Geo-referenced data; Geographical data; ISGP; Record-linkage.

MeSH terms

  • Computer Simulation
  • Computer Systems*
  • Humans
  • Privacy*