'Unmasking' masked address data: A medoid geocoding solution

MethodsX. 2023 Feb 22:10:102090. doi: 10.1016/j.mex.2023.102090. eCollection 2023.

Abstract

In recent years, there has been a consistent push for more open data initiatives, particularly for datasets collected by public agencies or groups that receive public funding. However, there is a tension between the release of open data and the preservation of individual and household privacy, whose balance shifts due to increased data availability, the sophistication of analysis techniques, and the computational power available to users. As a result, data masking is a standard tool used to preserve privacy. This is a process in which the data publishers obfuscate some identifying features in the dataset while attempting to maintain as much accuracy and precision as possible. For spatial datasets, the geocoding of administratively-masked data has been a consistent problem. Here, we present a medoid-based technique that geocodes masked data while minimizing the spatial uncertainty associated with the masking approach. Unfortunately, many commercial geocoding software packages either fail to geocode administratively-masked data or provide false positives by assigning points to city or street centroids. We demonstrate the results of our medoid-based geocoding approach by comparing it to commercial geocoding software. The results suggest that a medoid geocoding approach is mechanically simple to deploy and maximizes the spatial accuracy of the resulting geocodes.

• Administratively-masked data are difficult to geocode.
• A medoid geocoding method maximizes geocoding accuracy.
• This method outperforms commercial geocoding software.

Keywords: Geocoding; Masked data; Medoid geocoding; Open data; Privacy; Spatial analysis.
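
The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea of a medoid, under stated assumptions. It assumes that a masked address (for example, a block-level address with the house number hidden) can be expanded into a set of candidate address points, and it picks the candidate whose summed distance to all other candidates is smallest. Unlike a centroid, the medoid is always one of the real candidate locations. The function names `haversine_m` and `medoid` and the coordinates are hypothetical and are not taken from the paper.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, used for great-circle distances


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def medoid(points):
    """Return the member of `points` with the smallest summed distance to all members."""
    if not points:
        raise ValueError("need at least one candidate point")
    return min(points, key=lambda p: sum(haversine_m(p, q) for q in points))


# Hypothetical candidate address points (lat, lon) for one masked record.
# Assumption: these are all the known addresses consistent with the mask.
candidates = [
    (40.0000, -75.0000),
    (40.0001, -75.0000),
    (40.0002, -75.0000),
    (40.0003, -75.0000),
    (40.0010, -75.0000),
]

best = medoid(candidates)
# Largest distance from the chosen point to any candidate: a simple bound on
# how far the true (masked) location could be from the assigned geocode.
max_error_m = max(haversine_m(best, q) for q in candidates)
print(best, round(max_error_m, 1))
```

In this sketch the outlying fifth candidate pulls a plain average northward, while the medoid stays on an actual address point near the bulk of the candidates. Whether the published method uses this distance measure or this way of building the candidate set is not stated in the abstract.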