Objective: The All of Us Research Program makes individual-level data available to researchers while protecting the participants' privacy. This article describes the protections embedded in the multistep access process, with a particular focus on how the data was transformed to meet generally accepted re-identification risk levels.
Methods: At the time of the study, the resource consisted of 329 084 participants. Systematic amendments were applied to the data to mitigate re-identification risk (eg, generalization of geographic regions, suppression of public events, and randomization of dates). We computed the re-identification risk for each participant using a state-of-the-art adversarial model specifically assuming that it is known that someone is a participant in the program. We confirmed the expected risk is no greater than 0.09, a threshold that is consistent with guidelines from various US state and federal agencies. We further investigated how risk varied as a function of participant demographics.
Results: The results indicated that 95th percentile of the re-identification risk of all the participants is below current thresholds. At the same time, we observed that risk levels were higher for certain race, ethnic, and genders.
Conclusions: While the re-identification risk was sufficiently low, this does not imply that the system is devoid of risk. Rather, All of Us uses a multipronged data protection strategy that includes strong authentication practices, active monitoring of data misuse, and penalization mechanisms for users who violate terms of service.
Keywords: All of Us Research Program; data privacy; data sharing; electronic health records.
© The Author(s) 2023. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.