Privacy protection of medical data in social network

Jie Su; Yi Cao; Yuehui Chen; Yahui Liu; Jinming Song

doi:10.1186/s12911-021-01645-0


BMC Med Inform Decis Mak. 2021 Oct 18;21(Suppl 1):286. doi: 10.1186/s12911-021-01645-0.

Authors

Jie Su¹˒², Yi Cao³˒⁴, Yuehui Chen³˒⁴, Yahui Liu⁵, Jinming Song⁶

Affiliations

¹ School of Information Science and Engineering, University of Jinan, Jinan, 250022, China. ise_suj@ujn.edu.cn.
² Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, 250022, China. ise_suj@ujn.edu.cn.
³ School of Information Science and Engineering, University of Jinan, Jinan, 250022, China.
⁴ Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, 250022, China.
⁵ School of Information Management, Beijing Information Science & Technology University, Beijing, China.
⁶ Department of Hematopathology and Lab Medicines, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, 33612, USA.

Abstract

Background: Protecting the privacy of data published in the health care field is an important research area. In the USA, the Health Insurance Portability and Accountability Act (HIPAA) is the current legislation governing privacy protection. However, the Institute of Medicine Committee on Health Research and the Privacy of Health Information recently concluded that HIPAA does not adequately safeguard privacy, while at the same time researchers cannot make effective use of medical data for research. More effective privacy protection methods are therefore urgently needed to ensure the security of released medical data.

Methods: Clustering-based privacy protection methods aim to keep published data both useful and protected. In this paper, we first analyzed the importance of the key attributes of medical data in the social network. Based on each attribute's function and the main objective of privacy protection, the attribute information was divided into three categories. We then proposed a greedy clustering algorithm that groups data points according to both their attributes and the connectivity of the nodes in the published social network. Finally, we analyzed the information loss incurred during clustering and evaluated the proposed approach in terms of classification accuracy and information loss rates on a medical dataset.
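The greedy clustering step can be illustrated with a minimal Python sketch of generic greedy k-member clustering over a social graph. It is not the authors' algorithm: the function name `greedy_k_clusters`, the restriction to numeric quasi-identifiers generalized to [min, max] intervals, the neighbourhood-based structural distance, and the use of `a` as the weight between generalization loss (`a`) and structural distance (`1 - a`) are all assumptions made for illustration. Each cluster starts from a seed node and greedily absorbs the unassigned node with the lowest weighted cost until it has k members; leftover nodes then join whichever cluster they cost least to enter.

```python
def greedy_k_clusters(qi, adj, k, a):
    """Greedily partition nodes into clusters of at least k members (sketch).

    qi:  dict node -> tuple of numeric quasi-identifier values (assumed numeric)
    adj: dict node -> set of neighbouring nodes (undirected social graph)
    k:   minimum cluster size
    a:   assumed weight in [0, 1]; a = 1 uses only generalization loss,
         a = 0 uses only structural distance
    """
    nodes = sorted(qi)
    n = len(nodes)
    if n < k:
        raise ValueError("need at least k nodes to form a cluster")
    dims = len(qi[nodes[0]])
    # Global range of each attribute, used to normalise generalised intervals.
    spans = [max(qi[v][d] for v in nodes) - min(qi[v][d] for v in nodes)
             for d in range(dims)]

    def gen_loss(cluster):
        # Mean normalised width of the [min, max] interval each attribute is
        # generalised to when all cluster members are published with one value.
        if dims == 0:
            return 0.0
        total = 0.0
        for d in range(dims):
            if spans[d] > 0:
                vals = [qi[v][d] for v in cluster]
                total += (max(vals) - min(vals)) / spans[d]
        return total / dims

    def struct_dist(x, y):
        # Fraction of the other n - 2 nodes adjacent to exactly one of x and y.
        if n <= 2:
            return 0.0
        diff = (adj.get(x, set()) ^ adj.get(y, set())) - {x, y}
        return len(diff) / (n - 2)

    def cost(cluster, x):
        # Assumed weighted cost of adding node x to an existing cluster.
        s = sum(struct_dist(x, y) for y in cluster) / len(cluster)
        return a * gen_loss(cluster + [x]) + (1 - a) * s

    unassigned = set(nodes)
    clusters = []
    while len(unassigned) >= k:
        seed = min(unassigned)  # deterministic seed; other seed rules are possible
        cluster = [seed]
        unassigned.remove(seed)
        while len(cluster) < k:
            # Ties are broken by node id so the result is reproducible.
            best = min(unassigned, key=lambda x: (cost(cluster, x), x))
            cluster.append(best)
            unassigned.remove(best)
        clusters.append(cluster)
    # Fewer than k nodes remain: attach each one where it adds the least cost.
    for x in sorted(unassigned):
        min(clusters, key=lambda c: cost(c, x)).append(x)
    return clusters


qi = {1: (25,), 2: (27,), 3: (60,), 4: (62,), 5: (30,)}   # hypothetical toy ages
adj = {1: {2}, 2: {1}, 3: {4}, 4: {3}, 5: set()}
print(greedy_k_clusters(qi, adj, k=2, a=1.0))  # → [[1, 2, 5], [3, 4]]
```

With `a = 1.0` only attribute similarity matters, so node 5 (age 30) joins the younger pair; lowering `a` shifts the choice toward nodes whose neighbourhoods resemble the cluster's.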

Results: The social network associated with a medical dataset was analyzed for privacy preservation. We evaluated generalization loss and structure loss for different values of k and a, namely k ∈ {3, 6, 9, 12, 15, 18, 21, 24, 27, 30} and a ∈ {0, 0.2, 0.4, 0.6, 0.8, 1}. The experimental results showed that generalization loss approached its optimum at a = 1 and k = 21, and structure loss approached its optimum at a = 0.4 and k = 3.
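The parameter sweep described above amounts to a grid search over k and a. In this sketch only the two value grids come from the text; `best_settings` and the `evaluate` callable are hypothetical placeholders for running the anonymization and measuring both losses. Because each loss is minimised separately, the best (k, a) pair for one loss need not be the best for the other, which is consistent with the different optima reported.

```python
# Parameter grid reported in the Results: k in {3, 6, ..., 30}, a in {0, 0.2, ..., 1}.
K_VALUES = list(range(3, 31, 3))
A_VALUES = [round(0.2 * i, 1) for i in range(6)]  # round() avoids 0.6000000000000001


def best_settings(evaluate):
    """Return the (k, a) pairs that minimise generalization loss and structure loss.

    evaluate: hypothetical callable mapping (k, a) -> (generalization_loss, structure_loss)
    """
    results = {(k, a): evaluate(k, a) for k in K_VALUES for a in A_VALUES}
    best_gen = min(results, key=lambda p: results[p][0])
    best_struct = min(results, key=lambda p: results[p][1])
    return best_gen, best_struct
```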

Conclusion: We showed the importance of both the attributes and the structure of released health data for privacy preservation. By optimizing generalization loss and structure loss, our method achieved better privacy preservation in social networks. The proposed loss evaluation method strikes a balance between data availability and the risk of privacy leakage.
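To make the privacy side of this trade-off concrete, one common measure (an assumption here; the abstract does not define one) is the probability of re-identifying a record from its generalised attributes, 1/|c| for a record in cluster c. Larger clusters, i.e. larger k, lower this risk but widen the generalised intervals and so raise generalization loss. The helper `reidentification_risk` below is hypothetical.

```python
def reidentification_risk(clusters):
    """Worst-case and average probability of linking a record to an individual
    through its generalised quasi-identifiers, assuming an attacker can only
    narrow a target down to its cluster (equivalence class)."""
    sizes = [len(c) for c in clusters]
    if not sizes or min(sizes) == 0:
        raise ValueError("clusters must be non-empty")
    worst = 1 / min(sizes)             # at most 1/k when every cluster has >= k members
    average = len(sizes) / sum(sizes)  # mean of 1/|c| taken over all records
    return worst, average


print(reidentification_risk([[1, 2, 5], [3, 4]]))  # → (0.5, 0.4)
```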

Keywords: Cluster; K-anonymity; Medical data; Privacy protection.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Cluster Analysis
Confidentiality
Health Insurance Portability and Accountability Act*
Humans
Privacy*
Social Networking
United States