De-Identification of Medical Narrative Data

Stud Health Technol Inform. 2017:244:23-27.

Abstract

Maintaining data security and privacy in an era of cybersecurity is a challenge. The enormous and rapidly growing amount of health-related data available today raises numerous questions about data collection, storage, analysis, comparability and interoperability, but also about data protection. The US Health Insurance Portability and Accountability Act (HIPAA) of 1996 provides a legal framework and guidance for using and disclosing health data. In practice, the approach proposed by HIPAA is to de-identify medical documents by removing certain Protected Health Information (PHI). In this work, a rule-based method for the de-identification of French free-text medical data using Natural Language Processing (NLP) tools is presented.
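To illustrate what rule-based de-identification of free text can look like, the following is a minimal Python sketch that replaces a few PHI categories with placeholder tags. It is not the method described in this work: the rule set, the category labels (`DATE`, `PHONE`, `NAME`) and the function name `deidentify` are assumptions chosen for illustration, and a real system would need far broader rules and linguistic resources.

```python
import re

# Hypothetical rules, for illustration only; these are NOT the authors' rules.
# Each rule pairs a placeholder label with a regular expression.
RULES = [
    # Numeric dates such as 12/03/1954
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")),
    # French-style 10-digit phone numbers such as 01 23 45 67 89
    ("PHONE", re.compile(r"\b0\d(?:[ .]?\d{2}){4}\b")),
    # A civility title followed by a capitalized surname, e.g. "Mme Dupont"
    ("NAME", re.compile(r"\b(?:M\.|Mme|Dr)\s+[A-ZÀ-ÖØ-Ý][\w-]+")),
]


def deidentify(text: str) -> str:
    """Replace every span matched by a rule with its bracketed label."""
    for label, pattern in RULES:
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    note = "Mme Dupont, née le 12/03/1954, joignable au 01 23 45 67 89."
    print(deidentify(note))
```

Rules are applied in a fixed order, so a span consumed by an earlier rule is no longer visible to later ones. Regular expressions alone miss names that appear without a title, which is one reason such systems typically combine patterns with dictionaries and other NLP tools.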

Keywords: HIPAA; Medical data; Natural Language Processing (NLP); anonymization; data protection; de-identification; privacy.

MeSH terms

  • Computer Security*
  • Confidentiality
  • Data Anonymization*
  • Health Insurance Portability and Accountability Act*
  • Natural Language Processing
  • United States