De-identification of unstructured paper-based health records for privacy-preserving secondary use

Stefan Fenz; Johannes Heurix; Thomas Neubauer; Antonio Rella

doi:10.3109/03091902.2014.913080

J Med Eng Technol. 2014 Jul;38(5):260-8. doi: 10.3109/03091902.2014.913080. Epub 2014 May 19.

Authors

Stefan Fenz¹, Johannes Heurix, Thomas Neubauer, Antonio Rella

Affiliation

¹ Vienna University of Technology, Institute of Software Technology and Interactive Systems, Favoritenstrasse 9-11, 1040 Vienna, Austria.

PMID: 24841844
DOI: 10.3109/03091902.2014.913080

Abstract

Whenever personal data is processed, privacy is a serious issue. In the document-centric e-health area in particular, patients' privacy must be preserved to prevent negative repercussions for the patient. Clinical research, for example, demands structured health records to carry out efficient clinical trials, whereas legislation (e.g. HIPAA) requires that only de-identified health records be used for research. However, unstructured and often paper-based data dominates information technology, especially in the healthcare sector. Existing approaches are geared towards English-language documents only and have not been designed to recognize erroneous personal data that results from the OCR-based digitization of paper-based health records.
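
To illustrate the problem of erroneous personal data, the following minimal Python sketch shows why exact matching fails on OCR output and how approximate string matching can still catch a misread name. It is not the authors' method, which the abstract does not describe. The function name `redact_fuzzy`, the `[REDACTED]` placeholder, the 0.8 similarity threshold and the sample sentence are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher


def redact_fuzzy(text, known_names, threshold=0.8):
    """Replace tokens that closely resemble a known name with a placeholder.

    Hypothetical helper: known_names is an assumed list of patient names,
    and threshold is an assumed similarity cut-off between 0 and 1.
    """
    def is_match(token):
        # Compare case-insensitively against every known name
        return any(
            SequenceMatcher(None, token.lower(), name.lower()).ratio() >= threshold
            for name in known_names
        )

    # Tokens are alphanumeric runs, because OCR often swaps letters
    # for digits (for example "l" misread as "1")
    return re.sub(
        r"\w+",
        lambda m: "[REDACTED]" if is_match(m.group()) else m.group(),
        text,
    )


# "Mue1ler" is an OCR-style misreading of "Mueller"; an exact match would miss it
sample = "Patient Mue1ler was admitted in Vienna."
print(redact_fuzzy(sample, ["Mueller"]))
```

Under these assumptions the script prints `Patient [REDACTED] was admitted in Vienna.`. A fixed threshold is a crude trade-off: set too low it redacts ordinary words, set too high it misses heavily corrupted names, so a real system would need tuning on representative scanned documents.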

Keywords: Computer security; health records; named entity recognition; natural language processing; privacy.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Artificial Intelligence
Confidentiality*
Forms and Records Control / methods
Health Records, Personal*
Image Processing, Computer-Assisted*
Information Storage and Retrieval / methods
Paper
Privacy