The Revival of the Notes Field: Leveraging the Unstructured Content in Electronic Health Records

Michela Assale; Linda Greta Dui; Andrea Cina; Andrea Seveso; Federico Cabitza

doi:10.3389/fmed.2019.00066

Front Med (Lausanne). 2019 Apr 17;6:66. doi: 10.3389/fmed.2019.00066. eCollection 2019.

Authors

Michela Assale¹ ², Linda Greta Dui³ ⁴, Andrea Cina¹ ², Andrea Seveso² ⁴, Federico Cabitza² ⁵

Affiliations

¹ K-tree SRL, Pont-Saint-Martin, Italy.
² University of Milano-Bicocca, Milan, Italy.
³ Politecnico di Milano, Milan, Italy.
⁴ Link-Up Datareg, Cinisello Balsamo, Italy.
⁵ IRCCS Istituto Ortopedico Galeazzi, Milan, Italy.

Abstract

Problem: Clinical practice requires producing a great number of notes, which is time- and resource-consuming. These notes contain relevant information, but their secondary use is almost impossible because of their unstructured nature. Researchers are trying to address this problem with both traditional and promising novel techniques. Application in real hospital settings does not yet seem possible, though, both because datasets are relatively small and dirty and because language-specific pre-trained models are lacking.

Aim: Our aim is to demonstrate the potential of the above techniques, but also to raise awareness of the still-open challenges that the scientific communities of IT and medical practitioners must jointly address to realize the full potential of the unstructured content that is produced and digitized daily in hospital settings, both to improve its data quality and to leverage the insights from data-driven predictive models.

Methods: To this end, we present a narrative literature review of the most recent and relevant contributions on the application of Natural Language Processing techniques to the free-text content of electronic patient records. In particular, we focused on four selected application domains, namely: data quality, information extraction, sentiment analysis and predictive models, and automated patient cohort selection. We then present a few empirical studies that we undertook at a major teaching hospital specializing in musculoskeletal diseases.

Results: We provide the reader with some simple and affordable pipelines, which demonstrate the feasibility of reaching literature performance levels with a single-institution, non-English dataset. In this way, we bridged the literature and real-world needs, taking a further step toward the revival of notes fields.

Keywords: clinical intelligence; data quality; information extraction; literature review; machine learning; natural language processing (NLP); sentiment analysis; text mining.