Analysis of Unstructured Text-Based Data Using Machine Learning Techniques: The Case of Pediatric Emergency Department Records in Nicaragua

Giulia Lorenzoni; Silvia Bressan; Corrado Lanera; Danila Azzolina; Liviana Da Dalt; Dario Gregori

doi:10.1177/1077558719844123

Med Care Res Rev. 2021 Apr;78(2):138-145. doi: 10.1177/1077558719844123. Epub 2019 Apr 29.

Authors

Giulia Lorenzoni¹, Silvia Bressan², Corrado Lanera¹, Danila Azzolina¹, Liviana Da Dalt², Dario Gregori¹

Affiliations

¹ Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, Padova, Italy.
² Division of Pediatric Emergency Medicine, Department of Women's and Children's Health, University of Padova, Padova, Italy.

PMID: 31030615

Abstract

Free-text information is still widely used in emergency department (ED) records. Machine learning (ML) techniques are useful for analyzing such narratives, but they have been applied mostly to English-language data sets. Against this background, the performance of an ML classification task was tested on a Spanish-language database of ED visits collected in the EDs of nine hospitals in Nicaragua. Spanish-language, free-text discharge diagnoses were considered in the analysis. Five hundred random forests were trained on a set of bootstrap samples of the whole data set (1,789 ED visits) to perform the classification task. For each one, after the optimal parameter value had been identified, the final validated model was trained on the whole bootstrapped data set and tested. The classification accuracies had a median of 0.783 (95% CI [0.779, 0.796]). Machine learning techniques appear to be a promising way to exploit the unstructured information reported in ED records in low- and middle-income Spanish-speaking countries.

Keywords: Spanish; classification task; emergency department visits; free-text discharge diagnosis; low- and middle-income countries; random forest.
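
The evaluation loop described in the abstract (repeated bootstrap samples, one model per sample, a median accuracy with a 95% interval) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several parts are assumptions made only for illustration: a simple token-vote classifier stands in for the random forest, each model is tested on the records left out of its bootstrap sample, the interval is a plain percentile interval, parameter tuning is omitted, and the records and category labels in `toy_records` are invented. All function names are hypothetical.

```python
import math
import random
import statistics
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase whitespace tokenization; real free-text diagnoses need more cleaning.
    return text.lower().split()


def train_token_vote(records):
    # Stand-in classifier (NOT a random forest): count, for every token,
    # how many training records of each label contain it.
    token_label_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in records:
        label_counts[label] += 1
        for tok in set(tokenize(text)):
            token_label_counts[tok][label] += 1
    return token_label_counts, label_counts


def predict(model, text):
    # Sum the per-token label counts and return the label with the most votes;
    # fall back to the most frequent training label if no token is known.
    token_label_counts, label_counts = model
    votes = Counter()
    for tok in set(tokenize(text)):
        votes.update(token_label_counts.get(tok, {}))
    if not votes:
        return label_counts.most_common(1)[0][0]
    return votes.most_common(1)[0][0]


def bootstrap_accuracies(records, n_boot=500, seed=0):
    # One model per bootstrap sample. Assumption: each model is tested on the
    # records that were not drawn into its sample (out-of-bag records).
    rng = random.Random(seed)
    n = len(records)
    accuracies = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        in_bag = set(idx)
        train = [records[i] for i in idx]
        test = [records[i] for i in range(n) if i not in in_bag]
        if not test:  # rare: every record was drawn, nothing left to test on
            continue
        model = train_token_vote(train)
        correct = sum(predict(model, text) == label for text, label in test)
        accuracies.append(correct / len(test))
    return accuracies


def summarize(accuracies):
    # Median plus an approximate 95% percentile interval of the accuracies.
    ordered = sorted(accuracies)
    lo = ordered[math.floor(0.025 * (len(ordered) - 1))]
    hi = ordered[math.ceil(0.975 * (len(ordered) - 1))]
    return statistics.median(ordered), (lo, hi)


# Invented toy records (text, label); the labels are hypothetical categories.
toy_records = [
    ("fractura de radio distal", "trauma"),
    ("fractura de clavicula", "trauma"),
    ("herida en cuero cabelludo", "trauma"),
    ("trauma craneal leve", "trauma"),
    ("neumonia adquirida en la comunidad", "respiratory"),
    ("crisis de asma", "respiratory"),
    ("bronquiolitis aguda", "respiratory"),
    ("neumonia grave", "respiratory"),
    ("gastroenteritis aguda", "gastrointestinal"),
    ("diarrea aguda con deshidratacion", "gastrointestinal"),
]

if __name__ == "__main__":
    median_acc, (ci_low, ci_high) = summarize(bootstrap_accuracies(toy_records))
    print(f"median accuracy {median_acc:.3f}, 95% interval [{ci_low:.3f}, {ci_high:.3f}]")
```

With ten toy records the resulting numbers say nothing about the study; the sketch only shows how a distribution of accuracies over bootstrap replicates yields a median and an interval of the kind reported above.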

MeSH terms

Child
Databases, Factual
Emergency Service, Hospital*
Humans
Machine Learning*
Nicaragua
Patient Discharge