Deep Neural Network Driven Speech Classification for Relevance Detection in Automatic Medical Documentation

Suhail Ahamed; Gabriele Weiler; Karl Boden; Kai Januschowski; Matthias Stennes; Patrick McCrae; Cornelia Bock; Carina Rawein; Marco Petris; Kilian Foth; Kerstin Rohm; Stephan Kiefer

doi:10.3233/SHTI210121

Stud Health Technol Inform. 2021 May 27:281:63-67. doi: 10.3233/SHTI210121.

Authors

Suhail Ahamed¹, Gabriele Weiler¹, Karl Boden²,³, Kai Januschowski²,³, Matthias Stennes⁴, Patrick McCrae⁵, Cornelia Bock⁵, Carina Rawein⁵, Marco Petris⁵, Kilian Foth⁵, Kerstin Rohm¹, Stephan Kiefer¹

Affiliations

¹ Fraunhofer Institute for Biomedical Engineering, Sulzbach, Germany.
² Klaus Heimann Eye Research Institute (KHERI), Sulzbach, Germany.
³ Eye Clinic Sulzbach, Knappschaftsklinikum Saar, Sulzbach, Germany.
⁴ Fraunhofer Institute for Digital Media Technology, Oldenburg, Germany.
⁵ LangTec, Hamburg, Germany.

PMID: 34042706
DOI: 10.3233/SHTI210121

Abstract

Automating medical documentation is highly desirable, as it could save considerable time and cost in healthcare. Supported by complex modelling and high computational capability, Automatic Speech Recognition (ASR) and deep learning have been applied to this end with promising results. However, the efficiency of such systems depends significantly on the volume of speech processed in each medical examination. In the course of this study, we found that over half of the speech recorded during follow-up examinations of patients treated with intravitreal injections was not relevant for medical documentation. In this paper, we evaluate Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks for the development of a speech classification module that identifies speech relevant for medical report generation. Various topology parameters are tested, and the effect of different speaker attributes on model performance is analyzed. The results indicate that CNNs are more successful than LSTM networks, achieving a validation accuracy of 92.41%. Furthermore, when evaluated for robustness to gender, accent and unknown speakers, the neural network generalized satisfactorily.

Keywords: Automatic Speech Recognition; Medical documentation; Neural Networks; Optical Coherence Tomography; Report generation.

MeSH terms

Automation
Documentation
Humans
Neural Networks, Computer*
Speech*