A segmentation-informed deep learning framework to register dynamic two-dimensional magnetic resonance images of the vocal tract during speech

Matthieu Ruthven; Marc E Miquel; Andrew P King

doi:10.1016/j.bspc.2022.104290

A segmentation-informed deep learning framework to register dynamic two-dimensional magnetic resonance images of the vocal tract during speech

Biomed Signal Process Control. 2023 Feb:80:104290. doi: 10.1016/j.bspc.2022.104290.

Authors

Matthieu Ruthven^{1

2}, Marc E Miquel^{1

3

4}, Andrew P King²

Affiliations

¹ Clinical Physics, Barts Health NHS Trust, West Smithfield, London EC1A 7BE, United Kingdom.
² School of Biomedical Engineering & Imaging Sciences, King's College London, King's Health Partners, St Thomas' Hospital, London SE1 7EH, United Kingdom.
³ Digital Environment Research Institute (DERI), Empire House, 67-75 New Road, Queen Mary University of London, London E1 1HH, United Kingdom.
⁴ Advanced Cardiovascular Imaging, Barts NIHR BRC, Queen Mary University of London, London EC1M 6BQ, United Kingdom.

Abstract

Objective: Dynamic magnetic resonance (MR) imaging enables visualisation of articulators during speech. There is growing interest in quantifying articulator motion in two-dimensional MR images of the vocal tract, to better understand speech production and potentially inform patient management decisions. Image registration is an established way to achieve this quantification. Recently, segmentation-informed deformable registration frameworks have been developed and have achieved state-of-the-art accuracy. This work aims to adapt such a framework and optimise it for estimating displacement fields between dynamic two-dimensional MR images of the vocal tract during speech.

Methods: A deep-learning-based registration framework was developed and compared with current state-of-the-art registration methods and frameworks (two traditional methods and three deep-learning-based frameworks, two of which are segmentation informed). The accuracy of the methods and frameworks was evaluated using the Dice coefficient (DSC), average surface distance (ASD) and a metric based on velopharyngeal closure. The metric evaluated if the fields captured a clinically relevant and quantifiable aspect of articulator motion.

Results: The segmentation-informed frameworks achieved higher DSCs and lower ASDs and captured more velopharyngeal closures than the traditional methods and the framework that was not segmentation informed. All segmentation-informed frameworks achieved similar DSCs and ASDs. However, the proposed framework captured the most velopharyngeal closures.

Conclusions: A framework was successfully developed and found to more accurately estimate articulator motion than five current state-of-the-art methods and frameworks.

Significance: The first deep-learning-based framework specifically for registering dynamic two-dimensional MR images of the vocal tract during speech has been developed and evaluated.

Keywords: Articulators; Convolutional neural networks; Dynamic magnetic resonance imaging; Registration; Segmentation; Speech.