Unsupervised Multi-Latent Space RL Framework for Video Summarization in Ultrasound Imaging

IEEE J Biomed Health Inform. 2023 Jan;27(1):227-238. doi: 10.1109/JBHI.2022.3208779. Epub 2023 Jan 4.

Abstract

The COVID-19 pandemic has highlighted the need for a tool that speeds up triage in ultrasound scans and gives clinicians fast access to relevant information. To this end, we propose a new unsupervised reinforcement learning (RL) framework for summarizing ultrasound videos, with novel rewards that enable unsupervised learning and avoid tedious, impractical manual labelling. The proposed framework delivers video summaries with classification labels and segmentations of key landmarks, which enhances its utility as a triage tool in the emergency department (ED) and in telemedicine. Using an attention ensemble of encoders, each high-dimensional image is projected into a low-dimensional latent space that captures: a) the distance to a normal or abnormal class (classifier encoder), b) the topology of key landmarks (segmentation encoder), and c) a distance- and topology-agnostic representation (autoencoders). The summarization network is implemented as a bidirectional long short-term memory (Bi-LSTM) network that operates on the latent representations from the encoders. Validation is performed on lung ultrasound (LUS) videos, which represent typical use cases in telemedicine and ED triage, acquired from medical centers in different geographies (India and Spain). Trained and tested on 126 LUS videos, the proposed approach showed high agreement with the ground truth, with an average precision of over 80% and an average F1 score above 44 ± 1.7%. The approach reduced storage space by 77% on average, which can ease bandwidth and storage requirements in telemedicine.
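The pipeline described in the abstract (multiple per-frame encoders, attention-based fusion of their latent vectors, and a Bi-LSTM that scores frames for inclusion in the summary) can be sketched roughly as below. This is an illustrative PyTorch sketch, not the authors' implementation: the module names, dimensions, fusion scheme, and encoder backbones are assumptions made only to show the data flow.

# Illustrative sketch only; architecture details are assumed, not from the paper.
import torch
import torch.nn as nn

class FrameEncoder(nn.Module):
    """Stand-in for any of the three encoders (classifier, segmentation, autoencoder)."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, latent_dim),
        )

    def forward(self, x):               # x: (B*T, 1, H, W)
        return self.net(x)              # (B*T, latent_dim)

class AttentionFusion(nn.Module):
    """Attention weights over the encoder ensemble, applied per frame."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.score = nn.Linear(latent_dim, 1)

    def forward(self, latents):         # latents: (B*T, n_encoders, latent_dim)
        w = torch.softmax(self.score(latents), dim=1)
        return (w * latents).sum(dim=1)  # (B*T, latent_dim)

class Summarizer(nn.Module):
    """Bi-LSTM mapping fused per-frame latents to frame-selection scores."""
    def __init__(self, latent_dim=128, hidden=64):
        super().__init__()
        self.encoders = nn.ModuleList([FrameEncoder(latent_dim) for _ in range(3)])
        self.fusion = AttentionFusion(latent_dim)
        self.lstm = nn.LSTM(latent_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, frames):          # frames: (B, T, 1, H, W)
        B, T = frames.shape[:2]
        flat = frames.reshape(B * T, *frames.shape[2:])
        latents = torch.stack([enc(flat) for enc in self.encoders], dim=1)
        fused = self.fusion(latents).reshape(B, T, -1)
        h, _ = self.lstm(fused)
        return torch.sigmoid(self.head(h)).squeeze(-1)  # (B, T) keep-probabilities

if __name__ == "__main__":
    video = torch.randn(1, 20, 1, 64, 64)   # one video of 20 frames
    print(Summarizer()(video).shape)        # torch.Size([1, 20])

In the paper these per-frame keep-probabilities are trained with the proposed unsupervised RL rewards rather than with frame-level labels; frames with high scores form the summary, which is what yields the reported reduction in storage.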

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • COVID-19*
  • Humans
  • India
  • Lung / diagnostic imaging
  • Pandemics
  • Ultrasonography