A scoping review on multimodal deep learning in biomedical images and texts

Zhaoyi Sun; Mingquan Lin; Qingqing Zhu; Qianqian Xie; Fei Wang; Zhiyong Lu; Yifan Peng

doi:10.1016/j.jbi.2023.104482

J Biomed Inform. 2023 Oct:146:104482. doi: 10.1016/j.jbi.2023.104482. Epub 2023 Aug 29.

Authors

Zhaoyi Sun¹, Mingquan Lin², Qingqing Zhu³, Qianqian Xie⁴, Fei Wang⁵, Zhiyong Lu⁶, Yifan Peng⁷

Affiliations

¹ Population Health Sciences, Weill Cornell Medicine, New York, NY 10016, USA. Electronic address: zhs4003@med.cornell.edu.
² Population Health Sciences, Weill Cornell Medicine, New York, NY 10016, USA. Electronic address: mil4012@med.cornell.edu.
³ National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, USA. Electronic address: qingqing.zhu@nih.gov.
⁴ Population Health Sciences, Weill Cornell Medicine, New York, NY 10016, USA. Electronic address: qix4002@med.cornell.edu.
⁵ Population Health Sciences, Weill Cornell Medicine, New York, NY 10016, USA. Electronic address: few2001@med.cornell.edu.
⁶ National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, USA. Electronic address: luzh@ncbi.nlm.nih.gov.
⁷ Population Health Sciences, Weill Cornell Medicine, New York, NY 10016, USA. Electronic address: yip4002@med.cornell.edu.

PMID: 37652343
PMCID: PMC10591890 (available on 2024-10-01)
DOI: 10.1016/j.jbi.2023.104482

Abstract

Objective: Computer-assisted diagnostic and prognostic systems of the future should be capable of simultaneously processing multimodal data. Multimodal deep learning (MDL), which involves the integration of multiple sources of data, such as images and text, has the potential to revolutionize the analysis and interpretation of biomedical data. However, it has only recently attracted researchers' attention. There is therefore a critical need to conduct a systematic review on this topic, identify the limitations of current work, and explore future directions.

Methods: In this scoping review, we aim to provide a comprehensive overview of the current state of the field and identify key concepts, types of studies, and research gaps, with a focus on the joint learning of biomedical images and texts, mainly because these two were the most commonly available data types in MDL research.

Results: This study reviewed the current uses of multimodal deep learning on five tasks: (1) report generation, (2) visual question answering, (3) cross-modal retrieval, (4) computer-aided diagnosis, and (5) semantic segmentation.

Conclusion: Our results highlight the diverse applications and potential of MDL and suggest directions for future research in the field. We hope our review will facilitate the collaboration of natural language processing (NLP) and medical imaging communities and support the next generation of decision-making and computer-assisted diagnostic system development.

Keywords: Clinical notes; Medical images; Multimodal learning; Scoping review.

Publication types

Systematic Review
Review
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Deep Learning*
Diagnosis, Computer-Assisted
Diagnostic Imaging
Natural Language Processing
Semantics

Grants and funding

R00 LM013001/LM/NLM NIH HHS/United States