A heterogeneous multi-modal medical data fusion framework supporting hybrid data exploration

Health Inf Sci Syst. 2022 Aug 26;10(1):22. doi: 10.1007/s13755-022-00183-x. eCollection 2022 Dec.

Abstract

The Industry 4.0 era has seen more and more high-tech, high-precision devices applied in the medical field to provide better services. Besides electronic medical records (EMRs), medical data include a large and still growing amount of unstructured data such as X-rays, MRI scans, CT scans and PET scans. These massive, heterogeneous multi-modal data make it a major challenge for healthcare researchers and other users to find valuable data sets. Traditional data warehouses can integrate data through an ETL process and support interactive data exploration. However, they are costly and do not operate in real time. Furthermore, they lack the ability to handle multi-modal data in two phases: data fusion and data exploration. In the data fusion phase, it is difficult to unify multi-modal data under one data model. In the data exploration phase, it is challenging to explore data of different modalities at the same time, which impedes the extraction of the diverse information underlying multi-modal data. To solve these problems, we propose a highly efficient data fusion framework, based on a data lake, that supports data exploration for heterogeneous multi-modal medical data. The framework provides a novel and efficient method to fuse fragmented multi-modal medical data and store their metadata in the data lake. It offers a user-friendly interface supporting hybrid graph queries to explore multi-modal data, and indexes are created to accelerate this hybrid data exploration. A prototype has been implemented and tested in a hospital, demonstrating the effectiveness of the framework.

Keywords: Data fusion; Heterogeneous data; Hybrid data exploration; Multi-modal medical data.
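
The abstract does not describe the implementation, so the following is only a minimal sketch of the general idea it outlines: metadata for structured records and unstructured files kept in one graph model, with an index used to answer a hybrid query. Every name here (`MetadataGraph`, `hybrid_query`, the node identifiers, the attribute keys and the file paths) is a hypothetical chosen for illustration, and the sample records are made up; none of it is taken from the authors' framework.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MetadataGraph:
    """Toy metadata catalog (hypothetical): nodes hold metadata, edges link records."""
    nodes: dict = field(default_factory=dict)                      # node_id -> metadata
    edges: dict = field(default_factory=lambda: defaultdict(set))  # node_id -> linked ids
    index: dict = field(default_factory=lambda: defaultdict(set))  # (key, value) -> ids

    def add_node(self, node_id, **metadata):
        """Register a record or file and index each of its metadata attributes."""
        self.nodes[node_id] = metadata
        for key, value in metadata.items():
            self.index[(key, value)].add(node_id)

    def link(self, a, b):
        """Connect two nodes, e.g. an EMR entry and an imaging file."""
        self.edges[a].add(b)
        self.edges[b].add(a)

    def lookup(self, key, value):
        """Index lookup: ids of all nodes whose metadata has key == value."""
        return set(self.index.get((key, value), set()))

    def hybrid_query(self, structured, unstructured):
        """Return sorted (record_id, file_id) pairs where the record matches the
        structured condition and is linked to a file matching the other one."""
        (skey, sval), (ukey, uval) = structured, unstructured
        files = self.lookup(ukey, uval)
        return sorted(
            (rec, f)
            for rec in self.lookup(skey, sval)
            for f in self.edges[rec] & files
        )


def build_example():
    """Populate the graph with made-up sample metadata (illustrative only)."""
    g = MetadataGraph()
    g.add_node("emr:1", kind="emr", patient="P001", diagnosis="glioma")
    g.add_node("emr:2", kind="emr", patient="P002", diagnosis="fracture")
    g.add_node("img:10", kind="image", modality="MRI", path="scans/p001_brain.dcm")
    g.add_node("img:11", kind="image", modality="CT", path="scans/p001_chest.dcm")
    g.add_node("img:12", kind="image", modality="MRI", path="scans/p002_knee.dcm")
    g.link("emr:1", "img:10")
    g.link("emr:1", "img:11")
    g.link("emr:2", "img:12")
    return g


graph = build_example()
# Structured condition (diagnosis) combined with an unstructured-data condition (modality).
print(graph.hybrid_query(("diagnosis", "glioma"), ("modality", "MRI")))
# prints [('emr:1', 'img:10')]
```

In this sketch only metadata and file paths are stored, while the image files themselves would stay where they are, which is one plausible reading of "store their metadata in the data lake". The `(key, value)` index lets both conditions be resolved by lookup rather than by scanning every node.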