Multi-modal Learning with Missing Data for Cancer Diagnosis Using Histopathological and Genomic Data

Proc SPIE Int Soc Opt Eng. 2022 Feb-Mar:12033:120331D. doi: 10.1117/12.2612318. Epub 2022 Apr 4.

Abstract

Multi-modal learning (e.g., integrating pathological images with genomic features) tends to improve the accuracy of cancer diagnosis and prognosis compared to learning with a single modality. However, missing data is a common problem in clinical practice: not every patient has all modalities available. Most previous works simply discarded samples with missing modalities, which loses the information in those samples and increases the likelihood of overfitting. In this work, we generalize multi-modal learning for cancer diagnosis to handle missing data, using histological images and genomic data. Our integrated model can utilize all available data, from patients with both complete and partial modalities. Experiments on the public TCGA-GBM and TCGA-LGG datasets show that data with missing modalities can contribute to multi-modal learning, improving model performance on glioma grade classification.
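The abstract does not specify the fusion mechanism, so the following is only a minimal sketch of the general idea: modality-specific encoders map histology and genomic features into a shared embedding space, and a sample is classified from the average of whichever modality embeddings are present, so partial samples still yield predictions. All dimensions, the random linear "encoders", and the averaging-based fusion are illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): 64-d histology features,
# 32-d genomic features, a 16-d shared embedding, 2 glioma grades.
D_IMG, D_GEN, D_EMB, N_CLASSES = 64, 32, 16, 2

# Modality-specific encoders: random linear projections stand in for
# learned feature extractors in this sketch.
W_img = rng.normal(scale=0.1, size=(D_IMG, D_EMB))
W_gen = rng.normal(scale=0.1, size=(D_GEN, D_EMB))
W_cls = rng.normal(scale=0.1, size=(D_EMB, N_CLASSES))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def predict(x_img=None, x_gen=None):
    """Fuse whichever modalities are present by averaging their
    embeddings, so samples with a missing modality are still usable."""
    embs = []
    if x_img is not None:
        embs.append(x_img @ W_img)
    if x_gen is not None:
        embs.append(x_gen @ W_gen)
    if not embs:
        raise ValueError("at least one modality is required")
    fused = np.mean(embs, axis=0)  # average over available modalities
    return softmax(fused @ W_cls)  # class probabilities

# A complete sample (both modalities) and a partial one (genomics only)
# both produce valid predictions, so neither needs to be discarded.
p_full = predict(x_img=rng.normal(size=D_IMG), x_gen=rng.normal(size=D_GEN))
p_part = predict(x_gen=rng.normal(size=D_GEN))
```

Because the fusion averages only the embeddings that exist, training can include both complete and partial samples, which is the key property the paper exploits to avoid discarding data.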

Keywords: Multi-modal learning; deep learning; missing data.