Multi-modal Learning with Missing Data for Cancer Diagnosis Using Histopathological and Genomic Data

Proc SPIE Int Soc Opt Eng. 2022 Feb-Mar:12033:120331D. doi: 10.1117/12.2612318. Epub 2022 Apr 4.

Abstract

Multi-modal learning (e.g., integrating pathological images with genomic features) tends to improve the accuracy of cancer diagnosis and prognosis compared to learning with a single modality. However, missing data is a common problem in clinical practice: not every patient has all modalities available. Most previous works simply discarded samples with missing modalities, which loses the information in those samples and increases the likelihood of overfitting. In this work, we generalize multi-modal learning for cancer diagnosis to handle missing data, using histological images and genomic data. Our integrated model can utilize all available data, from patients with both complete and partial modalities. Experiments on the public TCGA-GBM and TCGA-LGG datasets show that data with missing modalities can contribute to multi-modal learning, improving model performance on glioma grade classification.
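The abstract does not specify the fusion mechanism, so the following is only a minimal sketch of the general idea: modality-specific encoders map histology and genomic features into a shared embedding space, and a sample is classified from the average of whichever modality embeddings are present, so partial samples still yield predictions. All dimensions, the random linear "encoders", and the averaging-based fusion are illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): 64-d histology features,
# 32-d genomic features, a 16-d shared embedding, 2 glioma grades.
D_IMG, D_GEN, D_EMB, N_CLASSES = 64, 32, 16, 2

# Modality-specific encoders: random linear projections stand in for
# learned feature extractors in this sketch.
W_img = rng.normal(scale=0.1, size=(D_IMG, D_EMB))
W_gen = rng.normal(scale=0.1, size=(D_GEN, D_EMB))
W_cls = rng.normal(scale=0.1, size=(D_EMB, N_CLASSES))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def predict(x_img=None, x_gen=None):
    """Fuse whichever modalities are present by averaging their
    embeddings, so samples with a missing modality are still usable."""
    embs = []
    if x_img is not None:
        embs.append(x_img @ W_img)
    if x_gen is not None:
        embs.append(x_gen @ W_gen)
    if not embs:
        raise ValueError("at least one modality is required")
    fused = np.mean(embs, axis=0)  # average over available modalities
    return softmax(fused @ W_cls)  # class probabilities

# A complete sample (both modalities) and a partial one (genomics only)
# both produce valid predictions, so neither needs to be discarded.
p_full = predict(x_img=rng.normal(size=D_IMG), x_gen=rng.normal(size=D_GEN))
p_part = predict(x_gen=rng.normal(size=D_GEN))
```

Because the fusion averages only the embeddings that exist, training can include both complete and partial samples, which is the key property the paper exploits to avoid discarding data.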

Keywords: Multi-modal learning; deep learning; missing data.