JOINT AND INDIVIDUAL ANALYSIS OF BREAST CANCER HISTOLOGIC IMAGES AND GENOMIC COVARIATES

Ann Appl Stat. 2021 Dec;15(4):1697-1722. doi: 10.1214/20-aoas1433. Epub 2021 Dec 21.

Abstract

The two main approaches in the study of breast cancer are histopathology (analyzing visual characteristics of tumors) and genomics. While both histopathology and genomics are fundamental to cancer research, the connections between these fields have been relatively superficial. We bridge this gap by investigating the Carolina Breast Cancer Study through the development of an integrative, exploratory analysis framework. Our analysis gives insights - some known, some novel - that are engaging to both pathologists and geneticists. Our analysis framework is based on Angle-based Joint and Individual Variation Explained (AJIVE) for statistical data integration and exploits Convolutional Neural Networks (CNNs) as a powerful, automatic method for image feature extraction. CNNs raise interpretability issues that we address by developing novel methods to explore visual modes of variation captured by statistical algorithms (e.g. PCA or AJIVE) applied to CNN features.

Keywords: Multi-view data; breast cancer histopathology; deep learning; dimensionality reduction; gene expression; image analysis; interpretability.