Multiomics Topic Modeling for Breast Cancer Classification

Cancers (Basel). 2022 Feb 23;14(5):1150. doi: 10.3390/cancers14051150.

Abstract

The integration of transcriptional data with other layers of information, such as the post-transcriptional regulation mediated by microRNAs, can be crucial for identifying the driver genes and the subtypes of complex and heterogeneous diseases such as cancer. This paper presents an approach based on topic modeling to accomplish this integration task. More specifically, we show how an algorithm based on a hierarchical version of stochastic block modeling can be naturally extended to integrate any combination of 'omics data. We test this approach on breast cancer samples from the TCGA database, integrating data on messenger RNA, microRNAs, and copy number variations. We show that the inclusion of the microRNA layer significantly improves the accuracy of subtype classification. Moreover, some of the hidden structures or "topics" that the algorithm extracts correspond to genes and microRNAs involved in breast cancer development and are associated with survival probability.
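
As an illustration of how several 'omics layers might be combined before block-model inference, the following minimal sketch assumes the data are represented as a network in which each sample node is linked to the feature nodes (genes, microRNAs) measured in it, with the measured value as the edge weight. The function name `build_multipartite_edges`, the layer names, and the toy values are hypothetical and are not taken from the paper; the sketch covers only the construction of the input network, not the hierarchical stochastic block model inference itself.

```python
def build_multipartite_edges(layers, samples):
    """Return weighted edges linking sample nodes to feature nodes of every layer.

    layers  -- dict mapping a layer name (e.g. "mRNA") to
               {feature_name: [one value per sample]}
    samples -- list of sample identifiers, in the same order as the values

    Each edge is (("sample", sample_id), (layer_name, feature_name), weight).
    Entries with value 0 produce no edge. This is an assumed representation,
    not the authors' implementation.
    """
    edges = []
    for layer_name, features in layers.items():
        for feature, values in features.items():
            if len(values) != len(samples):
                raise ValueError(f"{layer_name}/{feature}: expected {len(samples)} values")
            for sample, value in zip(samples, values):
                if value > 0:
                    edges.append((("sample", sample), (layer_name, feature), value))
    return edges


# Hypothetical toy data: three samples, two 'omics layers.
samples = ["S1", "S2", "S3"]
layers = {
    "mRNA": {"GENE_A": [5, 0, 2], "GENE_B": [0, 3, 1]},
    "miRNA": {"MIR_X": [1, 1, 0]},
}

edges = build_multipartite_edges(layers, samples)

# Count the edges contributed by each layer.
edges_per_layer = {}
for _, (layer_name, _), _ in edges:
    edges_per_layer[layer_name] = edges_per_layer.get(layer_name, 0) + 1
```

Under this assumed representation, adding a further layer such as copy number variations only requires one more entry in `layers`; the sample nodes are shared across layers, which is what allows a single model to group samples using all layers at once.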

Keywords: chr14q32; miRNA expression regulation; miRNAs; multiomics; stochastic block modeling; topic modeling.