MOCAT: multi-omics integration with auxiliary classifiers enhanced autoencoder

Xiaohui Yao; Xiaohan Jiang; Haoran Luo; Hong Liang; Xiufen Ye; Yanhui Wei; Shan Cong

doi:10.1186/s13040-024-00360-6

BioData Min. 2024 Mar 5;17(1):9. doi: 10.1186/s13040-024-00360-6.

Authors

Xiaohui Yao¹,², Xiaohan Jiang¹, Haoran Luo¹,², Hong Liang², Xiufen Ye², Yanhui Wei², Shan Cong³,⁴

Affiliations

¹ Qingdao Innovation and Development Center, Harbin Engineering University, 1777 Sansha Rd, Qingdao, 266000, Shandong, China.
² College of Intelligent Systems Science and Engineering, Harbin Engineering University, 145 Nantong St, Harbin, 150001, Heilongjiang, China.
³ Qingdao Innovation and Development Center, Harbin Engineering University, 1777 Sansha Rd, Qingdao, 266000, Shandong, China. Shan.Cong@hrbeu.edu.cn.
⁴ College of Intelligent Systems Science and Engineering, Harbin Engineering University, 145 Nantong St, Harbin, 150001, Heilongjiang, China. Shan.Cong@hrbeu.edu.cn.

Abstract

Background: Integrating multi-omics data is emerging as a critical approach in enhancing our understanding of complex diseases. Innovative computational methods capable of managing high-dimensional and heterogeneous datasets are required to unlock the full potential of such rich and diverse data.

Methods: We propose a Multi-Omics integration framework with auxiliary Classifiers-enhanced AuToencoders (MOCAT) to utilize intra- and inter-omics information comprehensively. Additionally, attention mechanisms with confidence learning are incorporated for enhanced feature representation and trustworthy prediction.

Results: Extensive experiments on four benchmark datasets (BRCA, ROSMAP, LGG, and KIPAN) were conducted to evaluate the effectiveness of the proposed model. Our model significantly improved most evaluation metrics and consistently surpassed state-of-the-art methods. Ablation studies showed that the auxiliary classifiers significantly boosted classification accuracy on the ROSMAP and LGG datasets. Moreover, the attention mechanisms and the confidence evaluation block contributed to improvements in the predictive accuracy and generalizability of our model.

Conclusions: The proposed framework exhibits superior performance in disease classification and biomarker discovery, establishing itself as a robust and versatile tool for analyzing multi-layer biological data. This study highlights the significance of elaborately designed deep learning methodologies in dissecting complex disease phenotypes and improving the accuracy of disease predictions.

Keywords: Attention mechanism; Autoencoder; Auxiliary classifier; Disease prediction; Multi-omics integration; Trustworthy learning.
