AttOmics: attention-based architecture for diagnosis and prognosis from omics data

Aurélien Beaude; Milad Rafiee Vahid; Franck Augé; Farida Zehraoui; Blaise Hanczar

doi:10.1093/bioinformatics/btad232

Bioinformatics. 2023 Jun 30;39(Suppl 1):i94-i102. doi: 10.1093/bioinformatics/btad232.

Authors

Aurélien Beaude¹,², Milad Rafiee Vahid³, Franck Augé², Farida Zehraoui¹, Blaise Hanczar¹

Affiliations

¹ IBISC, Université Paris-Saclay, Univ Evry, 23 Boulevard de France, Evry-Courcouronnes 91020, France.
² Artificial Intelligence & Deep Analytics, Omics Data Science, Sanofi R&D Data and Data Science, 1 Av. Pierre Brossolette, Chilly-Mazarin 91385, France.
³ Sanofi R&D Data and Data Science, Artificial Intelligence & Deep Analytics, Omics Data Science, 450 Water Street, Cambridge, MA 02142, United States.

Abstract

Motivation: The increasing availability of high-throughput omics data makes it possible to envision a new form of medicine centered on individual patients. Precision medicine relies on exploiting these high-throughput data with machine-learning models, especially those based on deep-learning approaches, to improve diagnosis. Because omics data are high-dimensional and sample sizes are small, current deep-learning models end up with many parameters that must be fitted on a limited training set. Furthermore, these models treat the interactions between molecular entities inside an omics profile as identical for all patients rather than patient specific.

Results: In this article, we propose AttOmics, a new deep-learning architecture based on the self-attention mechanism. First, we decompose each omics profile into a set of groups, where each group contains related features. Then, by applying the self-attention mechanism to the set of groups, we can capture the different interactions specific to a patient. Our experiments show that the model can accurately predict the phenotype of a patient with fewer parameters than deep neural networks. Visualizing the attention maps can provide new insights into the groups that are essential for a particular phenotype.
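
The two steps described above (grouping features, then self-attention over the groups) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the contiguous equal-size grouping, the single attention head, the random untrained weights, and the names init_params and forward are all assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def init_params(n_features, n_groups, d_model, rng):
    """Create untrained weights shared by all patients (illustrative only)."""
    # Assumption: contiguous, near-equal groups stand in for "related features".
    group_idx = np.array_split(np.arange(n_features), n_groups)

    def dense(rows, cols):
        # Scaled random matrix; a trained model would learn these values.
        return rng.normal(size=(rows, cols)) / np.sqrt(rows)

    return {
        "group_idx": group_idx,
        # One small projection per group instead of one layer over all features.
        "w_group": [dense(idx.size, d_model) for idx in group_idx],
        "w_q": dense(d_model, d_model),
        "w_k": dense(d_model, d_model),
        "w_v": dense(d_model, d_model),
    }


def forward(params, profile):
    """Single-head self-attention over the groups of one patient's profile."""
    # Project each group of features to a d_model-dimensional token.
    tokens = np.stack(
        [profile[idx] @ w for idx, w in zip(params["group_idx"], params["w_group"])]
    )
    q = tokens @ params["w_q"]
    k = tokens @ params["w_k"]
    v = tokens @ params["w_v"]
    # The attention map depends on the input, so it differs between patients.
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return attn @ v, attn


rng = np.random.default_rng(0)
params = init_params(n_features=1000, n_groups=10, d_model=16, rng=rng)
patient_a = rng.normal(size=1000)  # synthetic profiles, not real omics data
patient_b = rng.normal(size=1000)
out_a, attn_a = forward(params, patient_a)
out_b, attn_b = forward(params, patient_b)
print(attn_a.shape, out_a.shape)
```

Both patients share the same weights, yet each gets its own groups-by-groups attention map, which is the sense in which the captured interactions are patient specific. With these toy sizes the per-group projections hold 1000 × 16 = 16,000 weights, whereas one dense layer mapping all 1000 features to the same 160 outputs would hold 160,000. These figures only illustrate the arithmetic and are not results from the article.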

Availability and implementation: The code and data are available at https://forge.ibisc.univ-evry.fr/abeaude/AttOmics. TCGA data can be downloaded from the Genomic Data Commons Data Portal.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Machine Learning*
Neural Networks, Computer*
Phenotype
Precision Medicine