On the limits of graph neural networks for the early diagnosis of Alzheimer's disease

Laura Hernández-Lorenzo; Markus Hoffmann; Evelyn Scheibling; Markus List; Jordi A Matías-Guiu; Jose L Ayala

doi:10.1038/s41598-022-21491-y

Sci Rep. 2022 Oct 21;12(1):17632. doi: 10.1038/s41598-022-21491-y.

Authors

Laura Hernández-Lorenzo¹ ² ³, Markus Hoffmann⁴ ⁵, Evelyn Scheibling⁴, Markus List⁴, Jordi A Matías-Guiu⁶, Jose L Ayala⁷

Affiliations

¹ Department of Computer Architecture and Automation, Computer Science Faculty, Complutense University of Madrid, 28040, Madrid, Spain. laurahl@ucm.es.
² Department of Neurology, Hospital Clínico San Carlos, San Carlos Research Health Institute (IdISSC), Universidad Complutense, 28040, Madrid, Spain. laurahl@ucm.es.
³ Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Munich, Germany. laurahl@ucm.es.
⁴ Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Munich, Germany.
⁵ Institute for Advanced Study, Technical University of Munich, Lichtenbergstrasse 2 a, 85748, Garching, Germany.
⁶ Department of Neurology, Hospital Clínico San Carlos, San Carlos Research Health Institute (IdISSC), Universidad Complutense, 28040, Madrid, Spain.
⁷ Department of Computer Architecture and Automation, Computer Science Faculty, Complutense University of Madrid, 28040, Madrid, Spain.

Abstract

Alzheimer's disease (AD) is a neurodegenerative disease whose molecular mechanisms are activated several years before cognitive symptoms appear. Genotype-based prediction of the phenotype is thus a key challenge for the early diagnosis of AD. Machine learning techniques that have been proposed to address this challenge do not consider known biological interactions between the genes used as input features, thus neglecting important information about the disease mechanisms at play. To mitigate this, we first extracted AD subnetworks from several protein-protein interaction (PPI) databases and labeled these with genotype information (number of missense variants) to make them patient-specific. Next, we trained Graph Neural Networks (GNNs) on the patient-specific networks for phenotype prediction. We tested different PPI databases and compared the performance of the GNN models to baseline models using classical machine learning techniques, as well as randomized networks and input datasets. The overall results showed that GNNs could not outperform a baseline predictor only using the APOE gene, suggesting that missense variants are not sufficient to explain disease risk beyond the APOE status. Nevertheless, our results show that GNNs outperformed other machine learning techniques and that protein-protein interactions lead to superior results compared to randomized networks. These findings highlight that gene interactions are a valuable source of information in predicting disease status.
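The pipeline summarized in the abstract (a shared PPI subnetwork, node labels taken from each patient's missense variant counts, then a graph model over the labeled network) can be illustrated with a minimal sketch. This is not the authors' implementation: the edge list, the placeholder gene names `GENE_B` to `GENE_D`, the variant counts, and the function names are all hypothetical, and the single untrained mean-aggregation step only stands in for the learned layers of an actual GNN.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def build_patient_graph(
    edges: List[Edge], variant_counts: Dict[str, int]
) -> Tuple[List[str], Dict[str, Set[str]], Dict[str, float]]:
    """Attach one patient's missense variant counts to a shared PPI subnetwork."""
    nodes = sorted({gene for edge in edges for gene in edge})
    adjacency: Dict[str, Set[str]] = {gene: set() for gene in nodes}
    for a, b in edges:
        # PPI edges are treated as undirected.
        adjacency[a].add(b)
        adjacency[b].add(a)
    # Genes without a recorded variant get a count of 0.
    features = {gene: float(variant_counts.get(gene, 0)) for gene in nodes}
    return nodes, adjacency, features


def message_passing_step(
    adjacency: Dict[str, Set[str]], features: Dict[str, float]
) -> Dict[str, float]:
    """One weight-free step: each node averages itself and its neighbours."""
    updated = {}
    for node, neighbours in adjacency.items():
        group = [node, *neighbours]
        updated[node] = sum(features[g] for g in group) / len(group)
    return updated


def graph_readout(features: Dict[str, float]) -> float:
    """Mean-pool node values into one graph-level number per patient."""
    return sum(features.values()) / len(features)


# Hypothetical toy subnetwork; these edges come from no real PPI database.
TOY_EDGES: List[Edge] = [
    ("APOE", "GENE_B"),
    ("GENE_B", "GENE_C"),
    ("GENE_C", "GENE_D"),
]

# Hypothetical missense variant counts for a single patient.
patient_variants = {"APOE": 2, "GENE_C": 1}

nodes, adjacency, features = build_patient_graph(TOY_EDGES, patient_variants)
hidden = message_passing_step(adjacency, features)
score = graph_readout(hidden)
```

In a trained model the averaging step would be replaced by layers with learned weights and the readout would feed a classifier. Shuffling `TOY_EDGES` while keeping the node labels fixed gives the kind of randomized-network control that the abstract compares against.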

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Alzheimer Disease* / diagnosis
Alzheimer Disease* / genetics
Apolipoproteins E
Early Diagnosis
Humans
Neural Networks, Computer
Neurodegenerative Diseases*

Substances

Apolipoproteins E