NECo: A node embedding algorithm for multiplex heterogeneous networks

Proceedings (IEEE Int Conf Bioinformatics Biomed). 2020 Dec:2020:146-149. doi: 10.1109/bibm49941.2020.9313595. Epub 2021 Jan 13.

Abstract

Complex diseases such as hypertension, cancer, and diabetes cause nearly 70% of deaths in the U.S. and involve multiple genes and their interactions with environmental factors. Therefore, identifying genetic factors to understand and decrease the morbidity and mortality from complex diseases is an important and challenging task. With the generation of an unprecedented amount of multi-omics datasets, network-based methods have become popular for representing multilayered complex molecular interactions. In particular, node embeddings, the low-dimensional representations of nodes in a network, are used for gene function prediction. Integrated network analysis of multi-omics data alleviates the issues related to missing data and lack of context-specific datasets. Most node embedding methods, however, are unable to integrate multiple types of datasets from genes and phenotypes. To address this limitation, we developed a node embedding algorithm called Node Embeddings of Complex networks (NECo) that can utilize multilayered heterogeneous networks of genes and phenotypes. We evaluated the performance of NECo using genotypic and phenotypic datasets from rat (Rattus norvegicus) disease models to classify hypertension disease-related genes. Our method significantly outperformed the state-of-the-art node embedding methods, with an AUC of 94.97% compared to 85.98% for the second-best performer, and predicted genes not previously implicated in hypertension.

Keywords: Network integration; complex disease; disease gene prediction; feature learning; genotype to phenotype mapping; graph representation; hypertension; multi-omics data integration; multiplex heterogeneous networks; network propagation; node embedding; random walk with restart; rat.