A network-based machine-learning framework to identify both functional modules and disease genes

Hum Genet. 2021 Jun;140(6):897-913. doi: 10.1007/s00439-020-02253-0. Epub 2021 Jan 7.

Abstract

Disease gene identification is a critical step towards uncovering the molecular mechanisms of diseases and systematically investigating complex disease phenotypes. Despite considerable efforts to develop powerful computational methods, candidate gene identification remains a severe challenge owing to the incomplete connectivity of the interactome network, which hampers the discovery of truly novel candidate genes. We developed a network-based machine-learning framework to identify both functional modules and disease candidate genes. In this framework, we designed a semi-supervised non-negative matrix factorization model to obtain the functional modules related to the diseases and genes. Of note, we proposed a disease gene-prioritizing method called MapGene that integrates the correlations from both functional modules and network closeness. Our framework identified a set of functional modules with high functional homogeneity and close gene interactions. Experiments on a large-scale benchmark dataset showed that MapGene performs significantly better than state-of-the-art algorithms. Further analysis demonstrated that MapGene can effectively mitigate the impact of interactome incompleteness and obtain highly reliable rankings of candidate genes. In addition, case studies on Parkinson's disease and diabetes mellitus confirmed the generalizability of MapGene for novel candidate gene identification. This work proposes, for the first time, an integrated computational framework to predict both functional modules and disease candidate genes. The methodology and results suggest that our framework has the potential to help discover underlying functional modules and reliable candidate genes in human disease.
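
The abstract does not specify the MapGene model, so the following is only a minimal sketch of the general idea it describes: factorize a gene network into modules, then rank candidates by combining a module-based score with a network-closeness score. Everything here is an assumption made for illustration: the toy 8-gene network, the plain (unsupervised) NMF with multiplicative updates standing in for the semi-supervised model, random walk with restart as the closeness measure, and the hypothetical names nmf, random_walk_with_restart, rank_candidates and the weight alpha.

```python
import numpy as np


def nmf(A, k, n_iter=200, seed=0, eps=1e-9):
    """Plain NMF (A ~ W @ H) via Lee-Seung multiplicative updates.

    Stand-in for the paper's semi-supervised model, which is not
    described in the abstract.
    """
    rng = np.random.default_rng(seed)
    n, m = A.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H


def random_walk_with_restart(A, seeds, restart=0.3, n_iter=100):
    """Closeness of every gene to the seed genes (assumed measure)."""
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # avoid division by zero
    P = A / col_sums                   # column-stochastic transitions
    p0 = np.zeros(A.shape[0])
    p0[seeds] = 1.0 / len(seeds)
    p = p0.copy()
    for _ in range(n_iter):
        p = (1 - restart) * P @ p + restart * p0
    return p


def rank_candidates(A, seeds, k=2, alpha=0.5):
    """Combine module similarity and network closeness (hypothetical)."""
    W, _ = nmf(A, k)
    # Module profile of the disease: mean module membership of its seeds.
    profile = W[seeds].mean(axis=0)
    norms = np.linalg.norm(W, axis=1) * np.linalg.norm(profile) + 1e-12
    module_score = (W @ profile) / norms          # cosine similarity
    closeness = random_walk_with_restart(A, seeds)
    closeness = closeness / closeness.max()       # rescale to [0, 1]
    score = alpha * module_score + (1 - alpha) * closeness
    order = np.argsort(-score)
    candidates = [int(g) for g in order if int(g) not in seeds]
    return candidates, score


# Toy network: two 4-gene cliques (0-3 and 4-7) joined by edge 3-4.
A = np.zeros((8, 8))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0

seeds = [0, 1]                                    # known disease genes
candidates, score = rank_candidates(A, seeds)
print("ranked candidates:", candidates)
```

The weight alpha trades off the two sources of evidence. A candidate with few known interactions can still rank well through its module membership, which is one plausible reading of how combining the two signals could soften the effect of an incomplete interactome.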

MeSH terms

  • Amino Acid Sequence
  • Computational Biology / methods
  • Gastrointestinal Diseases / diagnosis
  • Gastrointestinal Diseases / genetics
  • Gastrointestinal Diseases / pathology
  • Gene Regulatory Networks*
  • Humans
  • Immune System Diseases / diagnosis
  • Immune System Diseases / genetics
  • Immune System Diseases / pathology
  • Mental Disorders / diagnosis
  • Mental Disorders / genetics
  • Mental Disorders / pathology
  • Metabolic Diseases / diagnosis
  • Metabolic Diseases / genetics
  • Metabolic Diseases / pathology
  • Metabolic Networks and Pathways / genetics*
  • Musculoskeletal Diseases / diagnosis
  • Musculoskeletal Diseases / genetics
  • Musculoskeletal Diseases / pathology
  • Neoplasms / diagnosis
  • Neoplasms / genetics
  • Neoplasms / pathology
  • Neurodegenerative Diseases / diagnosis
  • Neurodegenerative Diseases / genetics
  • Neurodegenerative Diseases / pathology
  • Predictive Value of Tests*
  • Protein Interaction Mapping
  • Supervised Machine Learning*
  • Terminology as Topic