Laplacian normalization and random walk on heterogeneous networks for disease-gene prioritization

Comput Biol Chem. 2015 Aug:57:21-8. doi: 10.1016/j.compbiolchem.2015.02.008. Epub 2015 Feb 7.

Abstract

Random walk on heterogeneous networks is a recently emerging approach to effective disease gene prioritization. Laplacian normalization is a technique for normalizing the edge weights of a network. We use this technique to normalize the gene matrix and the phenotype matrix before constructing the heterogeneous network, and we use the same idea to define the transition matrices of the heterogeneous network. Our method performs remarkably better than existing methods, including state-of-the-art algorithms, at recovering known gene-phenotype relationships. The Shannon information entropy of the distribution of the transition probabilities in our networks is found to be smaller than that of the networks constructed by the existing methods, implying that a higher number of top-ranked genes can be verified as disease genes. In fact, in many cases the most probable gene-phenotype relationships, ranked within the top 3 or top 5 of our gene lists, can be confirmed by the OMIM database. All Matlab code is available upon email request.
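The three ingredients named in the abstract (Laplacian normalization, random walk with restart, and Shannon entropy) can be illustrated on a toy network. The following is a minimal sketch in Python and not the authors' Matlab code: the 4-node weight matrix, the restart probability of 0.7, the plain column normalization used to obtain transition probabilities, and all function names are assumptions made for illustration. The sketch also uses a single homogeneous network, whereas the paper works with a heterogeneous gene-phenotype network.

```python
import math

def laplacian_normalize(W):
    """Return D^{-1/2} W D^{-1/2}, where D holds the row sums of W."""
    n = len(W)
    d = [sum(row) for row in W]
    return [[W[i][j] / math.sqrt(d[i] * d[j]) if d[i] > 0 and d[j] > 0 else 0.0
             for j in range(n)] for i in range(n)]

def column_stochastic(W):
    """Scale each column to sum to 1 (assumed transition-matrix convention)."""
    n = len(W)
    col = [sum(W[i][j] for i in range(n)) for j in range(n)]
    return [[W[i][j] / col[j] if col[j] > 0 else 0.0
             for j in range(n)] for i in range(n)]

def random_walk_with_restart(M, p0, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * M p + restart * p0 until the L1 change is below tol."""
    n = len(p0)
    p = list(p0)
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(M[i][j] * p[j] for j in range(n)) + restart * p0[i]
               for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p

def shannon_entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(q * math.log2(q) for q in probs if q > 0)

# Hypothetical symmetric toy network with 4 nodes (not data from the paper).
W = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]

S = laplacian_normalize(W)            # Laplacian-normalized edge weights
M = column_stochastic(S)              # transition probabilities
scores = random_walk_with_restart(M, [1.0, 0.0, 0.0, 0.0])  # node 0 is the seed
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
column_entropies = [shannon_entropy([M[i][j] for i in range(len(M))])
                    for j in range(len(M))]
```

Here `ranking` orders the nodes by their steady-state visiting probability from the seed, and `column_entropies` gives the entropy of each node's outgoing transition distribution, the kind of quantity the abstract compares across network constructions.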

Keywords: Disease genes and phenotypes; Heterogeneous network; Laplacian normalization; Leave-one-out cross-validation; Random walk with restart; Shannon information entropy.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology*
  • Disease / genetics*
  • Entropy
  • Gene Regulatory Networks*
  • Humans
  • Phenotype