Disease genes prediction by HMM based PU-learning using gene expression profiles

J Biomed Inform. 2018 May:81:102-111. doi: 10.1016/j.jbi.2018.03.006. Epub 2018 Mar 20.

Abstract

Predicting disease candidate genes from the human genome is a crucial part of current biomedical research. Observations show that diseases with the same phenotype have similar biological characteristics, and that genes associated with the same diseases tend to share common functional properties. New disease genes can therefore be predicted from known ones by applying machine learning methods. In recent studies, semi-supervised learning methods known as Positive-Unlabeled Learning (PU-Learning) have been used to predict disease candidate genes. In this study, a novel method is introduced that predicts disease candidate genes from gene expression profiles by learning hidden Markov models. To evaluate the proposed method, it is applied to a mixed set of 398 disease genes from three disease types and 12001 unlabeled genes. Compared with other methods in the literature, the experimental results indicate a significant improvement in favor of the proposed method.
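The abstract does not describe the model structure or the training procedure, so the following is only a minimal sketch of the general idea: score each unlabeled gene by how much better a "disease" HMM explains its discretized expression profile than a "background" HMM does, then rank the genes by that score. Everything here is an assumption made for illustration, including the three-symbol discretization, the two-state models, the hand-set parameters in `POS` and `BG`, and the gene names. In practice the parameters would be learned from the positive genes (for example with Baum-Welch) rather than written by hand.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: returns log P(obs | discrete HMM)."""
    n = len(start)
    # Initialise with the start distribution and the first emission.
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    total = sum(alpha)
    loglik = math.log(total)
    alpha = [a / total for a in alpha]
    for o in obs[1:]:
        # Propagate through the transitions, then weight by the emission.
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
            for s in range(n)
        ]
        total = sum(alpha)
        loglik += math.log(total)  # accumulate the scaling factors
        alpha = [a / total for a in alpha]
    return loglik

def discretize(profile, low=-0.5, high=0.5):
    """Map expression values to symbols 0 (low), 1 (mid), 2 (high).
    The thresholds are hypothetical."""
    return [0 if x < low else 2 if x > high else 1 for x in profile]

# Hypothetical "disease" model: two sticky states, one favouring high
# expression and one favouring low expression.
POS = dict(
    start=[0.5, 0.5],
    trans=[[0.9, 0.1], [0.1, 0.9]],
    emit=[[0.1, 0.2, 0.7], [0.7, 0.2, 0.1]],
)
# Hypothetical background model: mostly mid-range expression.
BG = dict(
    start=[0.5, 0.5],
    trans=[[0.5, 0.5], [0.5, 0.5]],
    emit=[[0.2, 0.6, 0.2], [0.2, 0.6, 0.2]],
)

def score(profile):
    """Log-likelihood ratio of the disease model over the background model."""
    symbols = discretize(profile)
    return log_likelihood(symbols, **POS) - log_likelihood(symbols, **BG)

# Toy unlabeled genes with made-up expression profiles.
unlabeled = {
    "geneA": [1.2, 0.9, 1.1, -0.8, -1.0, -0.9],
    "geneB": [0.1, -0.2, 0.0, 0.3, -0.1, 0.2],
}
# Highest-scoring genes are the strongest disease candidates.
ranked = sorted(unlabeled, key=lambda g: score(unlabeled[g]), reverse=True)
print(ranked)
```

A log-likelihood ratio is used instead of the raw likelihood so that scores stay comparable across genes. The raw likelihood of any profile shrinks as the profile gets longer, whatever model produced it.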

Keywords: Disease gene prediction; Gene expression profile; Hidden Markov model; Positive-unlabeled learning.

MeSH terms

  • Algorithms
  • Artificial Intelligence
  • Cluster Analysis
  • Computational Biology / methods*
  • Gene Expression Profiling*
  • Genetic Predisposition to Disease*
  • Humans
  • Markov Chains*
  • Models, Statistical
  • Oligonucleotide Array Sequence Analysis
  • Phenotype
  • Probability
  • Protein Interaction Mapping*
  • Software
  • Supervised Machine Learning
  • Transcriptome