A partially supervised classification approach to dominant and recessive human disease gene prediction

Comput Methods Programs Biomed. 2007 Mar;85(3):229-37. doi: 10.1016/j.cmpb.2006.12.003. Epub 2007 Jan 26.

Abstract

Discovering the genes involved in genetic diseases is an important step towards understanding the nature of these diseases. In-lab identification is a difficult, time-consuming task, and computational methods can be very useful here: in silico identification algorithms can guide future studies. Previous work on this topic has not taken into account that no reliable sets of negative examples are available, as it is not possible to ensure that a given gene is not related to any genetic disease. In this paper, this characteristic of the problem is taken into account, and identification is approached as a partially supervised classification problem. In addition, we have developed a more specific method to identify disease genes by classifying, for the first time, genes causing dominant and recessive diseases independently. We base this separation on previous results showing that these two types of genes differ in their sequence properties. We have applied a new model averaging algorithm to the identification of human genes associated with both dominant and recessive Mendelian diseases.
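
To make the partially supervised setting concrete, the sketch below shows one generic way to learn from positive and unlabeled examples only: train many simple models, each treating a random sample of the unlabeled genes as provisional negatives, and average their scores. This is an illustration under stated assumptions, not the model averaging algorithm of the paper, which the abstract does not specify. The function name `pu_bagging_scores`, the nearest-centroid scoring rule, and the toy two-feature data are all hypothetical.

```python
import random

def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pu_bagging_scores(positives, unlabeled, rounds=50, seed=0):
    """Average, over `rounds` models, a score in [0, 1] per unlabeled item.

    Each round treats a random sample of the unlabeled set as provisional
    negatives, fits a nearest-centroid model, and scores only the unlabeled
    items left out of that sample. Assumed scheme, not the paper's method.
    """
    rng = random.Random(seed)
    k = min(len(positives), len(unlabeled) - 1)
    pos_c = centroid(positives)
    totals = [0.0] * len(unlabeled)
    counts = [0] * len(unlabeled)
    for _ in range(rounds):
        sampled = set(rng.sample(range(len(unlabeled)), k))
        neg_c = centroid([unlabeled[i] for i in sampled])
        for i, x in enumerate(unlabeled):
            if i in sampled:
                continue  # never score an item used as a provisional negative
            d_pos, d_neg = sq_dist(x, pos_c), sq_dist(x, neg_c)
            total = d_pos + d_neg
            # Closer to the positive centroid gives a score nearer 1.
            totals[i] += d_neg / total if total > 0 else 0.5
            counts[i] += 1
    # An item that was never left out gets no score (None).
    return [t / c if c else None for t, c in zip(totals, counts)]

# Toy data: hypothetical sequence-derived features for known disease genes
# (positives) and genes of unknown status (unlabeled).
positives = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]
unlabeled = [[1.1, 1.0], [0.0, 0.0], [0.1, -0.1], [-0.1, 0.2], [0.2, 0.1]]
scores = pu_bagging_scores(positives, unlabeled)
```

In this toy run the first unlabeled gene, which resembles the known positives, receives a much higher averaged score than the others. Under the same assumptions, dominant and recessive disease genes could be handled independently by calling the function once per positive set.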

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Genes, Dominant / genetics*
  • Genes, Recessive / genetics*
  • Genetic Predisposition to Disease / classification*
  • Humans
  • Sequence Analysis, DNA
  • Spain