Scoring and summarising gene product clusters using the Gene Ontology

Int J Data Min Bioinform. 2008;2(3):216-35. doi: 10.1504/ijdmb.2008.020523.

Abstract

We propose an approach for quantifying the biological relatedness between gene products based on their properties, measuring their similarity exclusively with statistical NLP techniques and Gene Ontology (GO) annotations. We also present a novel similarity figure of merit, based on the vector space model, that assesses gene expression analysis results and scores the biological coherency of gene product clusters using only their annotation terms and textual descriptions. We define query profiles that rapidly detect a gene product cluster's dominant biological properties. Experimental results validate our approach and show a strong correlation between our coherency score and gene expression patterns.
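The abstract does not give the scoring formula. As an illustration only, the Python sketch below assumes a common vector space model setup. Each gene product becomes a TF-IDF vector over its GO term IDs, weighted against a background annotation set. A cluster's coherency is then taken as the mean pairwise cosine similarity of its members. Its "dominant properties" are approximated by the top-weighted terms of the summed cluster vector. The helper names (`tfidf_vectors`, `coherency`, `dominant_terms`), the smoothed IDF formula, the gene IDs and the toy annotations are all hypothetical. This is not the authors' figure of merit or their query-profile method.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_vectors(annotations):
    """Map each gene product to a sparse TF-IDF vector over its annotation terms.

    annotations: dict of gene ID -> list of GO term IDs (tokens from textual
    descriptions could be appended to the same lists). IDF is computed over
    the whole background set, so terms annotating many genes weigh less.
    """
    n = len(annotations)
    df = Counter()
    for terms in annotations.values():
        df.update(set(terms))  # document frequency: count each term once per gene
    vectors = {}
    for gene, terms in annotations.items():
        tf = Counter(terms)
        # Smoothed IDF (an assumption): stays positive even for ubiquitous terms.
        vectors[gene] = {
            t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def coherency(cluster, vectors):
    """Hypothetical coherency score: mean pairwise cosine similarity."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


def dominant_terms(cluster, vectors, k=3):
    """Rank terms by their summed TF-IDF weight across the cluster's members."""
    centroid = Counter()
    for gene in cluster:
        centroid.update(vectors[gene])  # adds each term's weight to the running sum
    return [term for term, _ in centroid.most_common(k)]


# Toy background annotations with hypothetical gene IDs.
background = {
    "g1": ["GO:0006412", "GO:0005840", "GO:0003735"],
    "g2": ["GO:0006412", "GO:0005840"],
    "g3": ["GO:0006412", "GO:0003735"],
    "g4": ["GO:0006955", "GO:0005615"],
    "g5": ["GO:0006955"],
    "g6": ["GO:0007165"],
}
vectors = tfidf_vectors(background)
tight = ["g1", "g2", "g3"]  # members share annotation terms
mixed = ["g1", "g4", "g6"]  # members share no annotation terms

print(round(coherency(tight, vectors), 3))
print(coherency(mixed, vectors))  # 0.0
print(dominant_terms(tight, vectors, k=1))  # ['GO:0006412']
```

Computing IDF over the background set rather than within the cluster matters here. If it were computed within the cluster, the terms every member shares (exactly the evidence of coherency) would be pushed toward zero weight.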

MeSH terms

  • Database Management Systems*
  • Databases, Protein*
  • Gene Expression Profiling / methods*
  • Information Storage and Retrieval / methods*
  • Multigene Family / physiology*
  • Proteome / classification*
  • Proteome / metabolism*

Substances

  • Proteome