A supervised weighted similarity measure for gene expressions using biological knowledge

Gene. 2016 Dec 31;595(2):150-160. doi: 10.1016/j.gene.2016.09.033. Epub 2016 Sep 26.

Abstract

A supervised similarity measure for Saccharomyces cerevisiae gene expression is developed that captures gene similarity when multiple types of experimental conditions, such as cell cycle and heat shock, are available for all genes. The measure is called Weighted Pearson correlation (WPC); the weight for each type of experiment is determined systematically by maximizing the positive predictive value (PPV) for gene pairs having Pearson correlation greater than 0.80. The PPV is computed using the yeast GO-Slim process annotations available in the Saccharomyces Genome Database (SGD). Genes are then clustered by the k-medoid algorithm using the newly computed WPC, and functions of 135 unclassified genes are predicted with a p-value cutoff of 10^-5 using Munich Information for Protein Sequences (MIPS) annotations. Of these genes, functional categories of 55 genes are predicted with a p-value cutoff greater than 10^-10 and are reported in this investigation. The superiority of WPC over existing similarity measures such as Pearson correlation and Euclidean distance is demonstrated using the PPV of gene pairs for different Saccharomyces cerevisiae data sets. The related code is available at http://www.sampa.droppages.com/WPC.html.
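
The abstract does not give the exact form of WPC, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes that every expression column belonging to an experiment type shares that type's weight inside a standard weighted Pearson correlation, and that a gene pair counts as a true positive when the two genes share at least one annotation term. The names `weighted_pearson`, `expand_weights`, and `ppv`, and the toy data, are hypothetical.

```python
import numpy as np
from itertools import combinations


def weighted_pearson(x, y, w):
    """Weighted Pearson correlation of two expression profiles.

    w holds one non-negative weight per column (assumed form, not from the paper).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                      # normalize weights to sum to 1
    mx, my = np.sum(w * x), np.sum(w * y)
    cov = np.sum(w * (x - mx) * (y - my))
    vx = np.sum(w * (x - mx) ** 2)
    vy = np.sum(w * (y - my) ** 2)
    if vx * vy == 0:                     # constant profile: correlation undefined
        return 0.0
    return cov / np.sqrt(vx * vy)


def expand_weights(column_types, type_weights):
    """Map one weight per experiment type onto every column of that type."""
    return np.array([type_weights[t] for t in column_types], dtype=float)


def ppv(expr, annotations, col_weights, threshold=0.80):
    """Fraction of gene pairs above the threshold that share an annotation."""
    tp = fp = 0
    for i, j in combinations(range(expr.shape[0]), 2):
        if weighted_pearson(expr[i], expr[j], col_weights) > threshold:
            if annotations[i] & annotations[j]:   # at least one shared term
                tp += 1
            else:
                fp += 1
    return tp / (tp + fp) if (tp + fp) else 0.0


# Toy data (hypothetical): 3 genes, 2 cell-cycle and 2 heat-shock columns.
expr = np.array([[1.0, 2.0, 3.0, 4.0],
                 [2.0, 4.0, 6.0, 8.0],
                 [4.0, 3.0, 2.0, 1.0]])
annotations = [{"cell cycle"}, {"cell cycle"}, {"transport"}]
column_types = ["cell_cycle", "cell_cycle", "heat_shock", "heat_shock"]
type_weights = {"cell_cycle": 2.0, "heat_shock": 1.0}

col_weights = expand_weights(column_types, type_weights)
score = ppv(expr, annotations, col_weights)
```

Under these assumptions, choosing the weights amounts to evaluating `ppv` for candidate `type_weights` and keeping the candidate with the highest value; the search procedure actually used in the paper is not described in the abstract.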

Keywords: Bioinformatics; Computational biology; Gene annotation; Gene expression; Pattern recognition; Saccharomyces cerevisiae; Supervised similarity measure.

MeSH terms

  • Algorithms*
  • Databases, Genetic
  • Gene Expression*
  • Genes, Fungal
  • Molecular Sequence Annotation / methods*
  • Reproducibility of Results
  • Saccharomyces cerevisiae / genetics
  • Saccharomyces cerevisiae Proteins / genetics*

Substances

  • Saccharomyces cerevisiae Proteins