Comparison of profile similarity measures for genetic interaction networks

PLoS One. 2013 Jul 10;8(7):e68664. doi: 10.1371/journal.pone.0068664. Print 2013.

Abstract

Analysis of genetic interaction networks often involves identifying genes with similar profiles, which is typically indicative of a common function. While several profile similarity measures have been applied in this context, they have never been systematically benchmarked. We compared a diverse set of correlation measures, including measures commonly used by the genetic interaction community as well as several other candidate measures, by assessing their utility in extracting functional information from genetic interaction data. We find that the dot product, one of the simplest vector operations, outperforms most other measures over a large range of gene pairs. More generally, linear similarity measures such as the dot product, Pearson correlation or cosine similarity perform better than set-overlap measures such as the Jaccard coefficient. Similarity measures that involve L2-normalization of the profiles tend to perform better for the top-most similar pairs but perform less favorably when a larger set of gene pairs is considered or when the genetic interaction data are thresholded. Such measures are also less robust to the presence of noise and batch effects in the genetic interaction data. Overall, the dot product measure performs consistently among the best measures under a variety of different conditions and genetic interaction datasets.
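The four measures named in the abstract can be written out in a few lines each. The following is a minimal sketch, not the authors' implementation: it assumes each gene's profile is a plain list of interaction scores, and the function names, the example profiles `strong` and `weak`, and the Jaccard cutoff of 0.1 are all hypothetical choices made for illustration.

```python
import math


def dot_product(a, b):
    """Unnormalized linear similarity: sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product after L2-normalizing both profiles."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero profiles
    return dot_product(a, b) / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Cosine similarity of the mean-centered profiles."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


def jaccard_coefficient(a, b, threshold=0.1):
    """Set overlap of interactions whose magnitude exceeds a cutoff.

    The threshold value is an illustrative assumption, not taken from the paper.
    """
    set_a = {i for i, x in enumerate(a) if abs(x) > threshold}
    set_b = {i for i, y in enumerate(b) if abs(y) > threshold}
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical profiles: `weak` has the same shape as `strong` at one tenth
# of the magnitude.
strong = [-0.8, 0.0, 0.6, -0.5]
weak = [0.1 * x for x in strong]

for name, measure in [
    ("dot product", dot_product),
    ("cosine", cosine_similarity),
    ("Pearson", pearson_correlation),
    ("Jaccard", jaccard_coefficient),
]:
    print(f"{name:12s} strong/strong = {measure(strong, strong):.4f}  "
          f"strong/weak = {measure(strong, weak):.4f}")
```

In this toy case, cosine similarity and Pearson correlation score the strong/weak pair as highly as the strong/strong pair because normalization discards magnitude, whereas the dot product scales with interaction strength and the Jaccard coefficient depends entirely on the chosen cutoff. This only illustrates how the measures differ mechanically; it does not reproduce the benchmark results reported above.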

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural

MeSH terms

  • Computational Biology
  • Epistasis, Genetic*
  • Gene Expression Regulation, Fungal
  • Gene Regulatory Networks / physiology*
  • Genes, Fungal / physiology
  • High-Throughput Screening Assays
  • Humans
  • Oligonucleotide Array Sequence Analysis
  • Saccharomyces cerevisiae / genetics*
  • Schizosaccharomyces / genetics*
  • Transcriptome*