Influence of the GO-based semantic similarity measures in multi-objective gene clustering algorithm performance

J Bioinform Comput Biol. 2020 Dec;18(6):2050038. doi: 10.1142/S0219720020500389. Epub 2020 Nov 5.

Abstract

Using prior biological knowledge of gene relationships and functions, drawn from repositories such as the Gene Ontology (GO), to measure gene similarity has shown good results in multi-objective gene clustering algorithms. In this scenario, to obtain useful clustering results, it would be helpful to know which measure of biological similarity between genes should be employed to yield meaningful clusters that have both similar expression patterns (co-expression) and biological homogeneity. In this paper, we studied the influence of the four most used GO-based semantic similarity measures on the performance of a multi-objective gene clustering algorithm. We used four publicly available datasets and carried out comparative studies based on performance metrics from the multi-objective optimization field and on clustering performance indexes. In most cases, the Jiang-Conrath and Wang similarities stand out in terms of multi-objective metrics. In terms of clustering properties, the Resnik similarity achieves the best values of compactness and separation, and therefore of co-expression within groups of genes. Meanwhile, in terms of biological homogeneity, the Wang similarity yields a greater number of significant GO terms. However, statistical, visual, and biological significance tests showed that none of the GO-based semantic similarity measures stands out above the rest in significantly improving the performance of the multi-objective gene clustering algorithm.
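For readers unfamiliar with the measures named above, the following is a minimal sketch of two of them, Resnik and Jiang-Conrath, in their standard information-content form. It is not the implementation used in the paper: the toy ontology, the annotation counts, and every function name are hypothetical, and the conversion of the Jiang-Conrath distance into a similarity via 1 / (1 + d) is only one common convention. The Wang measure, which is based on graph structure rather than information content, is not covered here.

```python
import math

# Hypothetical toy ontology (child -> parents); these are NOT real GO terms.
PARENTS = {
    "root": set(),
    "a": {"root"},
    "b": {"root"},
    "a1": {"a"},
    "a2": {"a"},
    "ab": {"a", "b"},
}

# Hypothetical number of genes annotated directly to each term.
DIRECT_COUNTS = {"root": 0, "a": 2, "b": 4, "a1": 5, "a2": 3, "ab": 6}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log p(t), where p(t) counts annotations to t and its descendants."""
    total = sum(DIRECT_COUNTS.values())
    cumulative = {t: 0 for t in PARENTS}
    for term, count in DIRECT_COUNTS.items():
        for anc in ancestors(term):
            cumulative[anc] += count
    return {t: -math.log(cumulative[t] / total) for t in PARENTS}


def resnik(t1, t2, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(ic[c] for c in common)


def jiang_conrath_distance(t1, t2, ic):
    """Jiang-Conrath distance: IC(t1) + IC(t2) - 2 * IC(MICA)."""
    return ic[t1] + ic[t2] - 2.0 * resnik(t1, t2, ic)


def jiang_conrath_similarity(t1, t2, ic):
    """One common distance-to-similarity conversion (an assumption here)."""
    return 1.0 / (1.0 + jiang_conrath_distance(t1, t2, ic))


ic = information_content()
print(resnik("a1", "a2", ic))
print(jiang_conrath_similarity("a1", "a2", ic))
```

These functions compare individual terms. Comparing two genes additionally requires aggregating over the term pairs annotated to each gene (for example by maximum or best-match average), a choice the abstract does not specify.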

Keywords: Gene ontology (GO); genetic algorithm; multi-objective clustering; semantic similarity measures.

MeSH terms

  • Algorithms*
  • Cluster Analysis
  • Computational Biology
  • Databases, Genetic / statistics & numerical data
  • Gene Ontology / statistics & numerical data
  • Multigene Family*
  • Semantics
  • Transcriptome