Influence of the GO-based semantic similarity measures in multi-objective gene clustering algorithm performance

Jorge Parraga-Alava; Mario Inostroza-Ponta

doi:10.1142/S0219720020500389

J Bioinform Comput Biol. 2020 Dec;18(6):2050038. doi: 10.1142/S0219720020500389. Epub 2020 Nov 5.

Authors

Jorge Parraga-Alava¹, Mario Inostroza-Ponta²

Affiliations

¹ Facultad de Ciencias Informáticas, Universidad Técnica de Manabí, Avenida José María Urbina, Portoviejo 130105, Ecuador.
² Departamento de Ingeniería Informática, Universidad de Santiago de Chile, Avenida Libertador General Bernardo O'Higgins, Santiago 9170020, Chile.

PMID: 33148094
DOI: 10.1142/S0219720020500389

Abstract

Using prior biological knowledge of relationships and genetic functions to measure gene similarity, drawn from repositories such as the Gene Ontology (GO), has shown good results in multi-objective gene clustering algorithms. In this scenario, to obtain useful clustering results, it would be helpful to know which measure of biological similarity between genes should be employed to yield meaningful clusters that have both similar expression patterns (co-expression) and biological homogeneity. In this paper, we studied the influence of the four most used GO-based semantic similarity measures on the performance of a multi-objective gene clustering algorithm. We used four publicly available datasets and carried out comparative studies based on performance metrics from the multi-objective optimization field and on clustering performance indexes. In most cases, the Jiang-Conrath and Wang similarities stand out in terms of multi-objective metrics. In terms of clustering properties, the Resnik similarity achieves the best values of compactness and separation, and therefore of co-expression of groups of genes. Meanwhile, in terms of biological homogeneity, the Wang similarity reports a greater number of significant GO terms. However, statistical, visual, and biological significance tests showed that none of the GO-based semantic similarity measures stands out above the rest as significantly improving the performance of the multi-objective gene clustering algorithm.
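
For readers unfamiliar with the measures named in the abstract, the following is a minimal sketch of two of them, Resnik and Jiang-Conrath, using their common textbook definitions based on information content (IC). It is not the authors' implementation. The toy ontology, the term names, the annotation counts, and the 1 / (1 + distance) conversion of the Jiang-Conrath distance into a similarity are all assumptions made for illustration. The sketch compares GO terms only; turning term scores into a gene-to-gene similarity requires an additional aggregation step that is not shown here.

```python
import math

# Hypothetical toy ontology (not real GO terms): child -> set of parents.
PARENTS = {
    "root": set(),
    "metabolism": {"root"},
    "transport": {"root"},
    "lipid_metabolism": {"metabolism"},
    "sugar_metabolism": {"metabolism"},
}

# Hypothetical number of genes annotated directly to each term.
DIRECT_COUNTS = {
    "root": 0,
    "metabolism": 2,
    "transport": 2,
    "lipid_metabolism": 1,
    "sugar_metabolism": 1,
}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log p(t), where p(t) is the share of annotations at or below t."""
    totals = {term: 0 for term in PARENTS}
    for term, count in DIRECT_COUNTS.items():
        # An annotation to a term also counts for every ancestor of that term.
        for anc in ancestors(term):
            totals[anc] += count
    root_total = totals["root"]
    # Adding 0.0 turns the -0.0 produced for the root into a plain 0.0.
    return {t: -math.log(totals[t] / root_total) + 0.0 for t in PARENTS}


IC = information_content()


def resnik(t1, t2):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(IC[t] for t in common)


def jiang_conrath(t1, t2):
    """Jiang-Conrath distance mapped to (0, 1] via 1 / (1 + distance)."""
    distance = IC[t1] + IC[t2] - 2.0 * resnik(t1, t2)
    return 1.0 / (1.0 + distance)


if __name__ == "__main__":
    pairs = [
        ("lipid_metabolism", "sugar_metabolism"),
        ("lipid_metabolism", "transport"),
    ]
    for a, b in pairs:
        print(a, b, round(resnik(a, b), 3), round(jiang_conrath(a, b), 3))
```

In this toy setting, the two sibling terms under "metabolism" score higher than a pair whose only shared ancestor is the root, under both measures. Resnik depends only on the shared ancestor, while Jiang-Conrath also accounts for how specific each term is, which is one reason different measures can lead to different clusterings.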

Keywords: Gene ontology (GO); genetic algorithm; multi-objective clustering; semantic similarity measures.

MeSH terms

Algorithms*
Cluster Analysis
Computational Biology
Databases, Genetic / statistics & numerical data
Gene Ontology / statistics & numerical data
Multigene Family*
Semantics
Transcriptome