Semantic Similarity in the Gene Ontology

Methods Mol Biol. 2017:1446:161-173. doi: 10.1007/978-1-4939-3743-1_12.

Abstract

Gene Ontology-based semantic similarity (SS) allows the comparison of GO terms, or of entities annotated with GO terms, by leveraging the ontology's structure and properties as well as annotation corpora. In the last decade the number and diversity of SS measures based on GO have grown considerably, and their applications range from functional coherence evaluation to protein interaction prediction and disease gene prioritization. Understanding how SS measures work, what issues can affect their performance, and how they compare to each other in different evaluation settings is crucial to gaining a comprehensive view of this area and choosing the most appropriate approaches for a given application. In this chapter, we provide biomedical researchers with a guide to understanding and selecting SS measures. We present a straightforward categorization of SS measures and describe the main strategies they employ. We discuss the intrinsic and external issues that affect their performance, and how these can be addressed. We summarize comparative assessment studies, highlighting the top measures in different settings, and compare different implementation strategies and their use. Finally, we discuss some of the extant challenges and opportunities, namely the increased semantic complexity of GO and the need for fast and efficient computation, pointing the way towards the future generation of SS measures.

Keywords: Functional similarity; Gene ontology; Protein similarity; Semantic similarity.

MeSH terms

  • Animals
  • Computational Biology / methods
  • Gene Ontology*
  • Humans
  • Molecular Sequence Annotation / methods
  • Proteins / genetics
  • Semantics

Substances

  • Proteins