Mining rare associations between biological ontologies

PLoS One. 2014 Jan 3;9(1):e84475. doi: 10.1371/journal.pone.0084475. eCollection 2014.

Abstract

The constantly increasing volume and complexity of available biological data require new methods for their management and analysis. An important challenge is integrating information from different sources in order to discover hidden relations among already known data. In this paper we introduce a data mining approach that relates biological ontologies by mining cross- and intra-ontology pairwise generalized association rules. Its advantage is sensitivity to rare associations, which are important to biologists. We propose a new class of interestingness measures designed for hierarchically organized rules. These measures make it possible to select the most important rules while taking rare cases into account. They favor rules whose actual interestingness value exceeds the expected value, which is calculated from the parent rule. We demonstrate the approach by applying it to data from the Gene Ontology and GPCR databases. Our objective is to discover interesting relations between two different ontologies or between parts of a single ontology. The association rules thus discovered can provide the user with new knowledge about underlying biological processes or help improve annotation consistency. The results show that the rules produced represent meaningful and quite reliable associations.
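
The idea of comparing a rule's actual interestingness with the value expected from its parent rule can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: confidence stands in for the interestingness value, the parent rule's confidence stands in for the expected value, and the toy terms (`A`, `A1`, `A2`, `B`), gene identifiers, and function names are hypothetical.

```python
from typing import Dict, Set

# Hypothetical toy data: each gene maps to its annotation terms.
# "A" is a parent term with children "A1" and "A2" (one ontology);
# "B" is a term from a second ontology. Annotations are assumed to be
# already propagated upward, so a gene with "A1" also carries "A".
annotations: Dict[str, Set[str]] = {
    "g1": {"A", "A1", "B"},
    "g2": {"A", "A1", "B"},
    "g3": {"A", "A2"},
    "g4": {"A", "A2"},
    "g5": {"A", "A2", "B"},
    "g6": {"A"},
}


def confidence(antecedent: str, consequent: str, data: Dict[str, Set[str]]) -> float:
    """Confidence of the rule antecedent -> consequent over the annotated genes."""
    covered = [terms for terms in data.values() if antecedent in terms]
    if not covered:
        return 0.0
    return sum(consequent in terms for terms in covered) / len(covered)


def relative_interest(child: str, parent: str, consequent: str,
                      data: Dict[str, Set[str]]) -> float:
    """Ratio of the child rule's confidence to the value expected from its parent.

    Assumption: the expected value of child -> consequent is simply the
    confidence of parent -> consequent. This is an illustrative stand-in,
    not the measure defined in the paper.
    """
    expected = confidence(parent, consequent, data)
    actual = confidence(child, consequent, data)
    if expected == 0.0:
        return float("inf") if actual > 0.0 else 0.0
    return actual / expected


# A ratio above 1 means the specialized rule holds more often than its parent suggests.
for child in ("A1", "A2"):
    print(child, "-> B:", round(relative_interest(child, "A", "B", annotations), 3))
```

In this toy data the rule `A1 -> B` covers only two genes, yet its confidence is twice what the parent rule `A -> B` would predict, so a ratio-based measure keeps it even though a plain support threshold might discard it. The rule `A2 -> B` falls below its expected value and would be ranked lower.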

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biological Ontologies*
  • Data Mining*
  • Humans

Grants and funding

The project is supported by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.