Ontological representation, classification and data-driven computing of phenotypes

J Biomed Semantics. 2020 Dec 21;11(1):15. doi: 10.1186/s13326-020-00230-0.

Abstract

Background: The successful determination and analysis of phenotypes plays a key role in the diagnostic process, the evaluation of risk factors and the recruitment of participants for clinical and epidemiological studies. The development of computable phenotype algorithms to solve these tasks is a challenging problem, caused by various reasons. Firstly, the term 'phenotype' has no generally agreed definition and its meaning depends on context. Secondly, the phenotypes are most commonly specified as non-computable descriptive documents. Recent attempts have shown that ontologies are a suitable way to handle phenotypes and that they can support clinical research and decision making. The SMITH Consortium is dedicated to rapidly establish an integrative medical informatics framework to provide physicians with the best available data and knowledge and enable innovative use of healthcare data for research and treatment optimisation. In the context of a methodological use case 'phenotype pipeline' (PheP), a technology to automatically generate phenotype classifications and annotations based on electronic health records (EHR) is developed. A large series of phenotype algorithms will be implemented. This implies that for each algorithm a classification scheme and its input variables have to be defined. Furthermore, a phenotype engine is required to evaluate and execute developed algorithms.

Results: In this article, we present a Core Ontology of Phenotypes (COP) and the software Phenotype Manager (PhenoMan), which implements a novel ontology-based method to model, classify and compute phenotypes from already available data. Our solution includes an enhanced iterative reasoning process combining classification tasks with mathematical calculations at runtime. The ontology as well as the reasoning method were successfully evaluated with selected phenotypes including SOFA score, socio-economic status, body surface area and WHO BMI classification based on available medical data.

Conclusions: We developed a novel ontology-based method to model phenotypes of living beings with the aim of automated phenotype reasoning based on available data. This new approach can be used in clinical context, e.g., for supporting the diagnostic process, evaluating risk factors, and recruiting appropriate participants for clinical and epidemiological studies.

Keywords: Phenotype calculation; Phenotype classification; Phenotype definition; Phenotype ontology; Phenotype reasoning.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Biological Ontologies*
  • Humans
  • Medical Informatics / methods
  • Medical Informatics / statistics & numerical data*
  • Medical Records Systems, Computerized / statistics & numerical data*
  • Models, Theoretical
  • Phenotype
  • Semantics*