Incremental data integration for tracking genotype-disease associations

PLoS Comput Biol. 2020 Jan 27;16(1):e1007586. doi: 10.1371/journal.pcbi.1007586. eCollection 2020 Jan.

Abstract

Functional annotation of genes remains a challenge in fundamental biology and is a limiting factor for translational medicine. Computational approaches have been developed to process heterogeneous data into meaningful metrics, but often do not address how findings might be updated when new evidence comes to light. To address this challenge, we describe requirements for a framework for incremental data integration and propose an implementation based on phenotype ontologies and Bayesian probability updates. We apply the framework to quantify similarities between gene annotations and disease profiles. Within this scope, we categorize human diseases according to how well they can be recapitulated by animal models and quantify similarities between human diseases and mouse models produced by the International Mouse Phenotyping Consortium. The flexibility of the approach allows us to incorporate negative phenotypic data to better prioritize candidate genes, and to stratify disease mapping using sex-dependent phenotypes. All our association scores can be updated, and we exploit this feature to showcase integration with curated annotations from high-precision assays. Incremental integration is thus a suitable framework for tracking functional annotations and linking to complex human pathology.
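
The abstract does not spell out the update rule, so the following is only a minimal sketch of what an incremental Bayesian update of an association score could look like. It assumes each piece of phenotypic evidence can be summarized by two rates strictly between 0 and 1: the probability of seeing it when a model matches a disease, and the probability of seeing it otherwise. The function names `update_score` and `integrate`, the prior, and all numeric rates are hypothetical illustration values, not taken from the paper.

```python
def update_score(prior, tpr, fpr):
    """Apply Bayes' rule for one piece of evidence.

    prior: current probability that the model matches the disease
    tpr:   P(evidence | match)      -- hypothetical rate, 0 < tpr < 1
    fpr:   P(evidence | no match)   -- hypothetical rate, 0 < fpr < 1
    """
    numerator = prior * tpr
    return numerator / (numerator + (1.0 - prior) * fpr)


def integrate(prior, evidence):
    """Fold a sequence of (tpr, fpr) pairs into a starting score."""
    score = prior
    for tpr, fpr in evidence:
        score = update_score(score, tpr, fpr)
    return score


# Hypothetical evidence for one model-disease pair.
# The last entry has tpr < fpr, so it acts like negative evidence
# and lowers the score.
evidence = [(0.8, 0.1), (0.6, 0.3), (0.2, 0.7)]
start = 0.01  # hypothetical prior

# Processing everything at once ...
batch = integrate(start, evidence)

# ... gives the same result as storing an intermediate score and
# updating it later when a new observation arrives.
partial = integrate(start, evidence[:2])
incremental = integrate(partial, evidence[2:])
```

In this sketch the stored score is the only state needed to absorb new evidence, which is the property that makes an association score updatable without reprocessing earlier data.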

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Animals
  • Computational Biology / methods*
  • Disease Models, Animal
  • Genetic Predisposition to Disease*
  • Genotype*
  • Humans
  • Mice
  • Molecular Sequence Annotation
  • Phenotype*