Hypothesis-free phenotype prediction within a genetics-first framework

Nat Commun. 2023 Feb 17;14(1):919. doi: 10.1038/s41467-023-36634-6.

Abstract

Cohort-wide sequencing studies have revealed that the largest category of variants is those deemed 'rare', even for the subset located in coding regions (99% of known coding variants are seen in less than 1% of the population). Associative methods give some understanding of how rare genetic variants influence disease and organism-level phenotypes. But here we show that additional discoveries can be made through a knowledge-based approach using protein domains and ontologies (function and phenotype) that considers all coding variants regardless of allele frequency. We describe an ab initio, genetics-first method that makes molecular knowledge-based interpretations of exome-wide non-synonymous variants for phenotypes at the organism and cellular levels. Using this reverse approach, we identify plausible genetic causes for developmental disorders that have eluded other established methods, and we present molecular hypotheses for the causal genetics of 40 phenotypes generated from a direct-to-consumer genotype cohort. This system offers a chance to extract further discoveries from genetic data after standard tools have been applied.
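To make the general idea of a knowledge-based, frequency-agnostic interpretation concrete, the toy sketch below maps each non-synonymous variant to the protein domain it falls in, then to ontology terms annotated to that domain, and tallies support per term. This is a hypothetical illustration only, not the authors' method: the gene names, domain boundaries, domain-to-term table, and the simple counting step (standing in for whatever scoring the paper actually uses) are all assumptions. The one point it is meant to show is that no allele frequency cutoff is applied, so common and rare variants are interpreted alike.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str           # hypothetical gene identifier
    protein_pos: int    # residue position of the amino-acid change
    allele_freq: float  # population frequency; deliberately NOT used as a filter


# Hypothetical domain annotations: gene -> list of (domain name, start, end).
DOMAINS = {
    "GENE_A": [("DomainX", 10, 120)],
    "GENE_B": [("DomainY", 1, 80), ("DomainX", 200, 300)],
}

# Hypothetical domain -> ontology term table (function and phenotype terms).
DOMAIN_TERMS = {
    "DomainX": {"PHENOTYPE_TERM_1", "FUNCTION_TERM_2"},
    "DomainY": {"PHENOTYPE_TERM_1"},
}


def domains_hit(v: Variant) -> list[str]:
    """Return the names of annotated domains that contain the variant's residue."""
    return [
        name
        for name, start, end in DOMAINS.get(v.gene, [])
        if start <= v.protein_pos <= end
    ]


def term_support(variants: list[Variant]) -> dict[str, int]:
    """Count, per ontology term, how many variants land in a domain annotated with it.

    Every variant is considered regardless of allele frequency; a variant
    contributes at most once to each term even if it hits several domains.
    """
    support: dict[str, int] = defaultdict(int)
    for v in variants:
        terms: set[str] = set()
        for domain in domains_hit(v):
            terms |= DOMAIN_TERMS.get(domain, set())
        for term in terms:
            support[term] += 1
    return dict(support)


if __name__ == "__main__":
    example = [
        Variant("GENE_A", 50, 0.0001),  # rare, inside DomainX
        Variant("GENE_B", 250, 0.30),   # common, inside DomainX
        Variant("GENE_B", 90, 0.002),   # outside any annotated domain
    ]
    print(term_support(example))
```

In this toy example the rare and the common variant each add one count to both DomainX terms, while the variant outside any annotated domain contributes nothing, giving two counts per term.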

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Exome*
  • Gene Frequency
  • Genetic Predisposition to Disease*
  • Genotype
  • Humans
  • Phenotype