Xrare: a machine learning method jointly modeling phenotypes and genetic evidence for rare disease diagnosis

Genet Med. 2019 Sep;21(9):2126-2134. doi: 10.1038/s41436-019-0439-8. Epub 2019 Jan 24.

Abstract

Purpose: Despite the progress that next-generation sequencing technologies have made in diagnosing the genetic causes of rare Mendelian diseases, the current diagnostic rate remains far from satisfactory because of heterogeneity, imprecision, and noise in disease phenotype descriptions and insufficient use of expert knowledge in clinical genetics. To overcome these difficulties, we present a novel method called Xrare for the prioritization of causative gene variants in rare disease diagnosis.

Methods: We propose a new phenotype similarity scoring method called Emission-Reception Information Content (ERIC), which is highly tolerant of noise and imprecision in clinical phenotypes. We utilize medical genetic domain knowledge by designing genetic features implementing American College of Medical Genetics and Genomics (ACMG) guidelines.
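The abstract does not give the ERIC formula, so the following is only a minimal sketch of the general family it belongs to: an information-content (IC) based phenotype similarity, here a Resnik-style best-match average over a toy ontology. It is not the authors' ERIC score. The ontology, term names, disease annotations, and function names are all hypothetical and chosen purely for illustration.

```python
import math

# Hypothetical toy ontology: term -> list of parent terms (not real HPO identifiers).
PARENTS = {
    "root": [],
    "neuro": ["root"],
    "seizure": ["neuro"],
    "focal_seizure": ["seizure"],
    "ataxia": ["neuro"],
    "cardiac": ["root"],
    "arrhythmia": ["cardiac"],
}

# Hypothetical disease-to-phenotype annotations.
DISEASES = {
    "disease_A": {"focal_seizure", "ataxia"},
    "disease_B": {"arrhythmia"},
    "disease_C": {"seizure"},
    "disease_D": {"ataxia", "arrhythmia"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def information_content():
    """IC(t) = log(N / n_t), where n_t counts diseases annotated with t or a descendant."""
    total = len(DISEASES)
    counts = {}
    for terms in DISEASES.values():
        closure = set()
        for term in terms:
            closure |= ancestors(term)
        for term in closure:
            counts[term] = counts.get(term, 0) + 1
    return {term: math.log(total / count) for term, count in counts.items()}


def resnik(term_a, term_b, ic):
    """IC of the most informative common ancestor of two terms."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(ic.get(term, 0.0) for term in common)


def phenotype_score(query_terms, disease_terms, ic):
    """Average, over query terms, of the best match among the disease's terms."""
    best = [max(resnik(q, d, ic) for d in disease_terms) for q in query_terms]
    return sum(best) / len(best)


def rank_diseases(query_terms):
    """Return (disease, score) pairs sorted from most to least similar."""
    ic = information_content()
    scored = [(name, phenotype_score(query_terms, terms, ic))
              for name, terms in DISEASES.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# An imprecise query: "seizure" is recorded instead of the more specific "focal_seizure".
ranking = rank_diseases({"seizure", "ataxia"})
```

Because the score uses the most informative common ancestor, a query term that is less specific than the annotated term still earns partial credit. This is the kind of tolerance to imprecise phenotypes that the Methods paragraph describes, although ERIC itself may achieve it differently.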

Results: The ERIC score consistently ranked disease genes higher than other phenotypic similarity scores did in the presence of imprecise and noisy phenotypes. Extensive simulations and real clinical data demonstrated that Xrare outperforms existing alternative methods by 10-40% in various genetic diagnosis scenarios.

Conclusion: The Xrare model is learned from a large database of clinical variants and derives its strength from the tight integration of medical genetic features and phenotypic similarity scores. Xrare provides the clinical community with a robust and powerful tool for variant prioritization.

Keywords: ACMG/AMP guideline; machine learning; phenotype score; rare disease diagnosis; variant prioritization.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Computational Biology
  • Exome / genetics
  • Genetic Testing
  • Genetic Variation / genetics
  • Genomics / methods*
  • Genotype
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Machine Learning*
  • Mutation
  • Phenotype
  • Rare Diseases / diagnosis*
  • Rare Diseases / genetics
  • Software*