STARVar: symptom-based tool for automatic ranking of variants using evidence from literature and genomes

BMC Bioinformatics. 2023 Jul 21;24(1):294. doi: 10.1186/s12859-023-05406-w.

Abstract

Background: Identifying disease-associated variants is a challenging task in medical genetics research. Current approaches to prioritizing variants within individual genomes generally rely on known variants, evidence from literature and genomes, and patient symptoms and clinical signs. Existing tools that rank variants from patient symptoms and clinical signs are limited by the coverage of ontologies such as the Human Phenotype Ontology (HPO). However, most clinicians do not restrict themselves to HPO terms when describing patient symptoms and signs and their associated variants or genes. There is thus a need for an automated tool that can prioritize variants based on freely expressed patient symptoms and clinical signs.

Results: STARVar is a Symptom-based Tool for Automatic Ranking of Variants using evidence from literature and genomes. STARVar accepts patient symptoms and clinical signs either linked to HPO or expressed as free text. It returns a list of variants ranked by a combined score from two classifiers, one using evidence from genomics and one using evidence from literature. STARVar improves over related tools on a set of synthetic patients. In addition, we demonstrate its distinct contribution to the domain on a second synthetic dataset, covering publicly available clinical genotype-phenotype associations, using symptoms and clinical signs expressed as free text.
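The ranking step described above can be illustrated with a minimal sketch. It is not STARVar's implementation: the abstract does not state how the two classifier scores are combined, so the weighted average used here is an assumption, and the names VariantScore, combined_score, rank_variants and the weight parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VariantScore:
    """Scores for one candidate variant (all field names are hypothetical)."""
    variant_id: str
    genomic_score: float     # assumed output of a genomics-evidence classifier, in [0, 1]
    literature_score: float  # assumed output of a literature-evidence classifier, in [0, 1]


def combined_score(v: VariantScore, weight: float = 0.5) -> float:
    """Combine the two scores with a weighted average.

    The weighted average is an assumption made for illustration; the
    abstract only says that a combined score from two classifiers is used.
    """
    return weight * v.genomic_score + (1.0 - weight) * v.literature_score


def rank_variants(variants: list[VariantScore], weight: float = 0.5) -> list[VariantScore]:
    """Return the variants sorted from highest to lowest combined score."""
    return sorted(variants, key=lambda v: combined_score(v, weight), reverse=True)


if __name__ == "__main__":
    # Made-up candidates, for illustration only.
    candidates = [
        VariantScore("variant_A", genomic_score=0.9, literature_score=0.1),
        VariantScore("variant_B", genomic_score=0.6, literature_score=0.8),
        VariantScore("variant_C", genomic_score=0.2, literature_score=0.3),
    ]
    for v in rank_variants(candidates):
        print(v.variant_id, round(combined_score(v), 3))
```

With equal weights, a variant supported moderately by both sources of evidence can outrank one supported strongly by only one source. Changing the weight shifts that balance toward genomic or literature evidence.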

Conclusions: STARVar is a unique and efficient tool that can rank variants from patient symptoms expressed flexibly in free-form text. It can therefore be easily integrated into bioinformatics workflows designed to analyze disease-associated genomes.

Availability: STARVar is freely available at https://github.com/bio-ontology-research-group/STARVar.

Keywords: Genomics; Next generation sequencing; Text mining; Variant prioritization; Variant ranking.

MeSH terms

  • Computational Biology
  • Genetic Association Studies
  • Genomics*
  • Humans
  • Phenotype
  • Software*