Development of an integrated and inferenceable RDF database of glycan, pathogen and disease resources

Sci Data. 2023 Sep 6;10(1):582. doi: 10.1038/s41597-023-02442-2.

Abstract

Glycans play important roles in infection by viruses and other pathogens. In fact, SARS-CoV-2 has been shown to have evolved through a single change in glycosylation. However, data resources on glycans, pathogens and diseases are poorly organized. To obtain such information accurately from these disparate resources, we built a foundation for discovering glycan-virus interaction data that uses Semantic Web technologies to integrate the heterogeneous data semantically. Here, we created an ontology that captures the semantics of virus-glycan interactions and used the Resource Description Framework (RDF) to represent data obtained from non-RDF databases and from the literature. These databases include PubChem, SugarBind, and PSICQUIC, and representing them in RDF made it possible to refer to other RDF resources such as UniProt and GlyTouCan. We have made these data publicly available as open data and provide a service that allows anyone to search them freely using SPARQL. The RDF resources created in this study are also available at the GlyCosmos Portal.
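The abstract describes representing virus-glycan interaction data as RDF triples and querying the integrated graph with SPARQL. What follows is a minimal, self-contained Python sketch of that idea. It is not the authors' pipeline or ontology. It stores a toy graph as (subject, predicate, object) triples and answers a SPARQL-like basic graph pattern using a naive nested-loop join. Every IRI, predicate name (`bindsGlycan`, `associatedDisease`, `externalRef`) and identifier is a hypothetical placeholder under `http://example.org/`. The published dataset uses its own vocabulary.

```python
from typing import Dict, List, Optional, Tuple

# All IRIs below are hypothetical placeholders under example.org; they are
# NOT the ontology or identifiers used by the published dataset.
EX = "http://example.org/"

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Toy RDF-style graph stored as (subject, predicate, object) triples.
GRAPH: List[Triple] = [
    (EX + "virus/V1", EX + "bindsGlycan", EX + "glycan/G1"),
    (EX + "virus/V1", EX + "associatedDisease", EX + "disease/D1"),
    (EX + "virus/V2", EX + "bindsGlycan", EX + "glycan/G2"),
    (EX + "virus/V2", EX + "associatedDisease", EX + "disease/D2"),
    # Placeholder cross-reference standing in for a link to an external
    # resource such as GlyTouCan; not a real accession.
    (EX + "glycan/G1", EX + "externalRef", EX + "ext/GLYCAN-REF-1"),
]


def is_var(term: str) -> bool:
    """SPARQL-style variables start with '?'."""
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so `pattern` matches `triple`; return None on conflict."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in extended and extended[p] != t:
                return None  # variable already bound to a different value
            extended[p] = t
        elif p != t:
            return None  # constant term does not match
    return extended


def evaluate_bgp(graph: List[Triple], patterns: List[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern (a conjunction of triple patterns)
    with a naive nested-loop join, as in a SPARQL WHERE block."""
    results: List[Binding] = [{}]
    for pattern in patterns:
        next_results: List[Binding] = []
        for binding in results:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_results.append(extended)
        results = next_results
    return results


# Roughly equivalent SPARQL (hypothetical predicates):
#   PREFIX ex: <http://example.org/>
#   SELECT ?virus ?glycan ?ref WHERE {
#     ?virus  ex:associatedDisease <http://example.org/disease/D1> .
#     ?virus  ex:bindsGlycan       ?glycan .
#     ?glycan ex:externalRef       ?ref .
#   }
QUERY: List[Triple] = [
    ("?virus", EX + "associatedDisease", EX + "disease/D1"),
    ("?virus", EX + "bindsGlycan", "?glycan"),
    ("?glycan", EX + "externalRef", "?ref"),
]

if __name__ == "__main__":
    for row in evaluate_bgp(GRAPH, QUERY):
        print(row["?virus"], row["?glycan"], row["?ref"])
```

Shared variables such as `?virus` and `?glycan` act as join keys across the triple patterns. This is how a single SPARQL query can follow links from a disease to a pathogen, then to a glycan, then to an external cross-reference once the sources have been integrated as RDF. A real deployment would use an RDF triple store and a SPARQL engine rather than this linear scan.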

Publication types

  • Dataset

MeSH terms

  • COVID-19*
  • Databases, Factual
  • Glycosylation
  • Humans
  • Polysaccharides
  • SARS-CoV-2

Substances

  • Polysaccharides