SALON ontology for the formal description of sequence alignments

BMC Bioinformatics. 2023 Feb 27;24(1):69. doi: 10.1186/s12859-023-05190-7.

Abstract

Background: Information provided by high-throughput sequencing platforms allows the collection of content-rich data about biological sequences and their context. Sequence alignment is a bioinformatics approach to identifying regions of similarity in DNA, RNA, or protein sequences. However, there is no consensus on a common terminology and representation for sequence alignments. As a result, automatically linking the extensive existing knowledge about the sequences with the alignments remains challenging.

Results: The Sequence Alignment Ontology (SALON) defines a helpful vocabulary for representing and semantically annotating pairwise and multiple sequence alignments. SALON is an OWL 2 ontology that supports automated reasoning for alignment validation and for retrieving complementary information from public databases under the Open Linked Data approach. This will reduce the effort scientists need to interpret sequence alignment results.
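
To make the idea of annotating and validating an alignment concrete, the following is a minimal sketch in plain Python. It describes an alignment as subject-predicate-object triples and checks one simple consistency rule. The namespace and every term name (`PairwiseAlignment`, `hasRow`, `alignsSequence`, `gappedString`) are hypothetical and are not taken from the published SALON vocabulary. The length check is an ordinary Python function standing in for validation; it is not OWL 2 reasoning.

```python
# Hypothetical namespace and term names, for illustration only;
# they are NOT taken from the published SALON ontology.
EX = "http://example.org/alignment-sketch#"


def alignment_to_triples(alignment_id, rows):
    """Turn {sequence_id: gapped_string} into (subject, predicate, object) triples."""
    aln = EX + alignment_id
    # Two rows are treated as a pairwise alignment, anything else as multiple.
    kind = "PairwiseAlignment" if len(rows) == 2 else "MultipleAlignment"
    triples = [(aln, "rdf:type", EX + kind)]
    for seq_id, gapped in rows.items():
        row = f"{aln}/row/{seq_id}"
        triples.append((aln, EX + "hasRow", row))
        triples.append((row, EX + "alignsSequence", seq_id))
        triples.append((row, EX + "gappedString", gapped))
    return triples


def validate(triples):
    """Check one simple constraint: all gapped rows have the same length."""
    lengths = {len(o) for _, p, o in triples if p == EX + "gappedString"}
    return len(lengths) <= 1


triples = alignment_to_triples("aln1", {"seqA": "ACG-T", "seqB": "AC-TT"})
is_valid = validate(triples)
```

In a real setting the triples would be expressed with the ontology's own IRIs and the constraints checked by an OWL 2 reasoner; the sketch only shows the shape of the data and the kind of rule involved.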

Conclusions: SALON defines a full range of controlled terminology in the domain of sequence alignments. It can be used as a mediated schema to integrate data from different sources and validate acquired knowledge.

Keywords: Data integration; Linked data; Ontology; Semantic web; Sequence alignment.

MeSH terms

  • Amino Acid Sequence
  • Computational Biology*
  • Consensus
  • Databases, Factual
  • Sequence Alignment