Data mart construction based on semantic annotation of scientific articles: A case study for the prioritization of drug targets

Comput Methods Programs Biomed. 2018 Apr:157:225-235. doi: 10.1016/j.cmpb.2018.01.010. Epub 2018 Jan 12.

Abstract

Background and objectives: Semantic text annotation associates semantic information (ontology concepts) with text expressions (terms) in a form that software agents can read. In the scientific setting, this is particularly useful because it exposes findings that would otherwise remain buried within academic articles. The biomedical domain has more than 300 ontologies, most of them composed of over 500 concepts. These ontologies can be used to annotate scientific papers and thus facilitate data extraction. However, in the context of a research project, a simple keyword-based query through the interface of a digital library of scientific texts can return more than a thousand hits. Analyzing such a large set of texts, annotated with so many large ontologies, is not an easy task. Therefore, the main objective of this work is to provide a method that facilitates this task.

Methods: This work describes a method, Text and Ontology ETL (TOETL), for building an analytical view over such texts. First, a corpus of selected papers is semantically annotated using distinct ontologies. Then, the annotation data is extracted, organized and aggregated into the dimensional schema of a data mart.
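
The second step above (extracting annotation data and aggregating it into a dimensional schema) can be illustrated with a minimal Python sketch. It is not the TOETL implementation: the record layout, the identifiers (`paper-1`, `ONT_A`, `concept-x`) and the function names `build_data_mart` and `papers_per_concept` are all hypothetical, chosen only to show how annotations might be organized into dimension tables and a fact table and then summarized.

```python
from collections import Counter

# Hypothetical annotation records, as an annotator might emit them:
# (paper_id, ontology, concept_id, matched_term). All values are made up.
ANNOTATIONS = [
    ("paper-1", "ONT_A", "concept-x", "essential gene"),
    ("paper-1", "ONT_A", "concept-x", "essential genes"),
    ("paper-1", "ONT_B", "concept-y", "drug target"),
    ("paper-2", "ONT_A", "concept-x", "essential gene"),
]


def build_data_mart(annotations):
    """Organize annotation records into two dimensions and one fact table."""
    paper_dim = {}     # paper_id -> surrogate key
    concept_dim = {}   # (ontology, concept_id) -> surrogate key
    facts = Counter()  # (paper_key, concept_key) -> number of annotations
    for paper_id, ontology, concept_id, _term in annotations:
        # setdefault assigns a new surrogate key only on first sight.
        paper_key = paper_dim.setdefault(paper_id, len(paper_dim) + 1)
        concept_key = concept_dim.setdefault(
            (ontology, concept_id), len(concept_dim) + 1
        )
        facts[(paper_key, concept_key)] += 1
    return paper_dim, concept_dim, facts


def papers_per_concept(concept_dim, facts):
    """Aggregate the fact table: distinct papers mentioning each concept."""
    key_to_concept = {key: concept for concept, key in concept_dim.items()}
    counts = Counter()
    # Each fact key is a unique (paper, concept) pair, so one row = one paper.
    for _paper_key, concept_key in facts:
        counts[key_to_concept[concept_key]] += 1
    return counts


paper_dim, concept_dim, facts = build_data_mart(ANNOTATIONS)
summary = papers_per_concept(concept_dim, facts)
for (ontology, concept_id), n_papers in summary.most_common():
    print(ontology, concept_id, n_papers)
```

In this sketch the fact table stores annotation counts per paper and concept, so an analyst could rank concepts by how many papers mention them instead of reading every keyword hit. A real data mart would likely hold these tables in a database and include further dimensions.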

Results: In addition to the TOETL method, this work illustrates its application through the development of the TaP DM (Target Prioritization data mart). This data mart focuses on research into gene essentiality, a key concept to consider when searching for genes with potential as anti-infective drug targets.

Conclusions: This work shows that the proposed approach is a relevant tool for supporting decision making in the prioritization of new drug targets and is more efficient than traditional keyword-based tools.

Keywords: Decision support systems; Drug target prioritization; Semantic annotation.

MeSH terms

  • Data Warehousing
  • Decision Making
  • Drug Delivery Systems*
  • Genes, Essential*
  • Information Storage and Retrieval*
  • PubMed
  • Semantics*
  • Vocabulary, Controlled