Retrieval and Ranking of Combining Ontology and Content Attributes for Scientific Document

Entropy (Basel). 2022 Jun 10;24(6):810. doi: 10.3390/e24060810.

Abstract

Traditional mathematical search models retrieve scientific documents using only mathematical expressions and their contexts. They do not consider the ontological attributes of the documents, which results in gaps between queries and retrieval results. To solve this problem, a retrieval and ranking model is constructed that combines the information in mathematical expressions with their related text and then uses ontology attributes extracted from the scientific documents to further sort the retrieval results. First, a hesitant fuzzy set of mathematical expressions is constructed, exploiting the properties of hesitant fuzzy sets to address the multi-attribute problem of mathematical expression matching. Then, the similarity of the context sentences of the mathematical expressions is calculated using the bidirectional encoding of a BiLSTM, and the retrieval result is obtained by combining the expression similarity with the sentence similarity. Finally, the retrieval results are ranked according to the ontological attributes of the scientific documents to obtain the final search results. The MAP_10 value of the mathematical expression retrieval results on the Ntcir-Mathir-Wikipedia-Corpus dataset is 0.815, and the average NDCG@10 of the scientific document ranking results is 0.9; these results support the effectiveness of the scientific document retrieval and ranking method.
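
The two reported metrics can be illustrated with a minimal Python sketch. It uses common textbook definitions, which are an assumption here because the abstract does not state the exact formulas the authors used. The function names (`dcg_at_k`, `ndcg_at_k`, `average_precision_at_k`) and the choice to normalize average precision by the number of relevant items found in the top k are hypothetical and chosen for illustration only.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the first k graded relevance scores."""
    # The item at 0-based position i is discounted by log2(i + 2),
    # so the top-ranked item has a discount of log2(2) = 1.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def average_precision_at_k(is_relevant: Sequence[bool], k: int = 10) -> float:
    """Average of precision values at each relevant position in the top k.

    Assumption: normalized by the number of relevant items found in the
    top k; other conventions divide by the total number of relevant items.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(is_relevant[:k], start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0
```

Under these definitions, MAP_10 would be the mean of `average_precision_at_k` over all queries, and the average NDCG@10 would be the mean of `ndcg_at_k` over all ranked result lists.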

Keywords: BiLSTM; HFS; mathematical expressions; ontology attributes; scientific document retrieval and ranking.