Hybrid Methods of Bibliographic Coupling and Text Similarity Measurement for Biomedical Paper Recommendation

Stud Health Technol Inform. 2022 Jun 6;290:287-291. doi: 10.3233/SHTI220080.

Abstract

The volume of scientific literature keeps growing, and many methods have been proposed for measuring document-document similarity in order to cluster or classify documents for science mapping and knowledge discovery. In this paper, we propose hybrid methods that combine bibliographic coupling (BC) with text (content) similarity through linear weighting. We combined BC with BM25, Cosine, and PMRA (PubMed Related Articles) and compared each hybrid with the corresponding single methods in paper recommendation tasks on the TREC Genomics Track 2005 datasets. BC and text-based methods complemented each other: the hybrid methods performed better than BC or any text-based method used alone. Among the hybrids, BC with BM25 and BC with Cosine performed better than BC with PMRA. Performance was best when the weights of BM25, Cosine, and PMRA in the hybrid methods were 0.025, 0.2, and 0.2, respectively. The choice of method should nevertheless depend on the actual data and research needs. Future work should examine the underlying reasons for these performance differences and identify which part or type of information each method contributes in text clustering or recommendation.
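The idea of a weighted BC/text hybrid can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact formula, so the form (1 - w) * BC + w * text similarity, the use of the raw shared-reference count as BC strength, the whitespace tokenizer, and all names (`bc_strength`, `cosine_similarity`, `hybrid_score`, `recommend`, the toy `papers` data) are assumptions made for illustration. Only the Cosine weight of 0.2 is taken from the abstract.

```python
from collections import Counter
from math import sqrt


def bc_strength(refs_a, refs_b):
    """Bibliographic coupling strength, taken here as the number of shared references."""
    return len(set(refs_a) & set(refs_b))


def cosine_similarity(text_a, text_b):
    """Cosine similarity between raw term-frequency vectors (whitespace tokens)."""
    tf_a = Counter(text_a.lower().split())
    tf_b = Counter(text_b.lower().split())
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm = sqrt(sum(v * v for v in tf_a.values())) * sqrt(sum(v * v for v in tf_b.values()))
    return dot / norm if norm else 0.0


def hybrid_score(bc, text_sim, text_weight):
    """Assumed linear combination: (1 - w) * BC + w * text similarity."""
    return (1.0 - text_weight) * bc + text_weight * text_sim


def recommend(query_id, papers, text_weight=0.2):
    """Rank all other papers by hybrid score against the query, best first."""
    query = papers[query_id]
    scored = []
    for pid, paper in papers.items():
        if pid == query_id:
            continue
        bc = bc_strength(query["refs"], paper["refs"])
        sim = cosine_similarity(query["text"], paper["text"])
        scored.append((hybrid_score(bc, sim, text_weight), pid))
    # sorted() is stable, so papers with equal scores keep their input order.
    return [pid for score, pid in sorted(scored, key=lambda item: -item[0])]


# Hypothetical toy corpus: p2 shares references with p1, p3 shares only vocabulary.
papers = {
    "p1": {"refs": ["r1", "r2", "r3"], "text": "gene expression in yeast"},
    "p2": {"refs": ["r2", "r3", "r4"], "text": "protein folding dynamics"},
    "p3": {"refs": ["r9"], "text": "gene expression in yeast cells"},
    "p4": {"refs": ["r8"], "text": "quantum optics"},
}

ranking = recommend("p1", papers, text_weight=0.2)
```

In this toy corpus, BC alone cannot separate `p3` from `p4` (neither shares a reference with `p1`), and Cosine alone cannot separate `p2` from `p4` (neither shares a term with `p1`). The combined score orders all three, which is the kind of complementarity the abstract reports. In practice the two signals would likely need to be normalized to comparable ranges before weighting, since a raw reference count and a cosine value live on different scales.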

Keywords: Citation-based methods; hybrid methods; text-based methods.

MeSH terms

  • Algorithms*
  • Cluster Analysis