Recommender system of scholarly papers using public datasets

AMIA Jt Summits Transl Sci Proc. 2021 May 17;2021:672-679. eCollection 2021.

Abstract

The exponential growth of public datasets in the era of Big Data demands new solutions for making these resources findable and reusable. A scholarly recommender system for public datasets is therefore an important tool in the field of information filtering. Such a system can help scholars identify prior and related literature for a dataset, saving them time, and can also enhance the dataset's reusability. In this work, we developed a scholarly recommender system that recommends research papers from PubMed that are relevant to public datasets from the Gene Expression Omnibus (GEO). We employ and compare different techniques for representing textual data. Our results show that term-frequency-based methods (BM25 and TF-IDF) outperformed all other methods, including popular Natural Language Processing (NLP) embedding models such as doc2vec, ELMo and BERT.
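To make the two term-frequency methods concrete, the sketch below ranks candidate paper texts against a dataset description with Okapi BM25 and with TF-IDF cosine similarity. It is an illustrative sketch under stated assumptions, not the authors' pipeline: the abstract does not specify preprocessing or parameters, so the regex tokenizer, the BM25 settings k1 = 1.5 and b = 0.75, the non-negative BM25 IDF variant, and the scikit-learn-style smoothed IDF are all assumptions. The names `tokenize`, `BM25`, and `tfidf_rank` and the toy texts are hypothetical and stand in for real GEO and PubMed records.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed preprocessing: lowercase, keep runs of letters/digits only.
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Okapi BM25 over an in-memory list of paper texts (hypothetical helper)."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.doc_counts = [Counter(tokenize(d)) for d in docs]
        self.doc_lens = [sum(c.values()) for c in self.doc_counts]
        self.avgdl = sum(self.doc_lens) / len(docs)
        n = len(docs)
        df = Counter(t for c in self.doc_counts for t in c)
        # Assumed non-negative IDF variant: ln(1 + (N - df + 0.5) / (df + 0.5)).
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.doc_counts[i], self.doc_lens[i]
        s = 0.0
        for t in tokenize(query):
            if t not in tf:
                continue  # terms absent from document i contribute nothing
            f = tf[t]
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            s += self.idf[t] * f * (self.k1 + 1) / norm
        return s

    def rank(self, query: str, top_k: int = 3) -> list[int]:
        scored = [(self.score(query, i), i) for i in range(len(self.doc_counts))]
        scored.sort(key=lambda p: (-p[0], p[1]))  # descending score, then index
        return [i for _, i in scored[:top_k]]


def tfidf_rank(query: str, docs: list[str], top_k: int = 3) -> list[int]:
    """Rank docs by cosine similarity of TF-IDF vectors (hypothetical helper)."""
    n = len(docs)
    doc_counts = [Counter(tokenize(d)) for d in docs]
    df = Counter(t for c in doc_counts for t in c)
    # Assumed smoothed IDF, as in scikit-learn's default: ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + f)) + 1 for t, f in df.items()}

    def vec(counts: Counter) -> dict[str, float]:
        # Raw term count times IDF; query terms unseen in the corpus are dropped.
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    def cosine(u: dict[str, float], v: dict[str, float]) -> float:
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        nu = math.sqrt(sum(w * w for w in u.values()))
        nv = math.sqrt(sum(w * w for w in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    q = vec(Counter(tokenize(query)))
    sims = [(cosine(q, vec(c)), i) for i, c in enumerate(doc_counts)]
    sims.sort(key=lambda p: (-p[0], p[1]))
    return [i for _, i in sims[:top_k]]


# Toy, invented texts for illustration only (not real GEO or PubMed records).
papers = [
    "RNA-seq of breast cancer tumor samples reveals gene expression signatures",
    "Deep learning for protein structure prediction",
    "Microarray gene expression profiling of liver tissue in mice",
]
dataset_description = "gene expression profiling of breast cancer tumor samples"

print(BM25(papers).rank(dataset_description))   # → [0, 2, 1]
print(tfidf_rank(dataset_description, papers))  # → [0, 2, 1]
```

Both scorers reward rare terms shared between the dataset description and a paper, which is one plausible reason lexical methods can do well when dataset descriptions and papers use the same specialized vocabulary. BM25 additionally saturates repeated terms and normalizes for document length, whereas the TF-IDF variant here relies on cosine normalization alone.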

MeSH terms

  • Humans
  • Natural Language Processing*
  • Publications*