Important citation identification by exploiting content and section-wise in-text citation count

PLoS One. 2020 Mar 5;15(3):e0228885. doi: 10.1371/journal.pone.0228885. eCollection 2020.

Abstract

A citation is regarded as a parameter for determining the linkage between research articles. This parameter has been used extensively in many academic settings, such as calculating journal impact factors and researchers' h-indices, allocating research grants, and identifying the latest research trends. The current state of the art contends that not all citations are of equal importance. Based on this argument, the citation classification community now categorizes citations as important or non-important. The community has proposed different approaches to identify important citations, including citation-count-based, context-based, metadata-based, and text-based approaches. However, existing approaches overlook potentially significant features that could play a vital role in citation classification. This research presents a novel approach for binary citation classification that exploits section-wise in-text citation frequencies, similarity score, and overall citation-count-based features. The study also introduces a novel machine-learning-based approach for assigning appropriate weights to the logical sections of research papers; citations are then weighted according to the sections in which they appear. For classification, we used three techniques: Support Vector Machine, Kernel Linear Regression, and Random Forest. The experiments were performed on two annotated benchmark datasets containing 465 and 311 citation pairs of research articles, respectively. The results show that the proposed approach achieved higher precision than the contemporary state-of-the-art approach (0.84 vs 0.72).
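The feature construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The section names, the weight values, and the helper names (`SECTION_WEIGHTS`, `weighted_citation_score`, `build_features`, `precision`) are all assumptions made for illustration. In particular, the paper learns its section weights with machine learning algorithms, whereas the numbers here are placeholders.

```python
from typing import Dict, List

# Hypothetical section weights. The paper derives its weights with machine
# learning algorithms; these values are placeholders, not reported results.
SECTION_WEIGHTS: Dict[str, float] = {
    "introduction": 0.1,
    "related_work": 0.1,
    "methodology": 0.4,
    "results": 0.3,
    "discussion": 0.1,
}


def weighted_citation_score(counts: Dict[str, int],
                            weights: Dict[str, float] = SECTION_WEIGHTS) -> float:
    """Sum the in-text citation counts, each scaled by its section's weight."""
    return sum(weights.get(section, 0.0) * n for section, n in counts.items())


def build_features(counts: Dict[str, int], similarity: float) -> List[float]:
    """Assemble one feature vector for a (citing, cited) pair:
    [section-weighted count, similarity score, overall in-text citation count]."""
    total = float(sum(counts.values()))
    return [weighted_citation_score(counts), similarity, total]


def precision(y_true: List[int], y_pred: List[int]) -> float:
    """Precision for the 'important' class (label 1): TP / (TP + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0
```

Feature vectors built this way could then be passed to any binary classifier, such as the three named in the abstract. The classifier output would be scored with `precision`, the metric the abstract reports.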

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Humans
  • Journal Impact Factor*
  • Linear Models
  • Metadata
  • Periodicals as Topic / statistics & numerical data*
  • Support Vector Machine

Grants and funding

This research was funded by the Deanship of Scientific Research at Princess Nourah bint Abdulrahman University, through the Fast-track Research Funding Program. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.