Enhancing recall in automated record screening: A resampling algorithm

Res Synth Methods. 2024 May;15(3):372-383. doi: 10.1002/jrsm.1690. Epub 2024 Jan 7.

Abstract

Literature screening is the process of identifying all relevant records from a pool of candidate records in systematic reviews, meta-analyses, and other research synthesis tasks. This process is time-consuming, expensive, and prone to human error. Screening prioritization methods attempt to help reviewers identify most of the relevant records while screening only the high-priority portion of the candidate records. In previous studies, screening prioritization is often referred to as automatic literature screening or automatic literature identification. Numerous screening prioritization methods have been proposed in recent years. However, screening prioritization methods with reliable performance are lacking. Our objective is to develop a screening prioritization algorithm with reliable performance for practical use, for example, an algorithm that guarantees an 80% chance of identifying at least 80% of the relevant records. Building on the target-based method proposed by Cormack and Grossman, we propose a screening prioritization algorithm that uses sampling with replacement. It is a wrapper algorithm that can work with any existing screening prioritization algorithm to guarantee the performance. We prove this guarantee using probability theory. We also run numerical experiments to test the performance of the algorithm when applied in practice. The results show that the algorithm achieves reliable performance under different circumstances. The proposed screening prioritization algorithm can be reliably used in real-world research synthesis tasks.
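The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general target-based idea with sampling with replacement, not the authors' implementation. It assumes that a target set of k relevant records is drawn uniformly at random with replacement, and that ranked screening stops once every target record has been seen. Under that assumption, recall falls below r only if all k draws land in the first r fraction of relevant records in ranked order, which happens with probability at most r^k. The names `min_target_size` and `screen_with_target` are hypothetical, and the cost of the random sampling phase is ignored.

```python
import math
import random


def min_target_size(recall, confidence):
    """Smallest k such that recall**k <= 1 - confidence (assumed bound)."""
    return math.ceil(math.log(1 - confidence) / math.log(recall))


def screen_with_target(ranked_labels, k, rng):
    """Simulate one target-based screening run.

    ranked_labels: 0/1 relevance labels in the order a (hypothetical)
    prioritization model would present the records.
    Returns (records screened in ranked order, recall achieved).
    """
    relevant_positions = [i for i, y in enumerate(ranked_labels) if y == 1]
    # Sampling records with replacement until k relevant ones are hit is
    # equivalent to k independent uniform draws from the relevant records.
    target = [rng.choice(relevant_positions) for _ in range(k)]
    # Stop at the last target record encountered in ranked order.
    stop = max(target)
    found = sum(ranked_labels[: stop + 1])
    return stop + 1, found / len(relevant_positions)


def estimate_success_rate(ranked_labels, k, recall, trials, seed=0):
    """Fraction of simulated runs that reach at least the given recall."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        _, achieved = screen_with_target(ranked_labels, k, rng)
        if achieved >= recall:
            hits += 1
    return hits / trials
```

For the 80%/80% example in the abstract, `min_target_size(0.8, 0.8)` gives the target size this bound requires. Because every draw is independent and uniform over the relevant records, the bound does not depend on how well the underlying ranker performs, which is consistent with the wrapper framing above. A better ranker only reduces how many records must be screened before stopping.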

Keywords: automatic screening algorithm; data mining; literature screen; machine learning; text mining.

MeSH terms

  • Algorithms*
  • Automation
  • Information Storage and Retrieval / methods
  • Meta-Analysis as Topic
  • Models, Statistical
  • Probability
  • Reproducibility of Results
  • Review Literature as Topic
  • Systematic Reviews as Topic / methods