An Automated Literature Review Tool (LiteRev) for Streamlining and Accelerating Research Using Natural Language Processing and Machine Learning: Descriptive Performance Evaluation Study

Erol Orel; Iza Ciglenecki; Amaury Thiabaud; Alexander Temerev; Alexandra Calmy; Olivia Keiser; Aziza Merzouki

doi:10.2196/39736


J Med Internet Res. 2023 Sep 15;25:e39736. doi: 10.2196/39736.

Authors

Erol Orel¹, Iza Ciglenecki², Amaury Thiabaud¹, Alexander Temerev¹, Alexandra Calmy³, Olivia Keiser¹, Aziza Merzouki¹

Affiliations

¹ Institute of Global Health, University of Geneva, Geneva, Switzerland.
² Médecins Sans Frontières, Geneva, Switzerland.
³ HIV/AIDS Unit, Division of Infectious Diseases, Geneva University Hospital, Geneva, Switzerland.

PMID: 37713261
PMCID: PMC10541641

Abstract

Background: Literature reviews (LRs) identify, evaluate, and synthesize papers relevant to a particular research question to advance understanding and support decision-making. However, LRs, especially traditional systematic reviews, are slow and resource-intensive, and they quickly become outdated.

Objective: LiteRev is an enhanced version of an existing automation tool that assists researchers in conducting LRs by applying natural language processing and machine learning techniques. In this paper, we explain LiteRev's capabilities and methodology in detail and evaluate its accuracy and efficiency against a manual LR, highlighting the benefits of using LiteRev.

Methods: Based on the user's query, LiteRev performs an automated search on a wide range of open-access databases and retrieves relevant metadata on the resulting papers, including abstracts or full texts when available. These abstracts (or full texts) are preprocessed and represented as a term frequency-inverse document frequency (TF-IDF) matrix. Using dimensionality reduction (pairwise controlled manifold approximation [PaCMAP]) and clustering (hierarchical density-based spatial clustering of applications with noise [HDBSCAN]) techniques, the corpus is divided into topics, each described by a list of its most important keywords. The user can then select one or several topics of interest, enter additional keywords to refine their search, or provide key papers relevant to the research question. Based on these inputs, LiteRev performs a k-nearest neighbor (k-NN) search and suggests a list of potentially relevant papers. Each time the user tags relevant papers, new k-NN searches are triggered, until no additional paper is suggested for screening. To assess the performance of LiteRev, we ran it in parallel with a manual LR on the burden and care for acute and early HIV infection in sub-Saharan Africa. We assessed its performance using true and false predictive values, recall, and work saved over sampling.
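The TF-IDF representation and the iterative k-NN screening loop described above can be illustrated with a minimal sketch in plain Python. This is not LiteRev's implementation; it rests on several assumptions: tokenization is a naive whitespace split, TF-IDF uses a smoothed inverse document frequency, similarity is cosine similarity, the PaCMAP/HDBSCAN topic-modeling step is omitted, and the function names (`tfidf_matrix`, `knn_screening`), the parameters `k` and `min_sim`, and the toy corpus are all hypothetical. The `is_relevant` callback stands in for the user tagging a suggested paper, and the loop stops once no newly tagged paper remains to seed another search.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return one sparse TF-IDF vector (term -> weight dict) per document."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({
            # Relative term frequency times smoothed IDF (always > 0).
            t: (c / len(toks)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_screening(docs, seed_ids, is_relevant, k=2, min_sim=0.0):
    """Hypothetical helper: iterative k-NN suggestions seeded by key papers."""
    vecs = tfidf_matrix(docs)
    relevant = set(seed_ids)
    screened = set(seed_ids)
    queue = list(seed_ids)
    while queue:  # stop when no newly tagged paper is left to expand
        q = queue.pop()
        sims = {i: cosine(vecs[q], vecs[i])
                for i in range(len(docs)) if i not in screened}
        # Up to k most similar unscreened papers above the similarity floor.
        suggestions = sorted((i for i, s in sims.items() if s > min_sim),
                             key=sims.get, reverse=True)[:k]
        for i in suggestions:
            screened.add(i)
            if is_relevant(i):  # stands in for the user's manual tag
                relevant.add(i)
                queue.append(i)
    return relevant, screened


docs = [
    "acute hiv infection diagnosis africa",
    "early hiv infection care africa",
    "acute hiv infection burden sub saharan africa",
    "malaria vaccine trial children",
    "malaria bed nets children",
    "tuberculosis treatment adherence",
    "hiv prevention europe",
]
gold = {0, 1, 2}  # toy ground truth for the simulated reviewer
relevant, screened = knn_screening(docs, seed_ids=[0], is_relevant=lambda i: i in gold)
print(sorted(relevant), sorted(screened))  # [0, 1, 2] [0, 1, 2, 6]
```

In this toy run, only papers tagged as relevant seed further searches, so the off-topic "hiv prevention europe" abstract is suggested and screened once but never expands the search, and the malaria and tuberculosis abstracts are never shown to the reviewer.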

Results: LiteRev extracted, processed, and transformed text into a TF-IDF matrix of 631 unique papers from PubMed. The topic modeling module identified 16 topics and highlighted 2 topics of interest to the research question. Based on 18 key papers, the k-NN module suggested 193 papers for screening out of 613 papers in total (31.5% of the whole corpus) and correctly identified 64 of the 87 relevant papers found by manual abstract screening (recall rate of 73.6%). Compared with manual full-text screening, LiteRev identified 42 of the 48 relevant papers found manually (recall rate of 87.5%). This represents a total work saved over sampling of 56%.
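The reported percentages can be checked with a few lines of arithmetic. This sketch assumes the commonly used definition of work saved over sampling, WSS = (TN + FN)/N - (1 - recall), where TN + FN counts the papers the reviewer never had to screen; the abstract does not state which definition or which recall value the evaluation used, and the function names are hypothetical. With N = 613 papers, 193 of them screened, and the full-text recall of 42/48, the formula gives about 0.56, consistent with the reported 56%.

```python
def recall(found: int, relevant_total: int) -> float:
    """Share of the manually identified relevant papers that were also found."""
    return found / relevant_total


def wss(n_total: int, n_screened: int, recall_rate: float) -> float:
    """Work saved over sampling, assuming WSS = (TN + FN) / N - (1 - recall)."""
    not_screened = (n_total - n_screened) / n_total  # (TN + FN) / N
    return not_screened - (1.0 - recall_rate)


abstract_recall = recall(64, 87)
fulltext_recall = recall(42, 48)
print(f"{193 / 613:.1%}")                         # 31.5%
print(f"{abstract_recall:.1%}")                   # 73.6%
print(f"{fulltext_recall:.1%}")                   # 87.5%
print(f"{wss(613, 193, fulltext_recall):.0%}")    # 56%
```

Note that the 31.5% screening share and the 56% figure both work out with N = 613 rather than the 631 papers in the TF-IDF matrix.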

Conclusions: We presented the features and functionalities of LiteRev, an automation tool that uses natural language processing and machine learning methods to streamline and accelerate LRs and support researchers in getting quick and in-depth overviews on any topic of interest.

Keywords: HIV; LiteRev; acute; automation; clustering; early; literature review; machine learning; natural language processing; topic.

©Erol Orel, Iza Ciglenecki, Amaury Thiabaud, Alexander Temerev, Alexandra Calmy, Olivia Keiser, Aziza Merzouki. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 15.09.2023.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Cluster Analysis
Databases, Factual
HIV Infections*
Humans
Machine Learning
Natural Language Processing*
Review Literature as Topic