Reaching for upper bound ROUGE score of extractive summarization methods

PeerJ Comput Sci. 2022 Sep 26;8:e1103. doi: 10.7717/peerj-cs.1103. eCollection 2022.

Abstract

Extractive text summarization (ETS) methods find the salient information in a text automatically by selecting exact sentences from the source text. In this article, we ask what summary quality ETS methods can achieve. To maximize the ROUGE-1 score, we used five approaches: (1) adapted reduced variable neighborhood search (RVNS), (2) Greedy algorithm, (3) VNS initialized by Greedy algorithm results, (4) genetic algorithm, and (5) genetic algorithm initialized by the Greedy algorithm results. We ran experiments on articles from the arXiv dataset. The best of the tested approaches, the genetic algorithm initialized by the Greedy algorithm results, achieved scores of 0.59 for ROUGE-1 and 0.25 for ROUGE-2. These scores appear to be higher than those obtained by current state-of-the-art text summarization models: the best ROUGE-1 score reported in the literature on the same dataset is 0.46. There is therefore room for the development of ETS methods, which are now undeservedly forgotten.
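To make the idea of searching for an extractive upper bound concrete, the following is a minimal Python sketch of a greedy sentence selection that maximizes a simplified ROUGE-1 score against a reference summary. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the tokenization, the ROUGE variant, or the stopping rule, so the whitespace tokenizer, the F1 form of the score, and the names `rouge1_f`, `greedy_extract`, and `max_sentences` are all hypothetical.

```python
from collections import Counter


def rouge1_f(candidate_tokens, reference_tokens):
    """Clipped unigram-overlap F1: a simplified ROUGE-1 (assumption: no stemming)."""
    if not candidate_tokens or not reference_tokens:
        return 0.0
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(candidate_tokens) & Counter(reference_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate_tokens)
    recall = overlap / len(reference_tokens)
    return 2 * precision * recall / (precision + recall)


def greedy_extract(sentences, reference, max_sentences=3):
    """Add, one at a time, the source sentence that most improves the score.

    Stops when no remaining sentence improves the score or when
    max_sentences (a hypothetical length limit) is reached.
    Returns the chosen sentence indices in source order and the final score.
    """
    ref_tokens = reference.lower().split()  # assumption: whitespace tokenization
    tokenized = [s.lower().split() for s in sentences]
    chosen, best = [], 0.0
    while len(chosen) < max_sentences:
        pick, pick_score = None, best
        for i in range(len(tokenized)):
            if i in chosen:
                continue
            # Build the candidate summary with sentences kept in source order.
            cand = [t for j in sorted(chosen + [i]) for t in tokenized[j]]
            score = rouge1_f(cand, ref_tokens)
            if score > pick_score:
                pick, pick_score = i, score
        if pick is None:  # no sentence improves the score
            break
        chosen.append(pick)
        best = pick_score
    return sorted(chosen), best


if __name__ == "__main__":
    source = [
        "the cat sat on the mat",
        "dogs bark loudly at night",
        "a cat likes the mat",
    ]
    print(greedy_extract(source, "the cat sat on the mat"))
```

A greedy result of this kind could also serve as the starting solution for a local search or a genetic algorithm, which is the role the Greedy algorithm plays in approaches (3) and (5) above; those search procedures are not sketched here.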

Keywords: Genetic algorithm; Greedy algorithm; Rouge; Text summarization; Variable neighborhood search.

Grants and funding

This research was funded by the Committee of Science of the Ministry of Education and Science of the Republic of Kazakhstan under grant number AP09058174 as part of the “Development of language-independent unsupervised semantic analysis methods large amounts of text data” project. The work was also supported by the Mexican Government through grant A1-S-47854 of CONACYT, Mexico, and grants 20211784, 20211884, and 20211178 of the Secretaria de Investigación y Posgrado of the Instituto Politecnico Nacional, Mexico. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.