Graph-based extractive text summarization method for Hausa text

Abdulkadir Abubakar Bichi; Ruhaidah Samsudin; Rohayanti Hassan; Layla Rasheed Abdallah Hasan; Abubakar Ado Rogo

doi:10.1371/journal.pone.0285376


PLoS One. 2023 May 9;18(5):e0285376. doi: 10.1371/journal.pone.0285376. eCollection 2023.

Authors

Abdulkadir Abubakar Bichi¹, Ruhaidah Samsudin¹, Rohayanti Hassan¹, Layla Rasheed Abdallah Hasan¹, Abubakar Ado Rogo²

Affiliations

¹ School of Computing, Universiti Teknologi Malaysia, Johor, Malaysia.
² Department of Computer Science, Yusuf Maitama Sule University, Kano, Nigeria.

Abstract

Automatic text summarization is one of the most promising solutions to the challenges posed by the ever-growing volume of textual data. It produces a shorter version of the original document that conveys the same essential information. Despite advances in automatic text summarization research, work on summarization methods for documents written in Hausa is still at an early stage. Hausa is a Chadic language widely spoken in West Africa by approximately 150 million people as a first or second language. This study proposes a novel graph-based extractive single-document summarization method for Hausa text. The method modifies the existing PageRank algorithm by using the normalized count of common bigrams between adjacent sentences as the initial vertex score. The proposed method is evaluated with the ROUGE evaluation toolkit on a Hausa summarization evaluation dataset collected by the authors, comprising 113 Hausa news articles. On the same dataset, the proposed approach outperformed the standard methods: TextRank by 2.1%, LexRank by 12.3%, the centroid-based method by 19.5%, and BM25 by 17.4%.
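The abstract describes the method in a single sentence, so the following Python sketch is only a rough illustration of the idea under stated assumptions. It is not the authors' implementation. Everything below that the abstract does not specify is an assumption. This includes regex-based sentence splitting and tokenization, and TextRank-style word-overlap similarity as the edge weight. It also includes the reading of "normalized common bigrams count between adjacent sentences" as each sentence's shared bigrams with its previous and next sentence, normalized to sum to 1. All function names are hypothetical.

One design note: with a damping factor below 1, PageRank power iteration converges to the same fixed point whatever the starting vector. A prior used *only* as the initial score would therefore not change the final ranking. For that reason, this sketch also feeds the prior in as the personalization (teleport) distribution.

```python
import math
import re
from itertools import combinations


def split_sentences(text):
    # Assumption: naive split after '.', '?' or '!' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s.strip()]


def tokenize(sentence):
    # Unicode-aware \w also matches Hausa hooked letters such as ɓ, ɗ, ƙ.
    return re.findall(r"\w+", sentence.lower())


def bigrams(tokens):
    return set(zip(tokens, tokens[1:]))


def overlap_similarity(a, b):
    # Assumed edge weight (TextRank-style): shared words / (log|a| + log|b|).
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    shared = len(set(a) & set(b))
    return shared / denom if denom > 0 else 0.0


def adjacent_bigram_prior(token_lists):
    # Hypothetical vertex prior: bigrams each sentence shares with its
    # previous and next sentence, normalized so the scores sum to 1.
    n = len(token_lists)
    grams = [bigrams(t) for t in token_lists]
    counts = [0.0] * n
    for i in range(n - 1):
        common = len(grams[i] & grams[i + 1])
        counts[i] += common
        counts[i + 1] += common
    total = sum(counts)
    if total == 0:
        return [1.0 / n] * n  # fall back to a uniform prior
    return [c / total for c in counts]


def pagerank(weights, prior, d=0.85, tol=1e-6, max_iter=100):
    # Weighted PageRank; the prior is both the starting vector and the
    # teleport distribution. Mass of sentences with no edges is dropped,
    # which does not matter because only the ranking is used.
    n = len(prior)
    out_sum = [sum(row) for row in weights]
    scores = list(prior)
    for _ in range(max_iter):
        new = []
        for i in range(n):
            inflow = sum(weights[j][i] / out_sum[j] * scores[j]
                         for j in range(n) if out_sum[j] > 0)
            new.append((1 - d) * prior[i] + d * inflow)
        if sum(abs(x - y) for x, y in zip(new, scores)) < tol:
            return new
        scores = new
    return scores


def summarize(text, k=2):
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    if n <= k:
        return sentences
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        weights[i][j] = weights[j][i] = overlap_similarity(tokens[i], tokens[j])
    scores = pagerank(weights, adjacent_bigram_prior(tokens))
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]  # keep original document order
```

Calling `summarize(article_text, k=3)` returns the three highest-ranked sentences in their original order. Here `article_text` and the choice of `k` are illustrative. The abstract does not state the summary length used in the study.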

Copyright: © 2023 Bichi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Africa, Western
Algorithms*
Head*
Humans
Language
Writing

Grants and funding

This work was supported by the Tertiary Education Trust Fund (Grant number: TETF/ES/UNIV/KANO/TSAS/2019). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.