Performance Evaluation of Stop Word Influence in Hausa Extractive Summarization using Enhanced TextRank
1  Department of Software Engineering, Faculty of Computing, Northwest University, Kano, Nigeria
2  Software Department, Faculty of Computing, Northwest University, Kano, Nigeria
3  Faculty of Computing, Northwest University, Kano, Nigeria
4  Computer Science Department, Faculty of Computing, Northwest University, Kano, Nigeria
5  Faculty of Computer Science and Mathematics, Universiti Teknologi Mara, Shah Alam, Selangor, Malaysia
6  Department of Computer Science, Faculty of Computing and Mathematical Science, Aliko Dangote University of Science and Technology, Wudil, Nigeria
Academic Editor: Eugenio Vocaturo

Abstract:

Stop word removal is a fundamental preprocessing step in Natural Language Processing (NLP) that eliminates non-informative words which can degrade the performance of downstream tasks. While its impact has been widely studied in high-resource languages such as English, its effectiveness in low-resource languages such as Hausa remains under-investigated, particularly in the context of extractive text summarization. This study addresses that gap by examining the role of stop word removal in enhancing summarization performance for Hausa academic texts. We introduce a new benchmark dataset, composed of academic abstracts, curated specifically for Hausa extractive summarization. Furthermore, we propose an enhanced variant of the TextRank algorithm that combines sentence-level features, including positional weight, lexical similarity, and semantic similarity, to compute edge weights in the sentence similarity graph. This feature-rich graph structure enables a more context-aware sentence ranking process. The proposed model is evaluated against standard baselines, namely TextRank and LexRank, using the ROUGE evaluation metrics. Experimental results demonstrate that our method significantly outperforms the baselines on ROUGE-1, ROUGE-2, and ROUGE-L scores. Additionally, ablation studies with and without a tailored Hausa stop word list reveal a notable performance gain when stop words are removed. These findings highlight the importance of language-specific preprocessing strategies for improving NLP outcomes in low-resource languages.
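
The following Python sketch illustrates the general idea described in the abstract: sentences are tokenized with optional stop word removal, each pair of sentences is connected by an edge whose weight combines positional, lexical, and semantic features, and sentences are ranked with PageRank. The feature weights (alpha, beta, gamma), the toy Hausa stop word list, the bag-of-words "semantic" similarity, and all helper names are illustrative assumptions for exposition only, not the paper's exact configuration or tailored stop word list.

```python
# Minimal sketch of a feature-weighted TextRank for extractive summarization.
# Assumptions: toy stop word list, illustrative feature weights and helpers.
import math
import re
from collections import Counter

import networkx as nx  # used only for its PageRank implementation

# Illustrative (incomplete) Hausa stop word list; the paper uses a tailored list.
HAUSA_STOP_WORDS = {"da", "a", "ta", "na", "ya", "ba", "ne", "ce", "wannan", "cikin"}

def tokenize(sentence, remove_stop_words=True):
    tokens = re.findall(r"\w+", sentence.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in HAUSA_STOP_WORDS]
    return tokens

def lexical_similarity(tokens_a, tokens_b):
    """TextRank-style normalized word overlap between two sentences."""
    if not tokens_a or not tokens_b:
        return 0.0
    overlap = len(set(tokens_a) & set(tokens_b))
    return overlap / (math.log(len(tokens_a) + 1) + math.log(len(tokens_b) + 1) + 1e-9)

def semantic_similarity(tokens_a, tokens_b):
    """Placeholder semantic feature: cosine similarity over bag-of-words counts.
    An embedding-based similarity could be substituted here."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    num = sum(ca[w] * cb[w] for w in set(ca) & set(cb))
    den = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return num / den if den else 0.0

def positional_weight(i, j, n):
    """Assumed form: favor edges involving sentences near the start of the text."""
    return 0.5 * ((n - i) / n + (n - j) / n)

def summarize(sentences, top_k=3, alpha=0.4, beta=0.3, gamma=0.3, remove_stop_words=True):
    tokens = [tokenize(s, remove_stop_words) for s in sentences]
    n = len(sentences)
    graph = nx.Graph()
    graph.add_nodes_from(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            weight = (alpha * lexical_similarity(tokens[i], tokens[j])
                      + beta * semantic_similarity(tokens[i], tokens[j])
                      + gamma * positional_weight(i, j, n))
            graph.add_edge(i, j, weight=weight)
    scores = nx.pagerank(graph, weight="weight")
    top = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return [sentences[i] for i in sorted(top)]  # preserve original sentence order
```

Running `summarize` twice on the same abstract, with `remove_stop_words=True` and `False`, mirrors the ablation setup described above; the resulting summaries can then be scored against reference summaries with ROUGE-1, ROUGE-2, and ROUGE-L.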

Keywords: Hausa language, extractive text summarization, stop word removal, TextRank, ROUGE, low-resource NLP, sentence ranking