Performance Evaluation of Stop Word Influence in Hausa Extractive Summarization using Enhanced TextRank
Published: 03 December 2025 by MDPI
in The 6th International Electronic Conference on Applied Sciences, session Computing and Artificial Intelligence

Abstract:
Stop word removal is a fundamental preprocessing step in Natural Language Processing (NLP), aiming to eliminate non-informative words that can degrade the performance of downstream tasks. While its impact has been widely explored in high-resource languages like English, its effectiveness in low-resource languages such as Hausa remains under-investigated, particularly in the context of extractive text summarization. This study addresses that gap by examining the role of stop word removal in enhancing summarization performance for Hausa academic texts. We introduce a new benchmark dataset specifically curated for Hausa extractive summarization, composed of academic abstracts. Furthermore, we propose an enhanced variant of the TextRank algorithm that leverages a combination of sentence-level features, including positional weight, lexical similarity, and semantic similarity, to compute edge weights in the sentence similarity graph. This feature-rich graph structure allows for a more context-aware sentence ranking process. The proposed model is evaluated against standard baselines, namely TextRank and LexRank, using the ROUGE evaluation metric. Experimental results demonstrate that our method significantly outperforms the baselines across ROUGE-1, ROUGE-2, and ROUGE-L scores. Additionally, ablation studies with and without a tailored Hausa stop word list reveal a notable performance gain when stop words are removed. These findings highlight the importance of language-specific preprocessing strategies in improving NLP outcomes for low-resource languages.

Keywords: Hausa language, extractive text summarization, stop word removal, TextRank, ROUGE, low-resource NLP, sentence ranking
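The feature-weighted graph ranking described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the mixing weights `alpha`, `beta`, `gamma`, the positional-weight formula, and the use of term-frequency cosine as the "semantic" similarity are all assumptions made here for demonstration; the paper's actual feature definitions and parameters are not specified in the abstract.

```python
import math
from collections import Counter

def lexical_sim(t1, t2):
    """Normalized word-overlap similarity, as in the original TextRank."""
    w1, w2 = set(t1), set(t2)
    overlap = len(w1 & w2)
    denom = math.log(len(w1) + 1) + math.log(len(w2) + 1)
    return overlap / denom if denom > 0 else 0.0

def semantic_sim(t1, t2):
    """Cosine similarity over term-frequency vectors -- a simple stand-in
    for the paper's (unspecified) semantic-similarity measure."""
    c1, c2 = Counter(t1), Counter(t2)
    dot = sum(c1[w] * c2[w] for w in c1)
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def enhanced_textrank(sentences, stop_words=frozenset(),
                      alpha=0.4, beta=0.4, gamma=0.2,
                      damping=0.85, iters=50, top_k=2):
    """Rank sentences on a graph whose edge weights combine positional,
    lexical, and semantic features. alpha/beta/gamma are illustrative
    mixing coefficients, not values taken from the paper."""
    # stop word removal happens here, before similarity is computed
    tokens = [[w for w in s.lower().split() if w not in stop_words]
              for s in sentences]
    n = len(sentences)
    pos = [1.0 - i / n for i in range(n)]  # earlier sentences weighted higher
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                W[i][j] = (alpha * lexical_sim(tokens[i], tokens[j])
                           + beta * semantic_sim(tokens[i], tokens[j])
                           + gamma * (pos[i] + pos[j]) / 2)
    # weighted PageRank via power iteration over the similarity graph
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            s = sum(W[j][i] * scores[j] / sum(W[j])
                    for j in range(n) if sum(W[j]) > 0)
            new.append((1 - damping) / n + damping * s)
        scores = new
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(top)]  # summary in original order
```

Running the ranking with and without a `stop_words` set mirrors the ablation reported in the abstract: removing stop words changes the overlap counts and vectors that drive both similarity terms, and hence the final sentence ranking.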
