An Improved Graph-Based Method for Hausa Text Single-Document Summary Extraction Using a Hybrid Similarity Function
Published: 03 December 2025
by MDPI
in The 6th International Electronic Conference on Applied Sciences
session Computing and Artificial Intelligence
Abstract:
Extractive text summarization automatically generates a concise version of a document by selecting its most important sentences and presenting them verbatim. This paper proposes an improved graph-based method for Hausa single-document extractive summarization. The improvement comes from a hybrid similarity function. To build it, four similarity measures were first evaluated individually within the graph-based ranking algorithm: cosine similarity, Jaccard similarity, the overlap coefficient, and n-gram/Dice’s coefficient similarity. Each of the other three measures was then combined with n-gram/Dice’s coefficient similarity using the simple harmonic mean, yielding three hybrid similarity functions. The proposed method was evaluated on the Hausa extractive text summarization corpus using the standard ROUGE metrics of precision, recall, and F-score. Among the tested combinations, cosine similarity combined with n-gram/Dice’s coefficient similarity performed best, achieving F-scores of 0.8085 for ROUGE-1, 0.3705 for ROUGE-2, and 0.6946 for ROUGE-L and outperforming the other similarity pairings. These results demonstrate that integrating cosine similarity with n-gram/Dice’s coefficient similarity significantly enhances the performance of graph-based extractive summarization for Hausa text. This study contributes to the advancement of natural language processing tools for under-resourced languages such as Hausa and provides a foundation for further development of multilingual text summarization systems.
Keywords: Extractive summarization; Hausa text summarization; Graph-based algorithm; Similarity measurement; Hybrid similarity.
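To make the described pipeline concrete, the following is a minimal Python sketch of the four similarity measures, the harmonic-mean hybrid H(x, y) = 2xy / (x + y), and a graph-based sentence-ranking step. Several details are assumptions rather than facts from the abstract: tokenization (lowercased Unicode word tokens), cosine similarity over raw term frequencies, character bigrams for the n-gram/Dice measure, and a TextRank-style weighted PageRank as the ranking algorithm. All function names (`harmonic_hybrid`, `rank_sentences`, `summarize`, and so on) are hypothetical and do not come from the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Assumption: lowercase, then keep Unicode word characters (covers Hausa letters such as ɓ, ɗ, ƙ).
    return re.findall(r"\w+", sentence.lower())


def cosine_sim(a, b):
    # Cosine similarity between raw term-frequency vectors.
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def jaccard_sim(a, b):
    # Jaccard similarity: |A ∩ B| / |A ∪ B| over word sets.
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def overlap_sim(a, b):
    # Overlap (Szymkiewicz-Simpson) coefficient: |A ∩ B| / min(|A|, |B|).
    sa, sb = set(a), set(b)
    smaller = min(len(sa), len(sb))
    return len(sa & sb) / smaller if smaller else 0.0


def char_ngrams(tokens, n=2):
    # Assumption: character n-grams (bigrams by default) of the space-joined sentence.
    text = " ".join(tokens)
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def dice_ngram_sim(a, b, n=2):
    # Dice's coefficient over n-gram sets: 2|A ∩ B| / (|A| + |B|).
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    total = len(ga) + len(gb)
    return 2 * len(ga & gb) / total if total else 0.0


def harmonic_hybrid(sim_fn):
    # Hybrid similarity: simple harmonic mean of sim_fn and the n-gram/Dice similarity.
    def hybrid(a, b):
        x, y = sim_fn(a, b), dice_ngram_sim(a, b)
        return 2 * x * y / (x + y) if x + y > 0 else 0.0
    return hybrid


def rank_sentences(sentences, sim, damping=0.85, max_iter=100, tol=1e-6):
    # Assumption: TextRank-style weighted PageRank over an undirected sentence-similarity graph.
    n = len(sentences)
    if n == 0:
        return []
    toks = [tokenize(s) for s in sentences]
    w = [[sim(toks[i], toks[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    out = [sum(row) for row in w]  # total edge weight attached to each sentence
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new = [
            (1 - damping) / n
            + damping * sum(w[j][i] / out[j] * scores[j] for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
        converged = max(abs(x - y) for x, y in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores


def summarize(sentences, k, sim):
    # Keep the k highest-scoring sentences, restored to their original document order.
    scores = rank_sentences(sentences, sim)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

For example, `summarize(sentences, 3, harmonic_hybrid(cosine_sim))` would rank sentences with the cosine plus n-gram/Dice pairing; passing `jaccard_sim` or `overlap_sim` to `harmonic_hybrid` gives the other two hybrids. Because the harmonic mean is zero whenever either component is zero, a sentence pair that shares no words under the first measure gets no edge, which keeps the graph sparse.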
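The abstract reports ROUGE-1, ROUGE-2, and ROUGE-L precision, recall, and F-score. The sketch below shows how these scores are commonly computed for one system summary against one reference summary. It assumes whitespace-style tokenization, no stemming, a balanced F-score (F1), and a single LCS over the full token sequences for ROUGE-L; the authors' toolkit and settings are not stated in the abstract, so values from this sketch need not match the reported ones. The function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    # Multiset of contiguous n-grams.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def prf(overlap, cand_total, ref_total):
    # Precision, recall, and balanced F-score from an overlap count.
    p = overlap / cand_total if cand_total else 0.0
    r = overlap / ref_total if ref_total else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def rouge_n(candidate, reference, n=1):
    # ROUGE-N with clipped counts: Counter & Counter keeps the minimum count per n-gram.
    c, r = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((c & r).values())
    return prf(overlap, sum(c.values()), sum(r.values()))


def lcs_length(a, b):
    # Longest common subsequence length via row-by-row dynamic programming.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(candidate, reference):
    # ROUGE-L: LCS length normalized by candidate and reference lengths.
    lcs = lcs_length(candidate, reference)
    return prf(lcs, len(candidate), len(reference))
```

Scores from the official ROUGE toolkit or other packages can differ from this sketch because of tokenization, stemming, and how per-document scores are aggregated across a corpus.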
