A RULE-BASED MODEL FOR STEMMING HAUSA WORDS

MUSTAPHA ASHIRU; HADIZA UMAR; BELLO BELLO; IBRAHIM AHMED


MUSTAPHA BARI ASHIRU*; HADIZA ALI UMAR; BELLO SHEHU BELLO; IBRAHIM SAID AHMED

¹ Department of Computer Science, Bayero University, Kano State, Nigeria

Academic Editor: Francesco Dell'olio

Published: 04 December 2024 by MDPI in The 5th International Electronic Conference on Applied Sciences, session: Computing and Artificial Intelligence

Abstract:

The growing number of online communities has led to significant growth in multilingual digital content on the Internet. Consequently, natural language processing (NLP) and information retrieval (IR) have become increasingly important fields. Stemming, a crucial preprocessing step in NLP and IR, has been extensively studied for high-resource languages such as English, German, and French. However, stemming for Hausa, an international language widely spoken in West Africa and one of the fastest-growing languages globally, has received far less attention and requires more extensive study.

This paper presents a rule-based model for stemming Hausa words. The model relies on a set of rules, derived from an analysis of Hausa word morphology, that specify how stem forms are extracted. The rules encode morphological constraints such as affixation rules and reflect an analysis of properties of the Hausa language such as word formation and distribution.
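The abstract does not list the actual rules, so the following is only a minimal sketch, in Python, of how an ordered affix-stripping stemmer of this general kind can be structured. The affix lists (`PREFIXES`, `SUFFIXES`), the minimum stem length `MIN_STEM`, and all function names are hypothetical placeholders chosen for illustration; they are not the authors' rule set and make no claim about Hausa morphology.

```python
# Minimal sketch of an ordered, rule-based affix stripper.
# ASSUMPTION: the affix lists and MIN_STEM below are hypothetical
# placeholders, not the rule set described in the paper.

PREFIXES = ("ma", "ba")                                   # hypothetical
SUFFIXES = ("oci", "una", "aye", "wa", "ai", "u", "i")    # hypothetical
MIN_STEM = 3  # hypothetical guard against over-stemming short words

# Longest affixes are tried first so a longer match wins over a shorter one.
_SUFFIXES_BY_LENGTH = sorted(SUFFIXES, key=len, reverse=True)
_PREFIXES_BY_LENGTH = sorted(PREFIXES, key=len, reverse=True)


def strip_suffix(word: str) -> str:
    """Remove the longest matching suffix if enough stem remains."""
    for suffix in _SUFFIXES_BY_LENGTH:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


def strip_prefix(word: str) -> str:
    """Remove the longest matching prefix if enough stem remains."""
    for prefix in _PREFIXES_BY_LENGTH:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM:
            return word[len(prefix):]
    return word


def stem(word: str) -> str:
    """Normalize case, then apply one suffix rule followed by one prefix rule."""
    return strip_prefix(strip_suffix(word.strip().lower()))


if __name__ == "__main__":
    # Illustrative inputs only.
    for w in ("karantawa", "karanta", "mai"):
        print(w, "->", stem(w))
```

With these placeholder rules, `karantawa` and `karanta` conflate to the same stem `karanta`, while the `MIN_STEM` guard leaves a short word such as `mai` untouched. A fuller rule set would likely also need exception lists and context conditions on individual affixes; the abstract does not say how the paper handles these.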

The proposed model’s performance is evaluated against existing models using standard evaluation metrics. The evaluation follows Sirsat’s approach, and a language expert assessed the system’s output. The evaluation uses a manually annotated set of 5,077 words processed by the algorithm, including 2,630 unique words and 3,766 correctly stemmed Hausa words. The model achieves an overall accuracy of 98.8%, indicating its suitability for natural language processing and information retrieval applications.
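The abstract reports accuracy under Sirsat’s approach but does not give the formulas. As a hedged sketch, the function below computes word-level accuracy against expert (gold) annotations together with three stemmer-strength measures, namely mean words per conflation class (N/S), index compression factor ((N − S)/N), and word change factor, using their usual definitions from the stemming-evaluation literature. Whether the paper uses exactly these definitions is an assumption, and the name `evaluate`, the toy stemmer, and the toy annotations are hypothetical.

```python
# Sketch of stemmer evaluation against manual (gold) annotations.
# ASSUMPTION: metric definitions follow the usual stemmer-strength
# literature; the paper's exact formulas are not given in the abstract.
from collections.abc import Callable


def evaluate(
    words: list[str],
    stem: Callable[[str], str],
    gold: dict[str, str],
) -> dict[str, float]:
    """Return accuracy plus three strength measures for a stemmer."""
    unique = set(words)
    n = len(unique)                        # N: unique words before stemming
    s = len({stem(w) for w in unique})     # S: unique stems after stemming

    # Accuracy: share of annotated tokens whose stem matches the expert's stem.
    judged = [w for w in words if w in gold]
    correct = sum(1 for w in judged if stem(w) == gold[w])

    # Word change factor: share of unique words altered by stemming.
    changed = sum(1 for w in unique if stem(w) != w)

    return {
        "accuracy": correct / len(judged) if judged else 0.0,
        "mean_words_per_class": n / s if s else 0.0,
        "index_compression_factor": (n - s) / n if n else 0.0,
        "word_change_factor": changed / n if n else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical toy stemmer and annotations, for illustration only.
    def toy_stem(word: str) -> str:
        return word[:-2] if word.endswith("wa") else word

    words = ["karanta", "karantawa", "rubuta", "rubutawa", "gida"]
    gold = {"karanta": "karanta", "karantawa": "karanta",
            "rubuta": "rubuta", "rubutawa": "rubuta", "gida": "gida"}
    print(evaluate(words, toy_stem, gold))
```

On the toy data, the five unique words collapse into three stems, giving about 1.67 words per conflation class and an index compression factor of 0.4. Higher compression indicates a stronger (more aggressive) stemmer, which is why strength measures are usually reported alongside accuracy rather than instead of it.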

Keywords: Information Retrieval, Natural Language Processing, Stemming, Root Words, Morphology, Hausa

