Machine learning model for Hausa part-of-speech tagging

Ruqayya Ibrahim; Kabir Umar; Kabir Bello; Hauwa Musa

Ruqayya Auwal Ibrahim *¹, Kabir Umar ², Kabir Muhammad Bello ¹, Hauwa Abubakar Musa

¹ Department of Computer Science, Faculty of Computing, Bayero University Kano
² Department of Software Engineering, Faculty of Computing, Bayero University Kano

Academic Editor: Eugenio Vocaturo

Published: 03 December 2024 by MDPI in The 5th International Electronic Conference on Applied Sciences, session Computing and Artificial Intelligence

Abstract:

Part-of-speech (POS) tagging assigns each word in a text its appropriate part of speech. It is regarded as one of the fundamental technologies in Natural Language Processing (NLP) and serves as a pre-processing step for many NLP tasks. With the recent development of machine learning-based algorithms, POS tagging has improved, and a respectable number of taggers are now available for high-resource languages like English. However, low-resource languages like Hausa continue to lack accurate and effective computational approaches for POS tagging. Despite the recent exponential expansion of Hausa online content on websites like BBC.com/Hausa, Freedomradio.com.ng, Hausa Leadership.ng, Aminiya and dailytrust.com.ng, POS tagging of such Hausa web content has not been investigated by the research community and therefore remains an open research topic. This work proposes a machine learning-based method for Hausa POS tagging. We implement three architectures, namely long short-term memory (LSTM), bi-directional long short-term memory (BLSTM) and gated recurrent unit (GRU), to perform POS tagging on a Hausa data set. The labeled data are transformed into a one-hot vector encoding and then passed through a deep neural network with LSTM, BLSTM or GRU hidden layers. We use precision, recall, accuracy and F1-score as the evaluation metrics for the three architectures. The system achieves an overall result of 99%, outperforming the previous approach (with a result of 79.14%) in terms of precision, recall, accuracy and F1-score.
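The one-hot encoding of labels and the four evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the tag names, the toy gold and predicted sequences, and the helper names `build_tag_index`, `one_hot` and `evaluate` are all invented for illustration. The abstract does not state how per-tag scores are averaged, so macro-averaging is assumed here.

```python
from typing import Dict, List, Sequence


def build_tag_index(tags: Sequence[str]) -> Dict[str, int]:
    """Map each distinct tag to an integer index (sorted for determinism)."""
    return {tag: i for i, tag in enumerate(sorted(set(tags)))}


def one_hot(tag: str, tag_index: Dict[str, int]) -> List[int]:
    """Return a vector of zeros with a single 1 at the tag's index."""
    vec = [0] * len(tag_index)
    vec[tag_index[tag]] = 1
    return vec


def evaluate(gold: Sequence[str], pred: Sequence[str]) -> Dict[str, float]:
    """Token-level accuracy plus macro-averaged precision, recall and F1.

    Macro-averaging is an assumption; the abstract does not specify it.
    """
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equal in length")
    pairs = list(zip(gold, pred))
    labels = sorted(set(gold) | set(pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(1 for g, p in pairs if g == p) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical toy data: one tag per token, not taken from the paper's data set.
gold_tags = ["NOUN", "VERB", "NOUN", "ADJ"]
pred_tags = ["NOUN", "VERB", "ADJ", "ADJ"]

tag_index = build_tag_index(gold_tags)
encoded = [one_hot(tag, tag_index) for tag in gold_tags]
scores = evaluate(gold_tags, pred_tags)
print(encoded)
print(scores)
```

In a full pipeline, vectors like `encoded` would serve as the training targets for the recurrent network, and `evaluate` would be applied to the tags predicted for a held-out test set.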

Keywords: machine learning, Hausa, model, part of speech, tagging
