Algorithm Design and Mathematical Modeling for Efficient Automatic Speech Recognition in Low-Resource African Languages

Simanga Mchunu

Simanga Comfort Mchunu

¹ Department of Mathematical Sciences, University of South Africa, Pretoria 0037, South Africa

Academic Editor: Marjan Mernik

Published: 04 June 2026 by MDPI in The 2nd International Online Conference on Mathematics and Applications session Mathematics, Computer Science and Artificial Intelligence

Abstract:

Introduction.

Deploying Automatic Speech Recognition (ASR) for African languages is critically
hindered by the incompatibility of large-scale models with mobile hardware. Models like
NLLB exceed 1 GB in size and cannot load on typical devices with 2-4 GB RAM, preventing
real-time conversational ASR deployment in resource-constrained settings. This technological
barrier disproportionately affects African linguistic communities.

Methods.

We formulate ASR deployment as a constrained optimization problem: minimize
Word Error Rate (WER) subject to bounds on model size, latency, and computational complexity.
Our solution integrates three complementary techniques: (1) Knowledge Distillation,
transferring knowledge from large teacher networks to compact students; (2) Low-Rank Factorization
(W ≈ UVᵀ), reducing a layer's parameter count from O(p²) to O(rp) for rank r ≪ p; and (3) Post-training 8-bit
Quantization for a 4× memory reduction relative to 32-bit floats. Theoretical analysis provides sample complexity
bounds of O(k log n) and quantifies output perturbation from compression.
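
To make techniques (2) and (3) concrete, the following is a minimal NumPy sketch, not the author's implementation. It assumes a truncated SVD as the factorization method and symmetric per-tensor scaling as the quantization scheme; the abstract specifies neither. The function names, the matrix size p = 256, and the rank r = 16 are hypothetical choices for illustration only.

```python
import numpy as np

def low_rank_factorize(W, r):
    """Return U (m x r) and V (n x r) with W ≈ U @ V.T, via truncated SVD."""
    U_full, s, Vt = np.linalg.svd(W, full_matrices=False)
    U = U_full[:, :r] * s[:r]   # fold the top-r singular values into U
    V = Vt[:r, :].T
    return U, V

def quantize_int8(x):
    """Symmetric per-tensor post-training quantization to int8 (assumed scheme)."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to float32 values."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
p, r = 256, 16  # hypothetical layer width and rank

# Synthetic weight matrix of exact rank r, so truncation is lossless
# up to floating-point error. Real ASR weights are only approximately low rank.
W = (rng.standard_normal((p, r)) @ rng.standard_normal((r, p))).astype(np.float32)

U, V = low_rank_factorize(W, r)
dense_params = W.size              # p * p
factored_params = U.size + V.size  # 2 * r * p
rel_err = np.linalg.norm(W - U @ V.T) / np.linalg.norm(W)

q_U, s_U = quantize_int8(U)
q_V, s_V = quantize_int8(V)
fp32_bytes = U.nbytes + V.nbytes
int8_bytes = q_U.nbytes + q_V.nbytes

# Output perturbation introduced by quantizing the two factors.
W_hat = dequantize(q_U, s_U) @ dequantize(q_V, s_V).T
quant_err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)

print(f"parameters: {dense_params} -> {factored_params}")
print(f"factor storage: {fp32_bytes} B (float32) -> {int8_bytes} B (int8)")
print(f"relative error: factorization {rel_err:.2e}, after int8 {quant_err:.2e}")
```

In this toy setting the factorization stores 2rp values instead of p², and the int8 copies of the factors occupy a quarter of the float32 storage, which is the 4× figure quoted above. Knowledge distillation is omitted because it requires a training loop.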

Results.

On a 20M-parameter baseline (85 MB), our framework achieves 16× total compression,
to approximately 5 MB. Across isiZulu, Setswana, and Sesotho datasets (mainly South African languages), we maintain competitive
WER with only 5-6% degradation while increasing inference throughput from 0.8× to 6× real time
on commodity CPUs. This 7.5× speed-up enables previously impossible mobile
deployment.
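
The reported figures can be checked with a few lines of arithmetic. The sketch below assumes that "0.8× real time" means 0.8 seconds of audio processed per second of compute, and that the individual compression factors multiply; both are interpretations, not statements from the abstract. The 10-second utterance is a hypothetical example.

```python
# Figures taken from the abstract.
baseline_mb = 85.0
total_compression = 16.0
quantization_factor = 4.0           # 8-bit weights versus 32-bit floats
rtf_before, rtf_after = 0.8, 6.0    # throughput as multiples of real time

compressed_mb = baseline_mb / total_compression
speedup = rtf_after / rtf_before

# Assumption: factors multiply, so the remainder would come from
# distillation and low-rank factorization combined.
remaining_factor = total_compression / quantization_factor

# Hypothetical 10-second utterance: compute time before and after compression.
audio_s = 10.0
time_before = audio_s / rtf_before
time_after = audio_s / rtf_after

print(f"compressed size: {compressed_mb:.2f} MB")
print(f"speed-up: {speedup:.2f}x")
print(f"factor left for distillation and low-rank steps: {remaining_factor:.1f}x")
print(f"10 s of audio: {time_before:.2f} s -> {time_after:.2f} s of compute")
```

Under these assumptions the baseline runs slower than real time (more compute time than audio duration), while the compressed model finishes well within the duration of the utterance, which is what makes conversational use plausible.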

Conclusions.

This work demonstrates that algorithm design grounded in mathematical optimization
and hardware-aware compression enables scalable, practical ASR for underserved languages.
The framework provides a general blueprint for efficient edge AI deployment, directly
advancing speech technology accessibility and digital inclusion for African linguistic communities
globally.

Keywords: Automatic Speech Recognition; Low-Resource Languages; Model Compression; Knowledge Distillation; Quantization; Edge Computing; Mathematical Optimization
