This work addresses the role of natural language processing (NLP) and text-to-speech (TTS) technology in supporting communication for the Urdu-speaking population, for whom language resources remain scarce. While English and other European languages have reliable computational resources, Urdu is still comparatively under-resourced, and applications for it are correspondingly limited.
To address this problem, we constructed our own dataset from a YouTube playlist containing more than 100 hours of Urdu novel narration. The audio was carefully preprocessed and its errors corrected to provide high-quality input for our TTS models. Our work is one of the first research attempts to create a large-scale Urdu speech dataset and to apply techniques of Automatic Speech Analysis to it. The approach incorporates the linguistic and cultural characteristics of the Urdu language so that the generated voices sound authentic.
Accordingly, our project aimed to develop TTS systems that produce natural voice output, take cultural differences and youthful appeal into account, build upon well-established neural network models for speech synthesis, and incorporate new techniques.
The results of our work are promising: we created a TTS model that reads Urdu text accurately and was judged to have native-speaker-like pronunciation. Our research has practical implications for education, digital accessibility, and media, and may influence popular culture. Our goal is friendlier and more natural speech interfaces for Urdu users.
Towards a More Natural Urdu: A Comprehensive Approach to Text-to-Speech and Voice Cloning
Published: 02 December 2024
by MDPI
in The 5th International Electronic Conference on Applied Sciences
session Computing and Artificial Intelligence
Keywords: Text-to-Speech; Urdu Language; Speech Synthesis; Computational Linguistics; Language Modeling; Natural Language Processing; Speech Processing