Background: Individuals with speech impairments often face significant challenges in daily communication, limiting their ability to interact effectively. Traditional communication aids, though helpful, can be costly or inflexible. Recent advances in computer vision and deep learning offer new opportunities to develop practical, real-time, and affordable assistive technologies. Objective: This study aims to design and implement a low-cost, vision-based gesture-to-speech system that enables non-verbal individuals to communicate through hand gestures. The goal is to translate recognized gestures into audible speech, bridging the communication gap and enhancing quality of life. Methods: The system uses a standard webcam to capture hand gestures, which are processed in real time using OpenCV. A convolutional neural network (CNN) built with TensorFlow is trained on a custom dataset to classify hand signs. The workflow comprises image preprocessing, data augmentation, model training, and deployment. Each recognized gesture is mapped to corresponding text, which is then converted into speech by a text-to-speech (TTS) engine. Results: Each captured hand image is first passed through a filter, and the filtered image is fed to the CNN classifier, which predicts the gesture class; the corresponding word is then displayed and converted into audible speech. The system achieved 98% accuracy across 26 alphabetic gestures and performed reliably in real time with minimal latency under varying lighting and background conditions. Conclusion: The proposed system is an effective and affordable communication aid for individuals with speech impairments. Its modular, real-time design makes it suitable for deployment in resource-constrained settings.
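The capture → filter → classify → text → speech workflow described in the abstract can be sketched as follows. This is a minimal illustration only, not the authors' implementation: every function name here is hypothetical, the 26-class TensorFlow CNN is stood in for by a toy `classify` stub so the flow is runnable end to end, and the TTS engine call is a placeholder (a real system might use OpenCV's `cv2.VideoCapture` for frames and a TTS library such as pyttsx3 for speech).

```python
# Hypothetical sketch of the gesture-to-speech pipeline from the abstract.
# The real system uses OpenCV for capture/filtering and a TensorFlow CNN for
# classification; a stub classifier stands in here so the flow is runnable.

GESTURE_CLASSES = [chr(ord("A") + i) for i in range(26)]  # 26 alphabetic gestures

def preprocess(frame):
    """Stand-in for the filtering step: scale 8-bit pixels to [0, 1]."""
    return [[px / 255.0 for px in row] for row in frame]

def classify(image):
    """Stub for the CNN classifier: returns a class index in 0..25.
    A real model would call e.g. a tf.keras model's predict() on the
    filtered image; here mean brightness selects a class, purely to
    make the data flow concrete."""
    flat = [px for row in image for px in row]
    mean = sum(flat) / len(flat)
    return int(mean * 25)

def gesture_to_text(class_index):
    """Map the predicted gesture class to its output text."""
    return GESTURE_CLASSES[class_index]

def speak(text):
    """Placeholder for the TTS step (e.g. pyttsx3's engine.say(text))."""
    print(f"Speaking: {text}")

# One pass through the pipeline on a dummy all-white 4x4 "hand image".
frame = [[255] * 4 for _ in range(4)]
label = gesture_to_text(classify(preprocess(frame)))
speak(label)
```

The key design point the abstract implies is the clean separation of stages: because classification emits only a class index, the text mapping and TTS back end can be swapped (different vocabularies, different speech engines) without retraining the model.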
Deep Learning-Based Vision System for Real-Time Gesture Recognition and Speech Synthesis to Assist Non-Verbal Users
Published:
03 December 2025
by MDPI
in The 6th International Electronic Conference on Applied Sciences
session Computing and Artificial Intelligence
Keywords: gesture recognition, non-verbal communication, hand sign detection, convolutional neural network (CNN), computer vision, OpenCV, real-time system, speech synthesis, assistive technology, TensorFlow, human-computer interaction (HCI), low-cost communication
