Inclusive Multilingual Assistive Technology using a Computer Vision- and Transformer-based NLP Approach for the Visually Impaired
1  School of Sciences, Christ University, Bengaluru, 560029, India
Academic Editor: Francesco Arcadio

Abstract:

Inclusivity has become a central theme in the development of digital technologies. Designing with inclusivity in mind requires that systems remain accessible and valuable to users irrespective of their age, gender, or socio-economic background. However, most applications remain restricted to a single language—primarily English—thereby marginalizing large groups of non-English-speaking individuals. This work explores the integration of Computer Vision (CV) and Natural Language Processing (NLP) to enhance accessibility for visually impaired users, with a particular emphasis on multilingual support in assistive technologies. Through a review of the existing literature and user experiences, this study identifies language barriers as a major obstacle in accessing essential services. To address this, we employ a multi-stage methodology for multilingual image captioning. Image–caption pairs were extracted from the MS COCO dataset, reformatted into JSON, and translated from English to the local language (currently Hindi) to generate a bilingual corpus. The model combines Recurrent Neural Networks (RNNs) for image feature extraction with Long Short-Term Memory (LSTM) units for sequence generation, enabling the system to capture temporal dependencies inherent to natural language. Experimental outcomes indicate that the model can generate Hindi captions with about 80% accuracy, effectively describing visual scenes despite some grammatical limitations. Real-time camera integration and a text-to-voice module further enhance usability by delivering immediate audio captions to visually impaired users. Future work will focus on transformer-based multilingual architectures and larger datasets to improve accuracy, contextual richness, and language coverage, moving towards a robust, speech-enabled assistive platform.
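The corpus-building stage described above (extracting image–caption pairs, reformatting them into JSON, and pairing each English caption with a Hindi translation) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tiny in-memory sample stands in for MS COCO annotations, and `translate_to_hindi` is a hypothetical stub for whatever machine-translation model or service was actually used.

```python
import json

def translate_to_hindi(text):
    # Hypothetical placeholder: a real pipeline would call an MT model/API here.
    lookup = {"a dog runs on the grass": "एक कुत्ता घास पर दौड़ता है"}
    return lookup.get(text, text)

def build_bilingual_corpus(pairs):
    """Reformat (image_id, English caption) pairs into a bilingual JSON corpus."""
    corpus = [
        {
            "image_id": image_id,
            "caption_en": caption,
            "caption_hi": translate_to_hindi(caption),
        }
        for image_id, caption in pairs
    ]
    # ensure_ascii=False keeps Devanagari text readable in the output file
    return json.dumps(corpus, ensure_ascii=False, indent=2)

# Tiny stand-in for a slice of MS COCO annotations
sample = [(42, "a dog runs on the grass")]
print(build_bilingual_corpus(sample))
```

Each JSON record keeps the English source alongside the translated caption, so the same corpus can train either monolingual or bilingual captioning models.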

Keywords: Image captioning; Scene recognition; Language processing; Assistive technology; Computer vision; Natural language processing; Transformer-based multilingual architectures; Recurrent Neural Network (RNN); Long Short-Term Memory (LSTM); Transformer models

