Accessible Vision: Empowering the Visually Impaired through Voice-Assisted Object Recognition and Spatial Awareness

¹ Department of Artificial Intelligence, Faculty of Computer Science and Engineering, Ghulam Ishaq Khan Institute of Engineering Sciences and Technology, 23460 Topi, Khyber Pakhtoonkha, Pakistan.
² Department of Data Science, Faculty of Computer Science and Engineering, Ghulam Ishaq Khan Institute of Engineering Sciences and Technology, 23460 Topi, Khyber Pakhtoonkha, Pakistan.
³ Department of Business, University of Europe for Applied Sciences, Think Campus, 14469 Potsdam, Germany.
⁴ Artificial Intelligence Research (AIR) Group, Department of Artificial Intelligence, Faculty of Computer Science and Engineering, Ghulam Ishaq Khan Institute of Engineering Sciences and Technology, 23460 Topi, Khyber Pakhtoonkha, Pakistan.

Academic Editor: Eugenio Vocaturo

Published: 02 December 2024 by MDPI in The 5th International Electronic Conference on Applied Sciences session Computing and Artificial Intelligence

Abstract:

This paper introduces Accessible Vision, an assistive technology that applies computer vision to give people with visual impairments greater independence through context-aware feedback. The system has three functional components: a YOLOv8 model for real-time object detection, MiDaS together with stereo vision for distance estimation, and a text-to-speech (TTS) module for real-time audio feedback. Together, these give visually impaired users a clearer picture of their environment by reporting which objects are nearby and where they sit relative to the user. Accessible Vision targets the specific difficulties that people with visual impairments face when navigating everyday environments. Many conventional assistive devices cannot process scenes in real time, and they are limited in how accurately they recognize objects and spatial layout. Because YOLOv8 offers high detection performance, our approach recognizes many objects at high speed with high recall and accuracy. MiDaS is likewise useful for estimating depth, whether from a monocular camera or alongside stereo vision, since distance measurements are critical for orientation. The methodology section describes how the system works. First, the YOLOv8 model\cite{lou2023dc} was trained and optimized on a broad dataset of objects in different settings to improve its adaptability to varied conditions. The paper then compares MiDaS with stereo vision and discusses the strengths and weaknesses of each approach in different contexts. Finally, it explains how the TTS model is integrated, focusing on its role in delivering clear and contextually relevant audio prompts to the user.
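As an illustration of how the three components described in the abstract might fit together, the following is a minimal sketch and not the authors' implementation. It assumes the detector has already produced labelled bounding boxes and that the depth stage has already been converted to a per-pixel map in metres (MiDaS itself predicts relative depth, so some calibration step is assumed). The names `Detection`, `box_distance`, `direction`, and `describe` are hypothetical, and the resulting strings stand in for what would be passed to a TTS engine.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Sequence


@dataclass
class Detection:
    """Hypothetical detector output: a label and a pixel box (x1, y1, x2, y2)."""
    label: str
    x1: int
    y1: int
    x2: int
    y2: int


def box_distance(depth_m: Sequence[Sequence[float]], det: Detection) -> float:
    """Median depth inside the box; assumes depth_m is already in metres."""
    values = [depth_m[y][x]
              for y in range(det.y1, det.y2)
              for x in range(det.x1, det.x2)]
    return median(values)


def direction(det: Detection, image_width: int) -> str:
    """Coarse horizontal position from the box centre, split into thirds."""
    centre = (det.x1 + det.x2) / 2
    if centre < image_width / 3:
        return "on your left"
    if centre > 2 * image_width / 3:
        return "on your right"
    return "ahead"


def describe(dets: List[Detection],
             depth_m: Sequence[Sequence[float]]) -> List[str]:
    """Build one sentence per object, nearest first, ready for a TTS engine."""
    width = len(depth_m[0])
    ranked = sorted(dets, key=lambda d: box_distance(depth_m, d))
    return [
        f"{d.label}, {box_distance(depth_m, d):.1f} metres, {direction(d, width)}"
        for d in ranked
    ]


if __name__ == "__main__":
    # Toy 4x6 depth map: left half at 2 m, right half at 5 m.
    depth = [[2.0, 2.0, 2.0, 5.0, 5.0, 5.0] for _ in range(4)]
    scene = [Detection("door", 4, 0, 6, 4), Detection("chair", 0, 0, 2, 4)]
    for sentence in describe(scene, depth):
        print(sentence)
```

The median is used rather than the mean so that background pixels inside a box skew the estimate less, and objects are announced nearest first on the assumption that closer obstacles matter most to the user; both are design choices of this sketch.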

Keywords: Object Detection; YOLOv8; Distance Calculation; MiDaS; Stereo Vision; Text-to-Speech; Assistive Technology; Visually Impaired

Muhammad Saud

Muhammad Imran

Shahmeer Bhatti

Raja Ali