On-Device Automatic Speech Recognition for IIoT and Extended Reality Industrial Metaverse Applications

Antón Valladares-Poncela; Paula Fraga-Lamas; Tiago M. Fernández-Caramés

doi:10.3390/ecsa-11-20466



¹ Centro Mixto de Investigación UDC-Navantia
² CITIC, Universidade da Coruña

Academic Editor: Francisco Falcone

Published: 26 November 2024 by MDPI in The 11th International Electronic Conference on Sensors and Applications, session Sensors and Artificial Intelligence

https://doi.org/10.3390/ecsa-11-20466

Abstract:

This paper studies how on-device Automatic Speech Recognition (ASR) running on Microsoft HoloLens 2 smart glasses can enhance Industrial Internet of Things (IIoT) and Industrial Metaverse applications. It uses the HoloLens 2 microphone array and sound capture APIs to benchmark the performance and accuracy of on-device ASR models, evaluating them in terms of Character Error Rate (CER), Word Error Rate (WER), and latency. The paper also explores optimization techniques, including quantization tools and model refinement strategies, that aim to minimize latency while maintaining high accuracy. The study emphasizes support for low-resource languages, using Galician, a language spoken by fewer than 3 million people worldwide, as a case study. By benchmarking different variations of a Wav2Vec2.0-based ASR model fine-tuned for Galician, the research identifies the most effective models and their optimal runtime configurations. This work underscores the critical role of low-latency, on-device ASR in real-time IIoT and Industrial Metaverse applications, where it can improve operational efficiency, privacy, and user experience in industrial environments. The findings support the broader use of ASR in emerging Metaverse applications across industrial contexts.

Keywords: Automatic Speech Recognition; ASR; Internet of Things; IIoT; Industrial Metaverse; Microsoft HoloLens 2; Extended Reality
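
For readers unfamiliar with the accuracy metrics named in the abstract, the following is a minimal sketch of how WER and CER are commonly computed: the Levenshtein edit distance between a reference transcript and the recognizer's hypothesis, divided by the reference length in words or characters. It is not taken from the paper; the function names (`edit_distance`, `wer`, `cer`) and the sample sentences are assumptions made for illustration, and the authors' evaluation may apply text normalization that is not shown here.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions
    needed to turn `ref` into `hyp`."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis: i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance over the reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance over the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical transcripts, invented for illustration only.
    reference = "open the valve now"
    hypothesis = "open a valve"
    print(f"WER = {wer(reference, hypothesis):.2f}")
    print(f"CER = {cer(reference, hypothesis):.2f}")
```

Both rates can exceed 1.0 when the hypothesis contains many insertions, which is why they are usually reported alongside latency and not as a bounded accuracy score.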
