Improving Hand Pose Recognition using Localization and Zoom Normalizations over MediaPipe Landmarks

Miguel Ángel Remiro; Manuel Gil-Martín; Rubén San-Segundo

doi:10.3390/ecsa-10-16215

Miguel Ángel Remiro ¹, Manuel Gil-Martín ²,*, Rubén San-Segundo

¹ Speech Technology and Machine Learning Group (T.H.A.U. Group), Information Processing and Telecommunications Center, E.T.S.I. de Telecomunicación, Universidad Politécnica de Madrid, 28040, Madrid, Spain
² Speech Technology Group, Information Processing and Telecommunications Center, E.T.S.I. Telecomunicación, Universidad Politécnica de Madrid

Academic Editor: Stefano Mariani

Published: 15 November 2023 by MDPI in the 10th International Electronic Conference on Sensors and Applications, session "Sensors and Artificial Intelligence"

Abstract:

Hand pose recognition faces significant challenges, such as varying lighting conditions and complex backgrounds, that hinder accurate and robust hand pose estimation. These effects can be mitigated by using MediaPipe to efficiently extract representative hand landmarks from static images and feeding those landmarks to Convolutional Neural Networks, because landmark extraction reduces the impact of lighting variability and complex backgrounds. However, this process does not address the variability in the location and size of the hands. Processing modules that normalize the landmarks with respect to the location of the wrist and the zoom of the hand can significantly reduce the effects of this variability. In all the experiments performed in this work, based on American Sign Language alphabet datasets of 870, 27,000, and 87,000 images, applying the proposed normalizations significantly improved model performance in a resource-limited scenario. In particular, under conditions of high variability, applying both normalizations raised accuracy by 45.08 percentage points, from 43.94 ± 0.64 % to 89.02 ± 0.40 %.
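The abstract does not give the normalization formulas, so the following is only a minimal sketch of what the two steps might look like, not the authors' implementation. It assumes 2D landmarks as (x, y) pairs with the wrist at index 0, as in MediaPipe's 21-point hand model. It also assumes that zoom normalization divides by the largest wrist-to-landmark distance, which is a choice made here for illustration. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

WRIST = 0  # the wrist is landmark 0 in MediaPipe's hand model


def normalize_location(landmarks: List[Point]) -> List[Point]:
    """Translate the landmarks so that the wrist sits at the origin."""
    wx, wy = landmarks[WRIST]
    return [(x - wx, y - wy) for x, y in landmarks]


def normalize_zoom(landmarks: List[Point]) -> List[Point]:
    """Rescale the hand around the wrist so the farthest landmark is at distance 1.

    Assumption: the scale factor is the maximum wrist-to-landmark distance;
    the paper may define the zoom differently.
    """
    wx, wy = landmarks[WRIST]
    scale = max(math.hypot(x - wx, y - wy) for x, y in landmarks)
    if scale == 0.0:  # degenerate input: all points coincide with the wrist
        return list(landmarks)
    return [(wx + (x - wx) / scale, wy + (y - wy) / scale) for x, y in landmarks]


def normalize_hand(landmarks: List[Point]) -> List[Point]:
    """Apply location normalization followed by zoom normalization."""
    return normalize_zoom(normalize_location(landmarks))


# Toy example with three points instead of 21: the same hand shape, shifted
# and enlarged, should map to (approximately) the same normalized landmarks.
hand = [(0.5, 0.5), (0.6, 0.4), (0.5, 0.1)]
moved = [(0.2 + 2.0 * x, 0.1 + 2.0 * y) for x, y in hand]
print(normalize_hand(hand))
print(normalize_hand(moved))
```

Under these assumptions, the normalized coordinates depend only on the shape of the hand, not on where it appears in the image or how large it is, which is the kind of variability the abstract says the normalizations are meant to remove.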

Keywords: deep learning; computer vision; human activity recognition; hand pose recognition; landmarks; location normalization; zoom normalization
