Hand pose recognition faces significant challenges, such as varying lighting conditions and complex backgrounds, which can hinder accurate and robust hand pose estimation. These can be mitigated by using MediaPipe to efficiently extract representative landmarks from static images, in combination with Convolutional Neural Networks. Extracting hand landmarks reduces the impact of lighting variability and complex backgrounds. However, this process does not address variability in the location and size of the hands. Processing modules that normalize the landmarks with respect to the location of the wrist and the zoom of the hand can therefore significantly mitigate the effects of this variability. In all experiments performed in this work, based on American Sign Language alphabet datasets of 870, 27,000, and 87,000 images, applying the proposed normalizations significantly improved model performance in a resource-limited scenario. In particular, under conditions of high variability, applying both normalizations improved accuracy by 45.08 percentage points, from 43.94 ± 0.64 % to 89.02 ± 0.40 %.
Improving Hand Pose Recognition using Localization and Zoom Normalizations over MediaPipe Landmarks
Published: 15 November 2023 by MDPI in 10th International Electronic Conference on Sensors and Applications, session Sensors and Artificial Intelligence
Abstract:
Keywords: deep learning; computer vision; human activity recognition; hand pose recognition; landmarks; location normalization; zoom normalization
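The location and zoom normalizations described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the exact formulas, so the code assumes that location normalization subtracts the wrist coordinates from every landmark and that zoom normalization divides by the largest absolute coordinate remaining after that shift. The function names are hypothetical, and the landmarks are assumed to be 2D points ordered as in MediaPipe Hands, where the wrist is landmark 0.

```python
from typing import List, Tuple

Point = Tuple[float, float]

WRIST = 0  # wrist index in the MediaPipe Hands landmark layout


def normalize_location(landmarks: List[Point]) -> List[Point]:
    """Translate the landmarks so that the wrist sits at the origin."""
    wx, wy = landmarks[WRIST]
    return [(x - wx, y - wy) for x, y in landmarks]


def normalize_zoom(landmarks: List[Point]) -> List[Point]:
    """Rescale the landmarks so the largest absolute coordinate is 1.

    Assumption: the scale factor is the maximum absolute coordinate,
    which is meaningful once the wrist is already at the origin.
    """
    scale = max(max(abs(x), abs(y)) for x, y in landmarks)
    if scale == 0:
        # Degenerate input (all points coincide): nothing to rescale.
        return list(landmarks)
    return [(x / scale, y / scale) for x, y in landmarks]


def normalize_hand(landmarks: List[Point]) -> List[Point]:
    """Apply location normalization first, then zoom normalization."""
    return normalize_zoom(normalize_location(landmarks))


# Toy three-point "hands": the same shape at two positions and sizes.
hand_a = [(0.5, 0.5), (0.6, 0.5), (0.5, 0.7)]
hand_b = [(0.1, 0.2), (0.3, 0.2), (0.1, 0.6)]  # hand_a shifted and doubled
norm_a = normalize_hand(hand_a)
norm_b = normalize_hand(hand_b)
```

Under these assumptions, `norm_a` and `norm_b` agree up to floating-point rounding, which is the property the normalizations are meant to provide: a classifier trained on the normalized landmarks sees the same input regardless of where the hand appears in the image or how large it is.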