Gait-driven Pose Tracking and Movement Captioning using OpenCV and MediaPipe Machine Learning Framework
Published: 26 November 2024 by MDPI in the 11th International Electronic Conference on Sensors and Applications, session Sensors and Artificial Intelligence
Abstract:
Pose tracking and captioning are widely employed for motion capture and activity description in daylight vision scenarios. Activity detection through camera systems is a complex challenge, requiring the refinement of numerous algorithms to function accurately. Despite their notable capabilities, IP cameras lack integrated models for effective human activity detection. Motivated by this gap, this paper presents a gait-driven machine-learning framework, built on OpenCV and MediaPipe, for human pose tracking and movement captioning. The framework incorporates the Generative 3D Human Shape (GHUM 3D) model to represent the human skeleton, while Python-based logic classifies movements as either usual or unusual. The model is integrated into a web application with camera input, activity detection, and gait posture analysis for pose tracking and movement captioning. The proposed approach comprises four modules: two for pose tracking and two for generating natural-language descriptions of movements. The implementation is evaluated on two publicly available datasets, CASIA-A and CASIA-B. The methodology divides the video data into 15-frame segments, each representing a time window subjected to detailed scrutiny of human movement. Features such as spatio-temporal descriptors, motion characteristics, and key-point coordinates are derived from each frame to detect key pose landmarks, focusing on the left shoulder, elbow, and wrist. By calculating the angle formed by these landmarks, the method classifies activities as "Walking" (angles between -45 and 45 degrees), "Clapping" (angles below -120 or above 120 degrees), and "Running" (angles below -150 or above 150 degrees); angles outside these ranges are categorized as "Abnormal," indicating unusual activity. Experimental results show that the proposed method is robust for individual activity recognition.
Keywords: Activity recognition; Gait analysis; Human movement; Machine learning; Movement captioning; Pose tracking
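To make the abstract's classification rule concrete, the following minimal Python sketch illustrates one way it could be implemented with OpenCV and MediaPipe. It assumes the legacy MediaPipe Pose solution API; the helper names (landmark_angle, classify, caption_video), the majority vote over each 15-frame segment, the ordering that tests the "Running" range before the "Clapping" range (the former being a subset of the latter), and the sample file name are illustrative assumptions, not the authors' exact implementation.

import math
from collections import Counter

import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

def landmark_angle(shoulder, elbow, wrist):
    """Signed angle (degrees) at the elbow between the shoulder and wrist."""
    a = math.atan2(wrist.y - elbow.y, wrist.x - elbow.x)
    b = math.atan2(shoulder.y - elbow.y, shoulder.x - elbow.x)
    angle = math.degrees(a - b)
    # Normalize to [-180, 180] degrees.
    if angle > 180:
        angle -= 360
    elif angle < -180:
        angle += 360
    return angle

def classify(angle):
    # Thresholds taken from the abstract. The "Running" range is tested before
    # the "Clapping" range because it is a subset of it (an assumption about
    # precedence, which the abstract does not specify).
    if angle < -150 or angle > 150:
        return "Running"
    if angle < -120 or angle > 120:
        return "Clapping"
    if -45 <= angle <= 45:
        return "Walking"
    return "Abnormal"

def caption_video(path, segment_len=15):
    """Yield one activity label per 15 pose-detected frames (majority vote)."""
    cap = cv2.VideoCapture(path)
    labels = []
    with mp_pose.Pose(static_image_mode=False) as pose:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB input; OpenCV decodes frames as BGR.
            result = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if result.pose_landmarks:
                lm = result.pose_landmarks.landmark
                angle = landmark_angle(
                    lm[mp_pose.PoseLandmark.LEFT_SHOULDER],
                    lm[mp_pose.PoseLandmark.LEFT_ELBOW],
                    lm[mp_pose.PoseLandmark.LEFT_WRIST],
                )
                labels.append(classify(angle))
            if len(labels) == segment_len:
                yield Counter(labels).most_common(1)[0][0]
                labels.clear()
    cap.release()

if __name__ == "__main__":
    # "casia_sample.avi" is a hypothetical file name for illustration.
    for i, activity in enumerate(caption_video("casia_sample.avi")):
        print(f"Segment {i}: {activity}")

Testing the narrower "Running" band before the "Clapping" band is one way to resolve the overlap between the published ranges; swapping that order would make "Running" unreachable.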
Comments on this paper
Malathi Janapati
26 November 2024
Recommended
Yellapragada Venkata Pavan Kumar
26 November 2024
This work demonstrates an impressive integration of advanced technologies for human activity recognition and captioning. Good job on crafting a comprehensive and impactful study.
Purna Prakash Kasaraneni
26 November 2024
Good work on pose tracking and movement captioning
Pradeep Reddy Gogulamudi
26 November 2024
Impressive work
Jyothi sri Vadlamudi
26 November 2024
Innovative contribution.
Yamini Kodali
26 November 2024
Effective use of gait analysis for human movement and movement capturing.
DIMMITI RAO
26 November 2024
Impressive work
SAMPARTHI KUMAR
26 November 2024
Good, proper work
G Venkata Ramana Reddy
26 November 2024
Good representation of human pose and movement captioning
Meghavathu Nayak
27 November 2024
Good work on human pose and movement captioning