{"title":"A hybrid approach for real-time hand tracking using fiducial markers and inertial sensors","authors":"Ranjeet Bidwe , Shubhangi Deokar , Yash Parkhi , Tanisha Vyas , Nimita Jestin , Utkarsh Kumar , Satviki Budhia , Armaan Jeswani","doi":"10.1016/j.mex.2025.103609","DOIUrl":null,"url":null,"abstract":"<div><div>This paper presents a cost-effective hybrid hand-tracking technique that integrates fiducial marker detection, capacitive touch sensing, and inertial measurement for real-time gesture recognition in immersive environments. The system is implemented on lightweight hardware comprising a Raspberry Pi Zero 2 W and an ESP32, with OpenCV’s ArUco marker detection enabling 3D hand pose estimation, capacitive sensors supporting finger-state recognition, and an Inertial Measurement Unit (IMU) providing orientation tracking. Optimizations such as exposure adjustment and region-of-interest processing ensure robust marker detection under variable illumination, while sensor data is transmitted via Bluetooth Low Energy (BLE) and WebSocket protocols for synchronization with external devices.</div><div>The methodological novelty of this work is highlighted as follows:</div><div>•High Accuracy Across Modalities: Achieved 3.4 mm localization accuracy, 85–91% orientation accuracy, and ∼2.9 mm hand pose keypoint accuracy, with trajectory fidelity maintained at 80–81%.</div><div>•Robust Finger-State Recognition: The capacitive sensing module consistently delivered 96.1% accuracy in detecting finger states across multiple runs.</div><div>•Validated Communication Trade-offs: Latency testing established complementary roles of Wi-Fi (high throughput, ∼467 msg/s) and BLE (low latency, ∼50 ms, >98% reliability) for real-time applications.</div><div>By fusing multiple sensing modalities, the method delivers enhanced accuracy, responsiveness, and stability while minimizing computational overhead. The system provides a reproducible, modular, and scalable solution suitable for VR/AR interaction, assistive technology, education, and human–computer interaction.</div></div>","PeriodicalId":18446,"journal":{"name":"MethodsX","volume":"15 ","pages":"Article 103609"},"PeriodicalIF":1.9000,"publicationDate":"2025-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"MethodsX","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2215016125004534","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
引用次数: 0
Abstract
This paper presents a cost-effective hybrid hand-tracking technique that integrates fiducial marker detection, capacitive touch sensing, and inertial measurement for real-time gesture recognition in immersive environments. The system is implemented on lightweight hardware comprising a Raspberry Pi Zero 2 W and an ESP32, with OpenCV’s ArUco marker detection enabling 3D hand pose estimation, capacitive sensors supporting finger-state recognition, and an Inertial Measurement Unit (IMU) providing orientation tracking. Optimizations such as exposure adjustment and region-of-interest processing ensure robust marker detection under variable illumination, while sensor data is transmitted via Bluetooth Low Energy (BLE) and WebSocket protocols for synchronization with external devices.
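As a hedged illustration of the marker-detection stage described above, the sketch below uses OpenCV's ArUco module (the ArucoDetector API available in OpenCV ≥ 4.7) to detect markers within an optional region of interest and recover each marker's 3D pose with solvePnP. The camera intrinsics, distortion coefficients, marker size, and dictionary choice are placeholders for illustration, not values taken from the paper.

```python
# Minimal sketch, not the paper's implementation: ArUco detection on an optional
# region of interest, followed by per-marker pose recovery with solvePnP.
import cv2
import numpy as np

MARKER_SIZE_M = 0.03                         # assumed marker edge length (3 cm)
CAMERA_MATRIX = np.array([[600.0, 0.0, 320.0],
                          [0.0, 600.0, 240.0],
                          [0.0, 0.0, 1.0]])  # placeholder intrinsics from calibration
DIST_COEFFS = np.zeros(5)                    # placeholder distortion coefficients

# 3D corners of a square marker centred at the origin, in marker coordinates,
# ordered top-left, top-right, bottom-right, bottom-left (IPPE_SQUARE convention)
OBJ_POINTS = np.array([[-1, 1, 0], [1, 1, 0], [1, -1, 0], [-1, -1, 0]],
                      dtype=np.float32) * (MARKER_SIZE_M / 2)

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())

def estimate_marker_poses(frame, roi=None):
    """Detect ArUco markers (optionally inside a region of interest) and
    return a list of (marker_id, rotation_vector, translation_vector)."""
    x0, y0 = 0, 0
    if roi is not None:                       # restrict detection to a sub-window
        x0, y0, w, h = roi
        frame = frame[y0:y0 + h, x0:x0 + w]
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    corners, ids, _ = detector.detectMarkers(gray)
    poses = []
    if ids is None:
        return poses
    for marker_corners, marker_id in zip(corners, ids.flatten()):
        # shift ROI-local corners back into full-frame pixel coordinates
        img_points = marker_corners.reshape(4, 2) + np.array([x0, y0], dtype=np.float32)
        ok, rvec, tvec = cv2.solvePnP(OBJ_POINTS, img_points,
                                      CAMERA_MATRIX, DIST_COEFFS,
                                      flags=cv2.SOLVEPNP_IPPE_SQUARE)
        if ok:
            poses.append((int(marker_id), rvec, tvec))
    return poses
```

Restricting detection to a region of interest, as in the optional `roi` argument above, is one way to keep per-frame cost low on constrained hardware such as the Raspberry Pi Zero 2 W.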
The methodological novelty of this work is highlighted as follows:
• High Accuracy Across Modalities: Achieved 3.4 mm localization accuracy, 85–91% orientation accuracy, and ∼2.9 mm hand pose keypoint accuracy, with trajectory fidelity maintained at 80–81%.
• Robust Finger-State Recognition: The capacitive sensing module consistently delivered 96.1% accuracy in detecting finger states across multiple runs.
• Validated Communication Trade-offs: Latency testing established complementary roles of Wi-Fi (high throughput, ∼467 msg/s) and BLE (low latency, ∼50 ms, >98% reliability) for real-time applications; a minimal latency-probe sketch follows this list.
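The sketch below shows one way such a round-trip latency test could be set up over WebSocket, assuming a recent version of the Python websockets package (handler taking a single connection argument) and a local echo endpoint; the URI, port, and message count are illustrative and not the paper's actual test harness.

```python
# Hypothetical latency probe: the client timestamps each message, a local echo
# server returns it, and the difference gives per-message round-trip latency.
import asyncio
import time
import websockets

URI = "ws://localhost:8765"   # placeholder endpoint standing in for the sensor bridge

async def echo_server(websocket):
    async for message in websocket:
        await websocket.send(message)     # echo back unchanged

async def measure_latency(n_messages=100):
    latencies = []
    async with websockets.connect(URI) as ws:
        for _ in range(n_messages):
            sent = time.perf_counter()
            await ws.send(str(sent))
            await ws.recv()
            latencies.append((time.perf_counter() - sent) * 1000.0)  # milliseconds
    print(f"mean round-trip latency: {sum(latencies) / len(latencies):.2f} ms")

async def main():
    server = await websockets.serve(echo_server, "localhost", 8765)
    await measure_latency()
    server.close()
    await server.wait_closed()

if __name__ == "__main__":
    asyncio.run(main())
```

A comparable probe over BLE would use a notification/write round trip instead of a socket echo; the structure of the measurement (timestamp, send, receive, difference) stays the same.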
By fusing multiple sensing modalities, the method delivers enhanced accuracy, responsiveness, and stability while minimizing computational overhead. The system provides a reproducible, modular, and scalable solution suitable for VR/AR interaction, assistive technology, education, and human–computer interaction.