Real-Time Human–Drone Interaction via Active Multimodal Gesture Recognition Under Limited Field of View in Indoor Environments

Weicheng Fang; Ganghua Lai; Yushu Yu; Chuanbeibei Shi; Xin Meng; Jiali Sun; Zhenchao Cui

IEEE Robotics and Automation Letters, vol. 10, no. 11, pp. 11705–11712, published 2025-09-26. DOI: 10.1109/LRA.2025.3615031
Citations: 0
Abstract
Gesture recognition, an important method for Human–Drone Interaction (HDI), is often constrained by sensor limitations, such as sensitivity to lighting variations and field-of-view (FoV) restrictions. This letter proposes a real-time drone control system that integrates multimodal gesture recognition with active perception control to overcome these challenges. We constructed a diversified arm-gesture dataset and designed a lightweight, one-stage fusion model for point-cloud and image features, the Adaptive Gate Fusion Network (AGFNet), for real-time inference on embedded devices. We applied motion compensation to mitigate delay errors caused by point-cloud accumulation and network inference during movement, and fused the drone's velocity data with the detection results using an Extended Kalman Filter (EKF) to improve real-time performance. Active perception is then achieved by optimizing the sensor's FoV with perception-aware Model Predictive Control (PAMPC). Experimental results demonstrate that the proposed model achieves a threefold improvement in inference speed over the baseline, reaching 97.93% mAP on the test set and outperforming single-sensor networks by approximately 5%. Real-world testing further confirms the system's applicability and effectiveness in indoor environments.
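As a rough illustration of the EKF fusion step described in the abstract, the sketch below propagates the target's position relative to the drone using the drone's velocity between (possibly delayed) detections, and corrects it whenever the gesture detector reports a new position. All names, the static-target motion model, and the noise values are assumptions for illustration only; the paper's actual state definition and models are not given here. Under this linear model the EKF reduces to a standard Kalman filter.

```python
import numpy as np

class RelativePositionEKF:
    """Minimal sketch: fuse drone velocity (prediction) with
    gesture-detection positions (update). Illustrative only; the
    paper's exact state and noise models are not specified here."""

    def __init__(self, p0, p_var=1.0, q=0.01, r=0.05):
        self.x = np.asarray(p0, dtype=float)  # relative target position [x, y, z]
        self.P = np.eye(3) * p_var            # state covariance
        self.Q = np.eye(3) * q                # process noise (odometry drift, assumed)
        self.R = np.eye(3) * r                # measurement noise (detector jitter, assumed)

    def predict(self, v_drone, dt):
        # Target assumed static in the world frame, so the relative
        # position shifts by the drone's displacement over dt (F = I).
        self.x = self.x - np.asarray(v_drone, dtype=float) * dt
        self.P = self.P + self.Q

    def update(self, z):
        # z: relative target position reported by the detection network.
        H = np.eye(3)                         # the state is observed directly
        S = H @ self.P @ H.T + self.R
        K = self.P @ H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (np.asarray(z, dtype=float) - H @ self.x)
        self.P = (np.eye(3) - K @ H) @ self.P

# Usage: predict at the odometry rate, update whenever a detection arrives.
ekf = RelativePositionEKF(p0=[2.0, 0.0, 0.5])
ekf.predict(v_drone=[0.3, 0.0, 0.0], dt=0.02)
ekf.update(z=[1.98, 0.01, 0.5])
print(ekf.x)  # compensated relative position estimate
```

In practice, predict() would run at the odometry rate and update() whenever a detection arrives, time-stamped to the sensor frame that produced it; propagating the state forward from that stamp to the current time is one plausible way to realize the motion compensation for point-cloud accumulation and inference delay that the abstract describes.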
Journal Introduction:
The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.