Real-Time Human–Drone Interaction via Active Multimodal Gesture Recognition Under Limited Field of View in Indoor Environments

Weicheng Fang; Ganghua Lai; Yushu Yu; Chuanbeibei Shi; Xin Meng; Jiali Sun; Zhenchao Cui

IEEE Robotics and Automation Letters, vol. 10, no. 11, pp. 11705–11712, published 2025-09-26. DOI: 10.1109/LRA.2025.3615031
Citations: 0
Abstract
Gesture recognition, an important method for Human–Drone Interaction (HDI), is often constrained by sensor limitations, such as sensitivity to lighting variations and field-of-view (FoV) restrictions. This letter proposes a real-time drone control system that integrates multimodal gesture recognition with active perception control to overcome these challenges. We constructed a diversified arm-gesture dataset and designed a lightweight, one-stage fusion model for point-cloud and image features, the Adaptive Gate Fusion Network (AGFNet), for real-time inference on embedded devices. We applied motion compensation to mitigate delay errors caused by point-cloud accumulation and network inference during movement, and fused the drone's velocity data with the detection results using an Extended Kalman Filter (EKF) to improve real-time performance. Active perception is then achieved by optimizing the sensor's FoV with perception-aware Model Predictive Control (PAMPC). Experimental results demonstrate that the proposed model achieves a threefold improvement in inference speed over the baseline, reaching 97.93% mAP on the test set and outperforming single-sensor networks by approximately 5%. Real-world testing further confirms the system's applicability and effectiveness in indoor environments.
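As a rough illustration of the EKF fusion step described in the abstract, the sketch below propagates the target's position relative to the drone using the drone's velocity between (possibly delayed) detections, and corrects it whenever the gesture detector reports a new position. All names, the static-target motion model, and the noise values are assumptions for illustration only; the paper's actual state definition and models are not given here. Under this linear model the EKF reduces to a standard Kalman filter.

```python
import numpy as np

class RelativePositionEKF:
    """Minimal sketch: fuse drone velocity (prediction) with
    gesture-detection positions (update). Illustrative only; the
    paper's exact state and noise models are not specified here."""

    def __init__(self, p0, p_var=1.0, q=0.01, r=0.05):
        self.x = np.asarray(p0, dtype=float)  # relative target position [x, y, z]
        self.P = np.eye(3) * p_var            # state covariance
        self.Q = np.eye(3) * q                # process noise (odometry drift, assumed)
        self.R = np.eye(3) * r                # measurement noise (detector jitter, assumed)

    def predict(self, v_drone, dt):
        # Target assumed static in the world frame, so the relative
        # position shifts by the drone's displacement over dt (F = I).
        self.x = self.x - np.asarray(v_drone, dtype=float) * dt
        self.P = self.P + self.Q

    def update(self, z):
        # z: relative target position reported by the detection network.
        H = np.eye(3)                         # the state is observed directly
        S = H @ self.P @ H.T + self.R
        K = self.P @ H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (np.asarray(z, dtype=float) - H @ self.x)
        self.P = (np.eye(3) - K @ H) @ self.P

# Usage: predict at the odometry rate, update whenever a detection arrives.
ekf = RelativePositionEKF(p0=[2.0, 0.0, 0.5])
ekf.predict(v_drone=[0.3, 0.0, 0.0], dt=0.02)
ekf.update(z=[1.98, 0.01, 0.5])
print(ekf.x)  # compensated relative position estimate
```

In practice, predict() would run at the odometry rate and update() whenever a detection arrives, time-stamped to the sensor frame that produced it; propagating the state forward from that stamp to the current time is one plausible way to realize the motion compensation for point-cloud accumulation and inference delay that the abstract describes.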
Journal Introduction:
The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.