Real-Time Human–Drone Interaction via Active Multimodal Gesture Recognition Under Limited Field of View in Indoor Environments

IF 5.3, CAS Region 2 (Computer Science), JCR Q2 (Robotics)
Weicheng Fang;Ganghua Lai;Yushu Yu;Chuanbeibei Shi;Xin Meng;Jiali Sun;Zhenchao Cui
{"title":"室内有限视场下主动多模态手势识别的实时人机交互","authors":"Weicheng Fang;Ganghua Lai;Yushu Yu;Chuanbeibei Shi;Xin Meng;Jiali Sun;Zhenchao Cui","doi":"10.1109/LRA.2025.3615031","DOIUrl":null,"url":null,"abstract":"Gesture recognition, an important method for Human-Drone Interaction (HDI), is often constrained by sensor limitations, such as sensitivity to lighting variations and field of view (FoV) restrictions. This letter proposes a real-time drone control system that integrates multimodal fusion gesture recognition with active perception control to overcome these challenges. We constructed a diversified arm gesture dataset and designed a lightweight, one-stage point cloud and image features fusion model, Adaptive Gate Fusion Network (AGFNet), for real-time inference on embedded devices. We applied motion compensation to mitigate delay errors caused by point clouds accumulation and network inference during movement, and fused the drone’s velocity data and detection results using the Extended Kalman Filter (EKF) to enhance real-time performance. This enabled active perception through the optimization of the sensor’s FoV using perception-aware Model Predictive Control (PAMPC). Experimental results demonstrate that the proposed model achieves a threefold improvement in inference speed compared to the baseline, reaching 97.93% mAP on the test set and outperforming single-sensor networks by approximately 5%. Real-world testing further confirms the system’s applicability and effectiveness in indoor environments.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 11","pages":"11705-11712"},"PeriodicalIF":5.3000,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Real-Time Human–Drone Interaction via Active Multimodal Gesture Recognition Under Limited Field of View in Indoor Environments\",\"authors\":\"Weicheng Fang;Ganghua Lai;Yushu Yu;Chuanbeibei Shi;Xin Meng;Jiali Sun;Zhenchao Cui\",\"doi\":\"10.1109/LRA.2025.3615031\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Gesture recognition, an important method for Human-Drone Interaction (HDI), is often constrained by sensor limitations, such as sensitivity to lighting variations and field of view (FoV) restrictions. This letter proposes a real-time drone control system that integrates multimodal fusion gesture recognition with active perception control to overcome these challenges. We constructed a diversified arm gesture dataset and designed a lightweight, one-stage point cloud and image features fusion model, Adaptive Gate Fusion Network (AGFNet), for real-time inference on embedded devices. We applied motion compensation to mitigate delay errors caused by point clouds accumulation and network inference during movement, and fused the drone’s velocity data and detection results using the Extended Kalman Filter (EKF) to enhance real-time performance. This enabled active perception through the optimization of the sensor’s FoV using perception-aware Model Predictive Control (PAMPC). Experimental results demonstrate that the proposed model achieves a threefold improvement in inference speed compared to the baseline, reaching 97.93% mAP on the test set and outperforming single-sensor networks by approximately 5%. 
Real-world testing further confirms the system’s applicability and effectiveness in indoor environments.\",\"PeriodicalId\":13241,\"journal\":{\"name\":\"IEEE Robotics and Automation Letters\",\"volume\":\"10 11\",\"pages\":\"11705-11712\"},\"PeriodicalIF\":5.3000,\"publicationDate\":\"2025-09-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Robotics and Automation Letters\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11181055/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ROBOTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Robotics and Automation Letters","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11181055/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ROBOTICS","Score":null,"Total":0}
Citations: 0

Abstract

Gesture recognition, an important method for Human-Drone Interaction (HDI), is often constrained by sensor limitations, such as sensitivity to lighting variations and field of view (FoV) restrictions. This letter proposes a real-time drone control system that integrates multimodal fusion gesture recognition with active perception control to overcome these challenges. We constructed a diversified arm gesture dataset and designed a lightweight, one-stage point cloud and image feature fusion model, Adaptive Gate Fusion Network (AGFNet), for real-time inference on embedded devices. We applied motion compensation to mitigate delay errors caused by point cloud accumulation and network inference during movement, and fused the drone's velocity data and detection results using the Extended Kalman Filter (EKF) to enhance real-time performance. This enabled active perception through the optimization of the sensor's FoV using perception-aware Model Predictive Control (PAMPC). Experimental results demonstrate that the proposed model achieves a threefold improvement in inference speed compared to the baseline, reaching 97.93% mAP on the test set and outperforming single-sensor networks by approximately 5%. Real-world testing further confirms the system's applicability and effectiveness in indoor environments.
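The abstract leaves AGFNet's internals unspecified, but the core idea of an adaptive gate that arbitrates between image and point-cloud features can be sketched generically. Below is a minimal PyTorch sketch, assuming simple linear projections and a per-channel sigmoid gate; the dimensions, layer choices, and names are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Generic adaptive gated fusion of an image feature vector and a
    point-cloud feature vector (sketch only; not the actual AGFNet)."""
    def __init__(self, img_dim=256, pc_dim=256, fused_dim=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, fused_dim)
        self.pc_proj = nn.Linear(pc_dim, fused_dim)
        # The gate sees both modalities and outputs per-channel weights in (0, 1).
        self.gate = nn.Sequential(
            nn.Linear(2 * fused_dim, fused_dim),
            nn.Sigmoid(),
        )

    def forward(self, f_img, f_pc):
        a = self.img_proj(f_img)
        b = self.pc_proj(f_pc)
        g = self.gate(torch.cat([a, b], dim=-1))
        # Convex per-channel combination: g weights the image branch,
        # (1 - g) weights the point-cloud branch.
        return g * a + (1.0 - g) * b

fusion = GatedFusion()
out = fusion(torch.randn(4, 256), torch.randn(4, 256))  # -> shape (4, 256)
```

An input-dependent gate of this kind lets the network lean on the lidar branch when lighting degrades the image features, which matches the motivation stated in the abstract.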
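The EKF-based fusion of the drone's velocity with delayed detections can likewise be sketched. The following assumes a stationary operator, a linear relative-position model (under which the EKF update reduces to a plain Kalman update), and a known inference latency; the state layout and noise values are my assumptions, not taken from the paper.

```python
import numpy as np

class RelativePositionEKF:
    """Track the operator's position relative to the drone. The drone's
    measured velocity drives the prediction; delayed network detections
    are the measurements (minimal sketch under the assumptions above)."""
    def __init__(self):
        self.x = np.zeros(3)          # relative target position [m]
        self.P = np.eye(3)            # state covariance
        self.Q = 0.01 * np.eye(3)     # process noise
        self.R = 0.05 * np.eye(3)     # measurement noise

    def predict(self, drone_vel, dt):
        # As the drone moves, the relative position shifts by -v * dt.
        self.x = self.x - drone_vel * dt
        self.P = self.P + self.Q

    def update(self, z, latency, drone_vel):
        # Motion compensation: the detection was computed from sensor data
        # that is `latency` seconds old (point-cloud accumulation plus
        # inference time), so shift it by the drone's motion since then.
        z_now = z - drone_vel * latency
        H = np.eye(3)                        # direct position measurement
        S = H @ self.P @ H.T + self.R
        K = self.P @ H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z_now - H @ self.x)
        self.P = (np.eye(3) - K @ H) @ self.P
```

Between detections, `predict` can run at the control rate using the drone's velocity, which is one plausible reading of how the abstract's EKF fusion improves real-time performance.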
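Finally, perception-aware MPC typically augments the stage cost with a term that rewards keeping the target near the camera's optical axis. A hedged sketch of one such cost term follows; the exact PAMPC formulation used in the paper is not given in this abstract.

```python
import numpy as np

def fov_cost(target_rel, cam_axis, w=1.0):
    """Penalize the angle between the camera's optical axis and the
    bearing to the target; adding this to an MPC stage cost encourages
    keeping the operator centered in the FoV (illustrative sketch)."""
    bearing = target_rel / np.linalg.norm(target_rel)
    axis = cam_axis / np.linalg.norm(cam_axis)
    cos_angle = np.clip(bearing @ axis, -1.0, 1.0)
    # Zero when the target sits on the optical axis, growing toward the FoV edge.
    return w * (1.0 - cos_angle)

# Target 2 m ahead and slightly to the side, camera pointing forward:
print(fov_cost(np.array([2.0, 0.5, 0.0]), np.array([1.0, 0.0, 0.0])))
```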
Source Journal

IEEE Robotics and Automation Letters (Computer Science: Computer Science Applications)
CiteScore: 9.60
Self-citation rate: 15.40%
Articles published: 1428
Journal description: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.