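Data-efficient multimodal human action recognition for proactive human–robot collaborative assembly: A cross-domain few-shot learning approach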

Impact factor 9.1; Zone 1 (Computer Science); JCR Q1 (COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS)
Tianyu Wang, Zhihao Liu, Lihui Wang, Mian Li, Xi Vincent Wang
Robotics and Computer-integrated Manufacturing, Volume 89, Article 102785. Published 2024-05-15. Journal article; not open access.
DOI: 10.1016/j.rcim.2024.102785
Full text: https://www.sciencedirect.com/science/article/pii/S0736584524000723
Citations: 0

Abstract

With the recent vision of Industry 5.0, the cognitive capability of robots plays a crucial role in advancing proactive human–robot collaborative assembly. As a basis for mutual empathy, understanding a human operator’s intention has been studied primarily through human action recognition. Existing deep learning-based methods are remarkably effective at handling information-rich data such as physiological measurements and videos, the latter being a more natural perception input. However, deploying these methods in new, unseen assembly scenarios first requires collecting abundant case-specific data, which entails significant manual effort and poor flexibility. To address this issue, this paper proposes a novel cross-domain few-shot learning method for data-efficient multimodal human action recognition. A hierarchical data fusion mechanism is designed to jointly leverage skeletons, RGB images, and depth maps, which carry complementary information. A temporal CrossTransformer is then developed to enable action recognition with a very limited amount of data. Lightweight domain adapters are integrated to further improve generalization through fast finetuning. Extensive experiments on a real car engine assembly case show that the proposed method outperforms state-of-the-art methods in both accuracy and finetuning efficiency. Real-time demonstrations and an ablation study further indicate the potential for early recognition, which benefits the generation of robot procedures in practical applications. In summary, this paper contributes to the rarely explored realm of data-efficient human action recognition for proactive human–robot collaboration.
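To make the few-shot setting in the abstract concrete, the following is a minimal sketch of one N-way K-shot episode. It is not the paper's method: it swaps in plain feature concatenation for the hierarchical fusion and a nearest-prototype rule for the temporal CrossTransformer, and it omits the domain adapters entirely. The function names, the action labels, and the one-dimensional feature values are all hypothetical and chosen only for illustration.

```python
from math import dist
from statistics import fmean


def fuse(skeleton, rgb, depth):
    """Concatenate per-modality features (assumed stand-in for hierarchical fusion)."""
    return [*skeleton, *rgb, *depth]


def prototypes(support):
    """Average the fused support features of each class into one prototype."""
    return {
        label: [fmean(column) for column in zip(*examples)]
        for label, examples in support.items()
    }


def classify(query, protos):
    """Return the label whose prototype is nearest to the query (Euclidean distance)."""
    return min(protos, key=lambda label: dist(query, protos[label]))


# Hypothetical 2-way 2-shot episode with made-up 1-D features per modality.
support = {
    "pick_screw": [fuse([0.9], [0.1], [0.2]), fuse([1.1], [0.0], [0.1])],
    "tighten_bolt": [fuse([0.1], [0.9], [0.8]), fuse([0.0], [1.0], [1.0])],
}
protos = prototypes(support)
prediction = classify(fuse([1.0], [0.2], [0.1]), protos)
```

The point of the sketch is only that a new assembly scenario contributes a handful of labelled clips per action (the support set), and a query clip is classified by comparison against them rather than by retraining on abundant case-specific data.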

Source journal
Robotics and Computer-integrated Manufacturing (Engineering Technology; Engineering: Manufacturing)
CiteScore: 24.10
Self-citation rate: 13.50%
Articles published: 160
Review time: 50 days
Journal description: The journal, Robotics and Computer-Integrated Manufacturing, focuses on sharing research applications that contribute to the development of new or enhanced robotics, manufacturing technologies, and innovative manufacturing strategies that are relevant to industry. Papers that combine theory and experimental validation are preferred, while review papers on current robotics and manufacturing issues are also considered. However, papers on traditional machining processes, modeling and simulation, supply chain management, and resource optimization are generally not within the scope of the journal, as there are more appropriate journals for these topics. Similarly, papers that are overly theoretical or mathematical will be directed to other suitable journals. The journal welcomes original papers in areas such as industrial robotics, human-robot collaboration in manufacturing, cloud-based manufacturing, cyber-physical production systems, big data analytics in manufacturing, smart mechatronics, machine learning, adaptive and sustainable manufacturing, and other fields involving unique manufacturing technologies.