Tianyu Wang, Zhihao Liu, Lihui Wang, Mian Li, Xi Vincent Wang

Data-efficient multimodal human action recognition for proactive human–robot collaborative assembly: A cross-domain few-shot learning approach

Robotics and Computer-Integrated Manufacturing, Volume 89, Article 102785. Published 2024-05-15. DOI: 10.1016/j.rcim.2024.102785
Citations: 0
Abstract
With the recent vision of Industry 5.0, the cognitive capability of robots plays a crucial role in advancing proactive human–robot collaborative assembly. As a basis of mutual empathy, the understanding of a human operator's intention has been studied primarily through human action recognition. Existing deep learning-based methods demonstrate remarkable efficacy in handling information-rich data such as physiological measurements and videos, where the latter category represents a more natural perception input. However, deploying these methods in new, unseen assembly scenarios requires first collecting abundant case-specific data, which leads to significant manual effort and poor flexibility. To address this issue, this paper proposes a novel cross-domain few-shot learning method for data-efficient multimodal human action recognition. A hierarchical data fusion mechanism is designed to jointly leverage skeletons, RGB images, and depth maps with complementary information. A temporal CrossTransformer is then developed to enable action recognition with a very limited amount of data. Lightweight domain adapters are integrated to further improve generalization with fast finetuning. Extensive experiments on a real car engine assembly case show the superior performance of the proposed method over state-of-the-art baselines in both accuracy and finetuning efficiency. Real-time demonstrations and an ablation study further indicate the potential of early recognition, which is beneficial for robot procedure generation in practical applications. In summary, this paper contributes to the rarely explored realm of data-efficient human action recognition for proactive human–robot collaboration.
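To make the few-shot matching idea concrete, the sketch below shows one plausible simplification of CrossTransformer-style recognition: per-frame features of a query clip attend over the frames of each support clip, and the class whose support set best reconstructs the query wins. This is a hypothetical illustration only; the paper's actual temporal CrossTransformer uses learned projections, multimodal fused features, and domain adapters, none of which are reproduced here. All function names and shapes are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_score(query, support):
    """Matching score between a query clip and one support clip.

    query:   (Tq, D) per-frame features of the query clip
    support: (Ts, D) per-frame features of a support clip

    Query frames attend over support frames (scaled dot-product);
    the attention-weighted support frames form a query-aligned
    prototype, and the score is the negative mean squared distance
    between each query frame and its aligned prototype.
    """
    d = query.shape[-1]
    attn = softmax(query @ support.T / np.sqrt(d), axis=-1)  # (Tq, Ts)
    proto = attn @ support                                   # (Tq, D)
    return -np.mean(np.sum((query - proto) ** 2, axis=-1))

def few_shot_classify(query, support_sets):
    """Pick the action class whose support clips best reconstruct
    the query under cross-attention (nearest-class by mean score)."""
    scores = {
        cls: np.mean([cross_attention_score(query, clip) for clip in clips])
        for cls, clips in support_sets.items()
    }
    return max(scores, key=scores.get)
```

With, say, three labeled clips per action class ("pick", "place", ...), a new query clip is classified without any gradient updates, which is what makes the approach attractive for unseen assembly scenarios with scarce data.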
Journal description:
The journal, Robotics and Computer-Integrated Manufacturing, focuses on sharing research applications that contribute to the development of new or enhanced robotics, manufacturing technologies, and innovative manufacturing strategies that are relevant to industry. Papers that combine theory and experimental validation are preferred, while review papers on current robotics and manufacturing issues are also considered. However, papers on traditional machining processes, modeling and simulation, supply chain management, and resource optimization are generally not within the scope of the journal, as there are more appropriate journals for these topics. Similarly, papers that are overly theoretical or mathematical will be directed to other suitable journals. The journal welcomes original papers in areas such as industrial robotics, human-robot collaboration in manufacturing, cloud-based manufacturing, cyber-physical production systems, big data analytics in manufacturing, smart mechatronics, machine learning, adaptive and sustainable manufacturing, and other fields involving unique manufacturing technologies.