Lili Dong, Tianliang Hu, Tianyi Sun, Junrui Li, Songhua Ma
{"title":"人机协同制造中人机动作识别的RGB视频与惯性传感融合方法","authors":"Lili Dong , Tianliang Hu , Tianyi Sun , Junrui Li , Songhua Ma","doi":"10.1016/j.jmsy.2025.09.007","DOIUrl":null,"url":null,"abstract":"<div><div>Human action recognition (HAR), as a prerequisite for robotic dynamic decision-making, is crucial for achieving efficient human-robot collaborative manufacturing (HRCM). Compared with single modality, multi-modality provides a more comprehensive understanding of human actions. However, it is a challenge to effectively integrate this information to fully leverage the advantages of multi-modality for HAR in HRCM. Therefore, in this paper, the RGB video and inertial sensing fusion method for HAR in HRCM is proposed, presenting the systematic exploration of this multi-modality in industrial contexts. Two fusion strategies of two modalities are studied: decision-level fusion and feature-level fusion. Secondly, taking the rotary vector (RV) reducer assembly as an example, a multi-modal human assembly action dataset for HAR (HAAD-SDU) is designed, filling the gap in the HRCM field where publicly representative datasets are scarce. This dataset synchronously introduces RGB video and inertial sensing data containing human assembly information. Finally, the feasibility and effectiveness of the proposed approach are verified by the designed dataset and public dataset, demonstrating superior performance over baseline methods. The experimental results demonstrate that the proposed fusion approach integrating RGB video and inertial sensing modalities not only overcomes the limitations of the single modality but also exhibits strong cross-domain generalizability, proving effective for both industrial tasks and daily activity recognition. In the HRCM scenario specifically, both decision-level and feature-level fusion strategies demonstrate superior recognition capabilities. The decision-level fusion provides a higher recognition accuracy of 95.71 %, while the feature-level fusion achieves competitive accuracy at 94.42 % with low recognition latency of 1.67 s. Notably, the proposed fusion model can accurately recognize human behaviors at least 2 s before they are completed, providing sufficient leftover time for the robotic system to complete collaborative tasks.</div></div>","PeriodicalId":16227,"journal":{"name":"Journal of Manufacturing Systems","volume":"83 ","pages":"Pages 216-234"},"PeriodicalIF":14.2000,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"RGB video and inertial sensing fusion method for human action recognition in human-robot collaborative manufacturing\",\"authors\":\"Lili Dong , Tianliang Hu , Tianyi Sun , Junrui Li , Songhua Ma\",\"doi\":\"10.1016/j.jmsy.2025.09.007\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Human action recognition (HAR), as a prerequisite for robotic dynamic decision-making, is crucial for achieving efficient human-robot collaborative manufacturing (HRCM). Compared with single modality, multi-modality provides a more comprehensive understanding of human actions. However, it is a challenge to effectively integrate this information to fully leverage the advantages of multi-modality for HAR in HRCM. Therefore, in this paper, the RGB video and inertial sensing fusion method for HAR in HRCM is proposed, presenting the systematic exploration of this multi-modality in industrial contexts. 
Two fusion strategies of two modalities are studied: decision-level fusion and feature-level fusion. Secondly, taking the rotary vector (RV) reducer assembly as an example, a multi-modal human assembly action dataset for HAR (HAAD-SDU) is designed, filling the gap in the HRCM field where publicly representative datasets are scarce. This dataset synchronously introduces RGB video and inertial sensing data containing human assembly information. Finally, the feasibility and effectiveness of the proposed approach are verified by the designed dataset and public dataset, demonstrating superior performance over baseline methods. The experimental results demonstrate that the proposed fusion approach integrating RGB video and inertial sensing modalities not only overcomes the limitations of the single modality but also exhibits strong cross-domain generalizability, proving effective for both industrial tasks and daily activity recognition. In the HRCM scenario specifically, both decision-level and feature-level fusion strategies demonstrate superior recognition capabilities. The decision-level fusion provides a higher recognition accuracy of 95.71 %, while the feature-level fusion achieves competitive accuracy at 94.42 % with low recognition latency of 1.67 s. Notably, the proposed fusion model can accurately recognize human behaviors at least 2 s before they are completed, providing sufficient leftover time for the robotic system to complete collaborative tasks.</div></div>\",\"PeriodicalId\":16227,\"journal\":{\"name\":\"Journal of Manufacturing Systems\",\"volume\":\"83 \",\"pages\":\"Pages 216-234\"},\"PeriodicalIF\":14.2000,\"publicationDate\":\"2025-09-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Manufacturing Systems\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0278612525002341\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, INDUSTRIAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Manufacturing Systems","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0278612525002341","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, INDUSTRIAL","Score":null,"Total":0}
RGB video and inertial sensing fusion method for human action recognition in human-robot collaborative manufacturing
Human action recognition (HAR), as a prerequisite for robotic dynamic decision-making, is crucial for achieving efficient human-robot collaborative manufacturing (HRCM). Compared with a single modality, multi-modality provides a more comprehensive understanding of human actions; however, effectively integrating multi-modal information to fully exploit its advantages for HAR in HRCM remains a challenge. Therefore, this paper proposes an RGB video and inertial sensing fusion method for HAR in HRCM, presenting a systematic exploration of this multi-modal combination in industrial contexts. First, two fusion strategies for the two modalities are studied: decision-level fusion and feature-level fusion. Second, taking rotary vector (RV) reducer assembly as an example, a multi-modal human assembly action dataset for HAR (HAAD-SDU) is constructed, addressing the scarcity of publicly available, representative datasets in the HRCM field; the dataset provides synchronized RGB video and inertial sensing data containing human assembly information. Finally, the feasibility and effectiveness of the proposed approach are verified on both the constructed dataset and a public dataset, demonstrating superior performance over baseline methods. The experimental results show that the proposed fusion approach, integrating RGB video and inertial sensing modalities, not only overcomes the limitations of a single modality but also exhibits strong cross-domain generalizability, proving effective for both industrial tasks and daily activity recognition. In the HRCM scenario specifically, both fusion strategies demonstrate superior recognition capability: decision-level fusion achieves the higher recognition accuracy of 95.71 %, while feature-level fusion achieves a competitive 94.42 % with a low recognition latency of 1.67 s. Notably, the proposed fusion model can accurately recognize human actions at least 2 s before they are completed, leaving the robotic system sufficient time to carry out its collaborative tasks.
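To make the two fusion strategies named in the abstract concrete, the minimal PyTorch sketch below contrasts them in generic form. It is an illustrative assumption, not the authors' architecture: the embedding sizes (512 for RGB video, 128 for inertial data), the class count, the hidden layer, and the weighted-average rule in the decision-level head are all placeholders introduced here for the example.

```python
# Illustrative sketch only; module names and dimensions are assumptions, not the paper's model.
import torch
import torch.nn as nn


class DecisionLevelFusion(nn.Module):
    """Each modality is classified independently; per-class scores are then combined."""

    def __init__(self, video_dim, imu_dim, num_classes, video_weight=0.5):
        super().__init__()
        self.video_head = nn.Linear(video_dim, num_classes)
        self.imu_head = nn.Linear(imu_dim, num_classes)
        self.video_weight = video_weight  # assumed weighting scheme, not taken from the paper

    def forward(self, video_feat, imu_feat):
        p_video = torch.softmax(self.video_head(video_feat), dim=-1)
        p_imu = torch.softmax(self.imu_head(imu_feat), dim=-1)
        # Weighted average of the two modality-specific class-probability distributions.
        return self.video_weight * p_video + (1.0 - self.video_weight) * p_imu


class FeatureLevelFusion(nn.Module):
    """Modality features are concatenated and classified by a single shared head."""

    def __init__(self, video_dim, imu_dim, num_classes, hidden_dim=256):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(video_dim + imu_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, video_feat, imu_feat):
        fused = torch.cat([video_feat, imu_feat], dim=-1)
        return self.classifier(fused)  # unnormalized class scores (logits)


if __name__ == "__main__":
    video_feat = torch.randn(4, 512)  # placeholder RGB-video embeddings (batch of 4)
    imu_feat = torch.randn(4, 128)    # placeholder inertial-sensing embeddings
    print(DecisionLevelFusion(512, 128, num_classes=10)(video_feat, imu_feat).shape)
    print(FeatureLevelFusion(512, 128, num_classes=10)(video_feat, imu_feat).shape)
```

The design difference is that decision-level fusion keeps each modality's classifier independent and merges only their output scores, whereas feature-level fusion merges the embeddings first and trains one shared classifier on the joint representation.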
About the journal:
The Journal of Manufacturing Systems is dedicated to showcasing cutting-edge fundamental and applied research in manufacturing at the systems level. Encompassing products, equipment, people, information, control, and support functions, manufacturing systems play a pivotal role in the economical and competitive development, production, delivery, and total lifecycle of products, meeting market and societal needs.
With a commitment to publishing archival scholarly literature, the journal strives to advance the state of the art in manufacturing systems and foster innovation in crafting efficient, robust, and sustainable manufacturing systems. The focus extends from equipment-level considerations to the broader scope of the extended enterprise. The Journal welcomes research addressing challenges across various scales, including nano, micro, and macro-scale manufacturing, and spanning diverse sectors such as aerospace, automotive, energy, and medical device manufacturing.