Duidi Wu, Pai Zheng, Qianyou Zhao, Shuo Zhang, Jin Qi, Jie Hu, Guo-Niu Zhu, Lihui Wang
Robotics and Computer-integrated Manufacturing, Vol. 97, Article 103064 (published 2025-06-11). DOI: 10.1016/j.rcim.2025.103064. https://www.sciencedirect.com/science/article/pii/S0736584525001188
Citations: 0
Empowering natural human–robot collaboration through multimodal language models and spatial intelligence: Pathways and perspectives
Abstract
Industry 5.0 advocates human-centric smart manufacturing (HSM), with growing attention to proactive human–robot collaboration (HRC). Meanwhile, the rapid development of multimodal large language models (MLLMs) and embodied intelligence is driving an unprecedented evolution. This work aims to leverage these opportunities to enhance robots' learning and cognitive capabilities, enabling seamless and natural interaction. However, current research often overlooks human–robot symbiosis and pays little attention to specialized models and practical applications. This review adopts a human-centric vision, taking language as the pivot that connects humans with large models. To the best of our knowledge, this is the first attempt to integrate HRC, MLLMs, and embodied intelligence into a holistic view. The review first introduces representative foundation models and comprehensively summarizes state-of-the-art methods in the “Perception-Cognition-Actuation” loop. It then discusses pathways and platforms for efficient spatial skill learning, followed by an analysis of four key questions from the “Why, How, What, Where” perspectives. Finally, it highlights future challenges and potential research directions. We hope this work helps fill the research gap between HRC and MLLMs, offering a systematic pathway for developing human-centered collaborative systems and promoting further exploration and innovation in this exciting and crucial field. The resources are available at: https://github.com/WuDuidi/MLLM-HRC-Survey.
About the journal:
Robotics and Computer-Integrated Manufacturing publishes applied research that contributes to the development of new or enhanced robotics, manufacturing technologies, and innovative manufacturing strategies relevant to industry. Papers that combine theory with experimental validation are preferred, although review papers on current robotics and manufacturing issues are also considered. Papers on traditional machining processes, modeling and simulation, supply chain management, and resource optimization are generally outside the journal's scope, as more appropriate journals exist for these topics. Similarly, papers that are overly theoretical or mathematical will be directed to other suitable journals. The journal welcomes original papers in areas such as industrial robotics, human–robot collaboration in manufacturing, cloud-based manufacturing, cyber-physical production systems, big data analytics in manufacturing, smart mechatronics, machine learning, adaptive and sustainable manufacturing, and other fields involving unique manufacturing technologies.