Human–robot collaborative visual inspection with Large Language Models

IF 11.4; Zone 1 (Computer Science); JCR Q1: COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Robotics and Computer-integrated Manufacturing; Pub Date: 2025-10-07; DOI: 10.1016/j.rcim.2025.103154

Osama Tasneem, Roel Pieters


Citations: 0

Abstract


Human–robot collaborative visual inspection with Large Language Models

Human–Robot Collaboration (HRC) is gaining traction in advanced manufacturing as industries shift from isolated robotic systems to more collaborative environments. This transition is supported by advances in automation and, more recently, generative AI. Large Language Models (LLMs) offer new possibilities for intuitive human–robot interaction through natural language. However, natural language remains of limited use as an interaction channel because of its inherent ambiguity, environmental noise, pronunciation variability, and the many ways a request can be phrased. Furthermore, cloud-based deployment of LLMs raises concerns about ergonomics and data privacy, especially for industries and countries governed by strict regulatory requirements. To address these challenges, we present a fully offline, closed-loop robotic assistant for visual inspection tasks in HRC settings. The system supports speech-based interaction: user instructions are transcribed by a Speech-to-Text (STT) model and processed by a locally deployed, code-generating LLM. Guided by a structured prompt, the LLM produces custom responses for robot perception and manipulation. Inspection paths are generated relative to spatial axes or in specific directions and executed with real-time feedback through a Text-to-Speech (TTS) interface, allowing closer interaction with the robot assistant. The system applies a hybrid control method: higher-level instructions are generated by the LLM together with a perception pipeline, while lower-level robot control is managed by ROS for safety and reliability. The system is evaluated across a range of experiments, including comparisons of local LLMs, the effectiveness of prompt engineering, and inspection performance in both simulated and real-world industrial use cases. Results demonstrate the system’s capability to handle complex inspection tasks on objects of varied sizes and geometries, confirming its practicality and robustness in realistic deployment settings. Code and videos are available as open source at: https://github.com/CuriousLad1000/RoboSpection.
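The closed loop described in the abstract (speech in, validated high-level command, axis-relative inspection path, spoken feedback out) can be illustrated with a minimal sketch. This is not the authors' implementation: the JSON command schema, the whitelist, and the names `fake_stt`, `fake_llm`, `parse_command`, `plan_waypoints`, and `run_once` are all hypothetical, and the STT and LLM stages are replaced by simple stand-ins. In a real system the waypoints would be handed to a lower-level controller (ROS in the paper), which is not shown here.

```python
import json
from dataclasses import dataclass

# Hypothetical whitelist: only these actions and axes may reach the robot layer.
ALLOWED_ACTIONS = {"inspect"}
AXES = {"x": (1.0, 0.0, 0.0), "y": (0.0, 1.0, 0.0), "z": (0.0, 0.0, 1.0)}


@dataclass(frozen=True)
class Command:
    action: str
    axis: str
    distance_m: float
    steps: int


def fake_stt(audio: bytes) -> str:
    """Stand-in for a local Speech-to-Text model (assumption, not a real model)."""
    return audio.decode("utf-8")


def fake_llm(instruction: str) -> str:
    """Stand-in for a local LLM that a structured prompt constrains to JSON."""
    text = instruction.lower()
    axis = next((a for a in AXES if f"{a} axis" in text), "x")
    return json.dumps(
        {"action": "inspect", "axis": axis, "distance_m": 0.3, "steps": 3}
    )


def parse_command(raw: str) -> Command:
    """Validate the model output before anything is passed to robot control."""
    data = json.loads(raw)
    cmd = Command(
        action=str(data["action"]),
        axis=str(data["axis"]),
        distance_m=float(data["distance_m"]),
        steps=int(data["steps"]),
    )
    if cmd.action not in ALLOWED_ACTIONS or cmd.axis not in AXES:
        raise ValueError(f"rejected command: {raw}")
    if cmd.steps < 1 or not 0.0 < cmd.distance_m <= 1.0:
        raise ValueError(f"out-of-range parameters: {raw}")
    return cmd


def plan_waypoints(start, cmd):
    """Return evenly spaced waypoints from `start` along the requested axis."""
    direction = AXES[cmd.axis]
    step = cmd.distance_m / cmd.steps
    return [
        tuple(round(s + d * step * i, 6) for s, d in zip(start, direction))
        for i in range(1, cmd.steps + 1)
    ]


def run_once(audio: bytes, start=(0.0, 0.0, 0.0)):
    """One pass of the loop: transcribe, generate, validate, plan, report."""
    instruction = fake_stt(audio)
    cmd = parse_command(fake_llm(instruction))
    waypoints = plan_waypoints(start, cmd)
    feedback = f"Inspecting along {cmd.axis} axis with {len(waypoints)} waypoints."
    return waypoints, feedback  # feedback would be spoken by a TTS stage


if __name__ == "__main__":
    points, message = run_once(b"Inspect the part along the Y axis")
    print(message)
    for point in points:
        print(point)
```

The design choice worth noting is the validation step between the language model and motion planning: in this sketch the model only proposes a structured command, and anything outside the whitelist or parameter limits is rejected before a path is generated, which mirrors the split between high-level generation and low-level control that the abstract describes.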


Source journal

Robotics and Computer-integrated Manufacturing (Engineering & Technology - Engineering: Manufacturing)

CiteScore: 24.10

Self-citation rate: 13.50%

Articles published: 160

Review time: 50 days

About the journal: Robotics and Computer-Integrated Manufacturing focuses on sharing research applications that contribute to the development of new or enhanced robotics, manufacturing technologies, and innovative manufacturing strategies that are relevant to industry. Papers that combine theory and experimental validation are preferred, while review papers on current robotics and manufacturing issues are also considered. However, papers on traditional machining processes, modeling and simulation, supply chain management, and resource optimization are generally not within the scope of the journal, as there are more appropriate journals for these topics. Similarly, papers that are overly theoretical or mathematical will be directed to other suitable journals. The journal welcomes original papers in areas such as industrial robotics, human-robot collaboration in manufacturing, cloud-based manufacturing, cyber-physical production systems, big data analytics in manufacturing, smart mechatronics, machine learning, adaptive and sustainable manufacturing, and other fields involving unique manufacturing technologies.