Authors: Osama Tasneem, Roel Pieters
Journal: Robotics and Computer-Integrated Manufacturing (JCR Q1, Computer Science, Interdisciplinary Applications; impact factor 11.4)
Publication date: 2025-10-07
Publication type: Journal Article
DOI: https://doi.org/10.1016/j.rcim.2025.103154
Human–robot collaborative visual inspection with Large Language Models
Human–Robot Collaboration (HRC) is gaining traction in advanced manufacturing as industries shift from isolated robotic systems to more collaborative environments. This transition is supported by advances in automation and, more recently, generative AI. Large Language Models (LLMs) offer new possibilities for intuitive human–robot interaction through natural language. However, natural language remains little used as an interaction medium because of its inherent ambiguity, environmental noise, pronunciation variability, and the many ways a single instruction can be phrased. Furthermore, cloud-based deployment of LLMs raises concerns about ergonomics and data privacy, especially for industries and countries governed by strict regulatory requirements. To address these challenges, we present a fully offline, closed-loop robotic assistant for visual inspection tasks in HRC settings. The system supports speech-based interaction: user instructions are transcribed by a Speech-to-Text (STT) model and processed by a locally deployed, code-generating LLM. Guided by a structured prompt, the LLM produces custom responses for robot perception and manipulation. Inspection paths are generated relative to spatial axes or in specific directions and executed with real-time feedback through a Text-to-Speech (TTS) interface, allowing closer interaction with the robot assistant. The system applies a hybrid control method in which higher-level instructions are generated by the LLM together with a perception pipeline, while lower-level robot control is managed by ROS for safety and reliability. The system is evaluated across a range of experiments, including comparisons of local LLMs, the effectiveness of prompt engineering, and inspection performance in both simulated and real-world industrial use cases. Results demonstrate the system's capability to handle complex inspection tasks on objects of varied sizes and geometries, confirming its practicality and robustness in realistic deployment settings. Code and videos are openly available at: https://github.com/CuriousLad1000/RoboSpection.
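The closed loop described in the abstract (transcribed instruction, locally generated plan, validated execution, spoken feedback) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract says the LLM generates code, whereas this sketch assumes, for simplicity, that it returns a JSON command. The names `linear_path`, `ALLOWED_ACTIONS`, `plan_from_llm_output`, and `run_once` are hypothetical, and the STT, LLM, TTS, and ROS components are replaced by plain Python callables.

```python
import json
from typing import Callable, Dict, List, Sequence, Tuple

Waypoint = Tuple[float, ...]


def linear_path(start: Sequence[float], axis: str, length: float, steps: int) -> List[Waypoint]:
    """Return evenly spaced waypoints from `start` along one spatial axis."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    idx = {"x": 0, "y": 1, "z": 2}[axis]  # KeyError for an unknown axis
    path = []
    for i in range(steps + 1):
        point = list(start)
        point[idx] += length * i / steps
        path.append(tuple(point))
    return path


# Assumption: the high-level layer may only request actions on this allowlist,
# so anything else the model emits is rejected before reaching the robot.
ALLOWED_ACTIONS: Dict[str, Callable[..., List[Waypoint]]] = {
    "inspect_along_axis": linear_path,
}


def plan_from_llm_output(raw: str) -> List[Waypoint]:
    """Parse and validate a JSON command, then expand it into waypoints."""
    command = json.loads(raw)
    action = command.get("action")
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return ALLOWED_ACTIONS[action](**command["args"])


def run_once(transcript: str,
             llm: Callable[[str], str],
             speak: Callable[[str], None]) -> List[Waypoint]:
    """One turn of the loop: transcript -> LLM -> validated plan -> feedback."""
    try:
        path = plan_from_llm_output(llm(transcript))
    except (ValueError, KeyError, TypeError) as exc:
        speak(f"Sorry, I could not do that: {exc}")
        return []
    speak(f"Inspecting {len(path)} waypoints.")
    return path  # a real system would hand these to the low-level controller


if __name__ == "__main__":
    # Stand-in for a local LLM: always returns the same command.
    def fake_llm(_prompt: str) -> str:
        return json.dumps({
            "action": "inspect_along_axis",
            "args": {"start": [0.0, 0.0, 0.5], "axis": "x", "length": 1.0, "steps": 4},
        })

    run_once("inspect the part along the x axis", fake_llm, print)
```

Keeping the model's output behind an allowlist mirrors the hybrid-control idea in the abstract: the language model proposes high-level steps, while a deterministic layer decides what actually gets executed.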
About the journal:
The journal, Robotics and Computer-Integrated Manufacturing, focuses on sharing research applications that contribute to the development of new or enhanced robotics, manufacturing technologies, and innovative manufacturing strategies that are relevant to industry. Papers that combine theory and experimental validation are preferred, while review papers on current robotics and manufacturing issues are also considered. However, papers on traditional machining processes, modeling and simulation, supply chain management, and resource optimization are generally not within the scope of the journal, as there are more appropriate journals for these topics. Similarly, papers that are overly theoretical or mathematical will be directed to other suitable journals. The journal welcomes original papers in areas such as industrial robotics, human-robot collaboration in manufacturing, cloud-based manufacturing, cyber-physical production systems, big data analytics in manufacturing, smart mechatronics, machine learning, adaptive and sustainable manufacturing, and other fields involving unique manufacturing technologies.