Robotics and Computer-integrated Manufacturing: Latest Articles

Perception-decision-execution coordination mechanism driven dynamic autonomous collaboration method for human-like collaborative robot based on multimodal large language model

IF 10.4, Zone 1 (Computer Science)

Robotics and Computer-integrated Manufacturing Pub Date : 2025-10-11 DOI: 10.1016/j.rcim.2025.103167

Jianpeng Chen, Sihan Huang, Xiaowen Wang, Pengfei Wang, Jiahao Zhu, Zhe Xu, Guoxin Wang, Yan Yan, Lihui Wang

Abstract: With the advent of Industry 5.0, human-centric smart manufacturing is becoming a new paradigm for industrial transformation, and human-robot collaboration (HRC) is one of its most active research topics. The emergence of large language models (LLMs) offers collaborative robots a significant opportunity to improve their autonomous collaboration ability, bringing HRC into a new era driven by embodied intelligence and more capable robots. This paper therefore proposes a dynamic autonomous collaboration method for human-like collaborative robots (HLCobots) in human-centric smart manufacturing, inspired by the looking-thinking-doing chain of human operators and based on a multimodal large language model (MLLM). A perception-decision-execution coordination mechanism is constructed to distribute the abilities of the MLLM appropriately along the dynamic operation chain of HRC. First, a brain-inspired architecture integrating a perception hub, a decision hub, and an execution hub is designed for dynamic autonomous collaboration. Second, the perception, decision, and execution abilities of the HLCobot are realized by integrating the MLLM, so that the HLCobot can actively recognize dynamic changes in the HRC scenario, as a human operator would, and autonomously execute the correct motions to complete the required collaborative task. In addition, a coordination mechanism among the perception, decision, and execution agents is put forward so that the collaborative task proceeds smoothly. Finally, a case study of engine assembly demonstrates the effectiveness of the proposed method.

Citations: 0

A hierarchical spatial–aware algorithm with efficient reinforcement learning for human–robot task planning and allocation in production

IF 10.4, Zone 1 (Computer Science)

Robotics and Computer-integrated Manufacturing Pub Date : 2025-10-10 DOI: 10.1016/j.rcim.2025.103159

Jintao Xue, Xiao Li, Nianmin Zhang

Abstract: In advanced manufacturing systems, humans and robots collaborate to carry out the production process. Effective task planning and allocation (TPA) is crucial for achieving high production efficiency, yet it remains challenging in complex and dynamic manufacturing environments. The dynamic nature of humans and robots, particularly the need to consider spatial information (e.g., humans' real-time position and the distance they need to move to complete a task), substantially complicates TPA. To address these challenges, we decompose production tasks into manageable subtasks. We then implement a real-time hierarchical human–robot TPA algorithm consisting of a high-level agent for task planning and a low-level agent for task allocation. For the high-level agent, we propose an efficient buffer-based deep Q-learning method (EBQ), which reduces training time and enhances performance in production problems with long-term and sparse rewards. For the low-level agent, a path planning-based spatially aware method (SAP) is designed to allocate tasks to the appropriate human–robot resources, thereby completing the corresponding sequential subtasks. We conducted experiments on a complex real-time production process in a 3D simulator. The results demonstrate that the proposed EBQ&SAP method effectively addresses human–robot TPA problems in complex and dynamic production processes.

Citations: 0

A reinforcement learning-based metaheuristic approach to address the dynamic scheduling problem in cloud manufacturing with task cancellation

IF 10.4, Zone 1 (Computer Science)

Robotics and Computer-integrated Manufacturing Pub Date : 2025-10-09 DOI: 10.1016/j.rcim.2025.103160

Atefeh Rajabi-Kafshgar, Mostafa Hajiaghaei-Keshteli, Mohammad Reza Mohammad Aliha

Abstract: Recent developments in cloud manufacturing (CMg) have highlighted the need for efficient task scheduling and resource allocation in distributed and dynamic environments. To the best of our knowledge, existing studies have not considered dynamic events such as task cancellation, which can lead to resource inefficiencies and disrupt the initial schedule. To address this gap, this paper introduces a novel dynamic task scheduling and service allocation (DTSSA) problem in CMg that considers task cancellation. The proposed model accounts for logistics time and different arrival times, which directly affect the tasks' completion times. Furthermore, a reinforcement learning-based genetic algorithm is developed to tackle the NP-hardness of the model and solve medium- and large-scale problems in a reasonable time. The algorithm dynamically selects search operators using the Q-learning algorithm and applies an ε-greedy approach to improve search capabilities. First, the metaheuristic algorithms' parameters are tuned by the Taguchi method. The proposed algorithms were evaluated using 30 benchmark instances from the literature, as well as example cases inspired by existing studies. Next, the mathematical model is evaluated by implementing small-scale examples in GAMS. Then, the algorithms are compared with both well-known and recently developed metaheuristic algorithms using statistical tests and several test problems of different sizes. Additionally, results show that rescheduling provides solutions that are up to 8.7% better on average than the initial schedule. Lastly, the model's sensitivity analysis reveals that the longer the processing time and logistics time, the longer the maximum completion time for scheduling and rescheduling.

Citations: 0
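
The operator-selection idea in the abstract above (Q-learning with an ε-greedy rule choosing among search operators) can be illustrated with a minimal tabular sketch. This is not the authors' implementation: the two search states, the three operators, the reward value, and the learning parameters `alpha` and `gamma` are all hypothetical choices made for illustration.

```python
import random

def select_operator(q_row, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, else pick the best-valued operator."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])

def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update of Q[state][action]."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])

# Hypothetical setup: 2 search states (e.g. "improving", "stagnating") and
# 3 search operators; none of these names come from the paper.
rng = random.Random(0)
q_table = [[0.0] * 3 for _ in range(2)]

# Suppose operator 1, applied in state 0, improved the schedule (reward 1.0).
q_update(q_table, state=0, action=1, reward=1.0, next_state=1)

# With epsilon = 0 the rule is purely greedy and now prefers operator 1.
greedy_choice = select_operator(q_table[0], epsilon=0.0, rng=rng)
```

In a genetic algorithm the reward would typically be derived from the change in the best objective value after applying the chosen operator; how the paper defines states and rewards is not stated in the abstract.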

Design and control of a parallel electromagnetic variable stiffness manipulator for robotic compliant grinding

IF 10.4, Zone 1 (Computer Science)

Robotics and Computer-integrated Manufacturing Pub Date : 2025-10-09 DOI: 10.1016/j.rcim.2025.103158

Xu Tang, Jixiang Yang, Han Ding

Abstract: Environmental adaptability is a key challenge in robotic operation, while the limited compliance of parallel manipulators hinders their application in variable-stiffness environments. This paper proposes a novel three-degree-of-freedom parallel electromagnetic variable stiffness manipulator (PEVSM) that actively adapts to the environment through self-stiffness modulation. The PEVSM integrates an unconstrained variable stiffness limb driven by an electromagnetic spring and three compliant limbs actuated by Lorentz motors, enabling active and continuous stiffness modulation. Based on the established kinematic and stiffness models, a hybrid force-position-stiffness control framework is developed, integrating enhanced fractional-order adaptive impedance control and a stiffness controller based on deep deterministic policy gradient with multi-source feedback to achieve precise and compliant force regulation. For robotic grinding of low-stiffness workpieces, a force–deformation model and a force compensation strategy are introduced to mitigate deformation effects and improve material removal accuracy. A robotic grinding platform with the PEVSM is constructed, demonstrating its advantages in improving force control and material removal accuracy in compliant grinding.

Citations: 0

Towards zero-shot robot tool manipulation in industrial context: A modular VLM framework enhanced by multimodal affordance representation

IF 10.4, Zone 1 (Computer Science)

Robotics and Computer-integrated Manufacturing Pub Date : 2025-10-07 DOI: 10.1016/j.rcim.2025.103161

Qi Zhou, Yuwei Gu, Jiawen Li, Bohan Feng, Boyan Li, Youyi Bi

Abstract: Robot tool manipulation in industrial contexts requires precise spatial localization, stable force control, and versatile adaptability across diverse tools and tasks. Traditional robot manipulation methods usually struggle to generalize to unseen scenarios and to maintain reliable, precise interactions under complex physical constraints. Recent Vision Language Model (VLM)-based approaches demonstrate better generalization ability, but they often lack the fine-grained modeling of spatial and force constraints that is critical for real-world industrial applications. To address these challenges, we propose a novel framework for zero-shot robot tool manipulation in industrial environments, named ToolManip. The framework adopts a modular, multi-agent VLM architecture. It decomposes the manipulation process into four specialized modules (task understanding and planning, affordance reasoning, primitive reasoning, and execution monitoring), each handled by a dedicated VLM agent. To enhance manipulation accuracy, we develop a multimodal affordance representation method that models spatial and force constraints separately. Spatial constraints are encoded via hierarchical region extraction and structured interaction fields to define keypoints and interaction directions, while force constraints are represented through force control primitive reasoning to enable precise and compliant motion and force planning. Additionally, an integrated execution-monitoring pipeline improves system reliability by tracking the status of each task step and performing stepwise corrections. Experimental results demonstrate that ToolManip achieves robust, generalizable, and high-accuracy performance in various constraint-rich industrial tool manipulation tasks. Our work contributes to the development of advanced robotic manipulation methods for industry and smart manufacturing environments empowered by generative artificial intelligence.

Citations: 0

Human–robot collaborative visual inspection with Large Language Models

IF 10.4, Zone 1 (Computer Science)

Robotics and Computer-integrated Manufacturing Pub Date : 2025-10-07 DOI: 10.1016/j.rcim.2025.103154

Osama Tasneem, Roel Pieters

Abstract: Human–Robot Collaboration (HRC) is gaining traction in advanced manufacturing as industries shift from isolated robotic systems to more collaborative environments. This transition is supported by advancements in automation and, more recently, Generative AI. Large Language Models (LLMs) offer new possibilities for intuitive human–robot interaction through natural language. However, the use of natural language as an interface remains very limited because of its ambiguity, environmental noise, pronunciation variability, and multiple phrasing styles. Furthermore, cloud-based deployment of LLMs raises concerns about ergonomics and data privacy, especially for industries and countries governed by strict regulatory requirements. To address these challenges, we present a fully offline, closed-loop robotic assistant for visual inspection tasks in HRC settings. The system supports speech-based interaction, where user instructions are transcribed via a Speech-to-Text (STT) model and processed by a locally deployed, code-generating LLM. Guided by a structured prompt, the LLM produces custom responses for robot perception and manipulation. Inspection paths are generated relative to spatial axes or in specific directions and executed with real-time feedback through a Text-to-Speech (TTS) interface, allowing for much closer interaction with the robot assistant. The system applies a hybrid control method, where the higher-level instructions are generated by the LLM together with a perception pipeline, and the lower-level robot control is managed by ROS for safety and reliability. The system is evaluated across a range of experiments, including local LLM comparisons, prompt engineering effectiveness, and inspection performance in both simulated and real-world industrial use cases. Results demonstrate the system's capability to handle complex inspection tasks on objects with varied sizes and geometries, confirming its practicality and robustness in realistic deployment settings. Code and videos are openly available at: https://github.com/CuriousLad1000/RoboSpection

Citations: 0

Efficient scheduling for fixed-type multi-robot collaborative problem in flexible job shop

IF 10.4, Zone 1 (Computer Science)

Robotics and Computer-integrated Manufacturing Pub Date : 2025-10-07 DOI: 10.1016/j.rcim.2025.103157

Jin Huang, Xinyu Li, Qihao Liu, Liang Gao

Abstract: With the rapid development of intelligent manufacturing, multi-robot collaborative systems are becoming increasingly integrated into production environments. In the flexible workshop environment for aircraft skin processing, efficient manufacturing depends heavily on the effective allocation and scheduling of multi-robot tasks, where processing times are adjusted by altering the number of robots involved. This paper therefore addresses the fixed-type multi-robot collaborative flexible job shop scheduling problem (MCFJSP), aiming to optimize both the number of deployed robots and the maximum completion time. To tackle this multi-objective MCFJSP, a mixed integer linear programming (MILP) model and a constraint programming (CP) model based on the ϵ-constrained method are proposed to obtain the optimal Pareto fronts for small-scale cases and for medium- to large-scale cases, respectively. To obtain approximate Pareto fronts efficiently, a multi-objective hybrid genetic tabu search algorithm (MO-HA) is developed, which incorporates a neighborhood search method based on the critical path. Furthermore, this study introduces an event-driven rescheduling framework that uses the efficient MO-HA to adapt schedules dynamically in response to real-time disruptions such as new order insertions and unexpected machine breakdowns. Finally, the proposed methods are tested on 15 benchmark cases, two real-world cases, and 32 dynamic cases. Comparative experiments show that the MILP model based on the ϵ-constrained method is effective for small-scale cases, while the CP model based on the ϵ-constrained method is introduced for medium- to large-scale cases. Additionally, the proposed MO-HA provides high-quality solutions efficiently for both static and dynamic scenarios. This study offers valuable insights for practitioners, providing actionable strategies to enhance workshop productivity while reducing the number of robots deployed.

Citations: 0
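
The ϵ-constrained approach named in the abstract above can be sketched on toy data: one objective (here the number of robots) is bounded by ϵ while the other (makespan) is minimised, and ϵ is swept to trace the Pareto front. In the paper each step would be a MILP or CP solve; the sketch below replaces the solver with enumeration over a hypothetical list of feasible `(robots, makespan)` pairs, so it only illustrates the sweep, not the authors' models.

```python
def epsilon_constraint_front(candidates):
    """Trace a Pareto front over (robots, makespan) pairs by sweeping a bound on robots.

    candidates: feasible schedules as (robots, makespan) tuples (toy data).
    """
    front = []
    for eps in sorted({robots for robots, _ in candidates}):
        # Sub-problem: minimise makespan subject to robots <= eps.
        feasible = [c for c in candidates if c[0] <= eps]
        # Ties on makespan are broken in favour of fewer robots.
        best = min(feasible, key=lambda c: (c[1], c[0]))
        if best not in front:
            front.append(best)
    return front

# Hypothetical feasible schedules; the values are made up for illustration.
toy = [(1, 20), (2, 14), (2, 12), (3, 12), (4, 9), (3, 15)]
front = epsilon_constraint_front(toy)
print(front)  # [(1, 20), (2, 12), (4, 9)]
```

The tie-break matters: without it, a schedule such as `(3, 12)` could be reported even though `(2, 12)` reaches the same makespan with fewer robots.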

Improving transparency in physical human–robot interaction via unknown dynamics compensation

IF 11.4, Zone 1 (Computer Science)

Robotics and Computer-integrated Manufacturing Pub Date : 2025-10-04 DOI: 10.1016/j.rcim.2025.103153

Seung Ho Lee, Dong Jun Oh, Hyungpil Moon, Hyouk Ryeol Choi, Ja Choon Koo

Abstract: Physical human–robot interaction (pHRi) often involves tasks with unknown load dynamics, such as transport and assembly, which can reduce transparency, efficiency, and operator comfort. This study presents a compensator for real-time adjustment of unknown load dynamics, enhancing transparency through admittance control. Transparency was quantified as energy per distance generated by interaction forces, with the compensator's design addressing unaccounted physical dynamics affecting both operator and robot, modeled as an equivalent physical system. Utilizing interaction data from the end-effector coordinate system and a time delay control approach, the compensator was mathematically formulated to mitigate dynamic impacts, with stability verified via the Lyapunov criterion. Simulations and empirical tests demonstrated improved transparency over existing controllers across varying motion speeds and load dynamics. This study highlights the role of dynamic compensation in advancing pHRi transparency and proposes future work to refine low-level dynamic adjustments.

Volume 98, Article 103153.

Citations: 0
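
To make the "energy per distance" notion of transparency in the abstract above concrete, the following one-dimensional sketch simulates a plain admittance law, M·dv/dt + D·v = f, and computes the work done by the interaction force per unit distance travelled. It does not reproduce the paper's compensator or its exact metric: the operator model (a proportional force toward a desired speed), the gain `k_op`, and every numeric value are assumptions chosen for illustration.

```python
def admittance_step(v, f_ext, mass, damping, dt):
    """One explicit-Euler step of the 1-D admittance law M*dv/dt + D*v = f_ext."""
    return v + dt * (f_ext - damping * v) / mass

def energy_per_distance(forces, velocities, dt):
    """Work done by the interaction force divided by the distance travelled."""
    work = sum(f * v * dt for f, v in zip(forces, velocities))
    distance = sum(abs(v) * dt for v in velocities)
    return work / distance if distance > 0 else 0.0

def simulate(damping, mass=2.0, dt=0.001, steps=2000, v_des=0.2, k_op=50.0):
    """Hypothetical operator pushes with force proportional to the speed error."""
    v, forces, velocities = 0.0, [], []
    for _ in range(steps):
        f = k_op * (v_des - v)          # assumed operator model, not from the paper
        v = admittance_step(v, f, mass, damping, dt)
        forces.append(f)
        velocities.append(v)
    return energy_per_distance(forces, velocities, dt)

metric_low = simulate(damping=5.0)    # lighter virtual damping
metric_high = simulate(damping=20.0)  # heavier virtual damping
```

Under these assumptions the lighter virtual damping yields the smaller energy-per-distance value, which is the sense in which a lower value corresponds to a more transparent interaction.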

Physics-based simulation framework for Digital Twin applications: Machine parameter tuning for handling of lumber in the wood industry

IF 11.4, Zone 1 (Computer Science)

Robotics and Computer-integrated Manufacturing Pub Date : 2025-10-03 DOI: 10.1016/j.rcim.2025.103144

Francesco Berardinucci, Marco Rossoni, Giorgio Colombo, Marcello Urgo

Abstract: Tuning the operational parameters of complex handling machines involves an intricate interplay of variables that affect the performance and reliability of the equipment and of the processes being executed. By integrating advanced simulation tools into Digital Twin (DT) architectures, manufacturers can predict and analyse the performance of machines under various settings and scenarios. This paper proposes a physics-based simulation framework designed for offline optimisation of machine parameters and for integration in Digital Twin applications, so that the configuration space of machine parameters can be explored for selection and fine-tuning. The framework enables virtual exploration of the parameter space to identify optimal parameter settings in terms of productivity and stability, for both design-phase analysis and machine setup optimisation. While developed as a simulation component suitable for integration within Digital Twin architectures, the current implementation operates independently of real-time data integration. A case study from the wood industry demonstrates the application and validation of the approach under realistic operational scenarios, showing the framework's potential for deployment in Digital Twin systems.

Volume 98, Article 103144.

Citations: 0

VTLG: A vision-tactile-language grasp generation method oriented towards task

IF 11.4, Zone 1 (Computer Science)

Robotics and Computer-integrated Manufacturing Pub Date : 2025-10-01 DOI: 10.1016/j.rcim.2025.103152

Tong Li, Chengshun Yu, Yuhang Yan, Di Song, Yuxin Shuai, Yifan Wang, Gang Chen

Abstract: Task-oriented grasping, as a preceding step, is indispensable for achieving reliable robotic manipulation. Existing task-oriented grasp generation methods typically rely on visual information about the target and are deployed on simple parallel-jaw grippers, so they often struggle with visual degradation and with inadequate grasp reliability and flexibility. In this paper, we propose a task-oriented grasp generation system that integrates external visual and tactile perception under the guidance of a textual description. Multimodal encoding consists of two stages: visual-tactile feature fusion encoding for robot grasp and object spatial perception, and textual normalization and encoding, followed by spatial perception-semantic feature fusion. We introduce tactile perception in the pre-contact phase. A visual-tactile fusion method is proposed to combine a single-view point cloud with the pre-contact tactile array, partitioning the object surface into multiple contact patches. A vision transformer architecture is employed to obtain a spatial representation of the object surface, encoding global spatial features and implicitly assisting in evaluating the feasibility of grasps across regions through shape learning. To improve the inference effectiveness of the large language model under varying task contexts, we introduce specifications for textual standardization and migratory comprehension of unknown concepts. A language encoder and principal component analysis are used to encode the standardized text that follows this textual generation paradigm. A spatial-semantic feature fusion method based on window shift and cross-attention is then proposed to align task context with object spatial features and, further, to apply preference-based attention to graspable regions. Finally, we present a grasp parameter prediction module based on a diffusion model specialized for high-dimensional conditions and a generalized hand proprioceptive space, which generates grasps by predicting noise. Experimental results demonstrate that the proposed method outperforms baseline methods that require the complete object shape, with mean average precision reaching 72.13%, an improvement of 1.65%. Each module exhibits performance improvement over conventional methods. An ablation study indicates that introducing the tactile and text modalities improves the metrics by over 3% and 14%, respectively. The single-shot success rate of the predicted grasps exceeds 65% in real-world experiments, underscoring the reliability of the proposed system.

Volume 98, Article 103152.

Citations: 0