Enhancing Robot Learning Through Cognitive Reasoning Trajectory Optimization Under Unknown Dynamics
Qingwei Dong; Tingting Wu; Peng Zeng; Chuanzhi Zang; Guangxi Wan; Shijie Cui
IEEE Robotics and Automation Letters, vol. 10, no. 6, pp. 5401-5408, published 2025-04-07. DOI: 10.1109/LRA.2025.3558648. Available: https://ieeexplore.ieee.org/document/10955186/
Abstract
In the domain of robot learning, equipping robots with the capability to swiftly acquire operational skills poses a significant challenge. Reinforcement learning techniques are currently adept at addressing dynamic, unstructured problems involving rich contact scenarios, but their convergence is often slow due to the high dimensionality of the robot's state-action mapping space and the large initial policy search space. Meanwhile, advances in large language models (LLMs) have endowed these models with a degree of logical reasoning ability, enabling them to take goal-oriented actions proactively during the initial phase of a robotic task; they can implicitly generate state features and uncover underlying patterns in trajectory generation. Yet in complex manipulation tasks with rich contact, LLMs still fall short. Integrating the robust interactive capabilities of reinforcement learning with the strong logical reasoning of LLMs, and using LLMs to guide policy search, could therefore accelerate policy search significantly. In this letter, we introduce a Cognitive Reasoning Trajectory Optimization method. The approach uses Low-level Cognitive Control Tuning to enable LLMs with robust logical reasoning to make effective single-step decisions in Markov Decision Process (MDP) tasks. By fitting dynamics models with high-quality cognitive reasoning data and optimizing control strategies, the method constrains the policy search space and improves the efficiency of trajectory optimization. Experimental results on various manipulation tasks with the Sawyer robot in the MuJoCo simulator validate the effectiveness of the proposed algorithm.
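The abstract describes a three-stage pipeline: an LLM makes goal-directed single-step decisions in an MDP, the resulting high-quality transitions fit a dynamics model, and trajectory optimization then searches only near the LLM's prior. Below is a minimal NumPy sketch of that structure, not the paper's implementation: `llm_propose_action` (here a crude proportional rule standing in for an LLM decision), the linear dynamics fit, the quadratic cost, and the random-shooting planner are all illustrative assumptions.

```python
# Minimal sketch (assumptions, not the authors' code) of the pipeline the
# abstract outlines: prior-guided data collection -> dynamics fitting ->
# trajectory optimization constrained near the prior.
import numpy as np

STATE_DIM, ACTION_DIM, HORIZON = 4, 2, 10
rng = np.random.default_rng(0)

def true_step(s, a):
    """Stand-in for the real (unknown) environment dynamics."""
    return 0.9 * s + 0.1 * np.tanh(a).sum() * np.ones_like(s)

def llm_propose_action(state):
    """Hypothetical placeholder for an LLM's goal-directed single-step
    decision. Here: a proportional move toward the origin (the 'goal')."""
    return -np.clip(state[:ACTION_DIM], -1.0, 1.0)

# 1) Collect rollouts from the reasoning prior (the "cognitive" data).
states, actions, next_states = [], [], []
s = rng.normal(size=STATE_DIM)
for _ in range(200):
    a = llm_propose_action(s) + 0.05 * rng.normal(size=ACTION_DIM)
    s_next = true_step(s, a)
    states.append(s); actions.append(a); next_states.append(s_next)
    s = s_next

# 2) Fit a linear dynamics model s' ~= [s, a] @ W by least squares.
X = np.hstack([np.array(states), np.array(actions)])
Y = np.array(next_states)
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

def model_step(s, a):
    return np.concatenate([s, a]) @ W

# 3) Trajectory optimization by random shooting, sampling only small
#    perturbations around the prior -- this is what shrinks the search space.
def plan(s0, n_samples=256, prior_scale=0.2):
    best_cost, best_plan = np.inf, None
    for _ in range(n_samples):
        s, cost, seq = s0.copy(), 0.0, []
        for _ in range(HORIZON):
            a = llm_propose_action(s) + prior_scale * rng.normal(size=ACTION_DIM)
            s = model_step(s, a)
            cost += float(s @ s)          # quadratic cost: reach the origin
            seq.append(a)
        if cost < best_cost:
            best_cost, best_plan = cost, seq
    return best_plan, best_cost

plan_seq, cost = plan(rng.normal(size=STATE_DIM))
print(f"planned {len(plan_seq)} steps, model-predicted cost {cost:.3f}")
```

The design point the sketch illustrates is that action candidates are drawn as perturbations of the prior's proposals rather than uniformly over the action space, so the planner explores a far smaller region than an unguided policy search would.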
About the Journal
This journal publishes peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in robotics and automation.