Enhancing Robot Learning Through Cognitive Reasoning Trajectory Optimization Under Unknown Dynamics
Qingwei Dong; Tingting Wu; Peng Zeng; Chuanzhi Zang; Guangxi Wan; Shijie Cui
IEEE Robotics and Automation Letters, vol. 10, no. 6, pp. 5401-5408, published 2025-04-07. DOI: 10.1109/LRA.2025.3558648. Available: https://ieeexplore.ieee.org/document/10955186/
Abstract
In the domain of robot learning, equipping robots with the capability to swiftly acquire operational skills poses a significant challenge. Reinforcement learning techniques are currently adept at addressing dynamic, unstructured problems involving rich contact scenarios, but their convergence is often slow due to the high dimensionality of the robot's state-action mapping space and the large initial policy search space. Meanwhile, advances in large language models (LLMs) have endowed these models with a degree of logical reasoning ability, enabling them to take goal-oriented actions proactively during the initial phase of a robotic task; they can implicitly generate state features and uncover underlying patterns in trajectory generation. Yet in complex manipulation tasks with rich contact, LLMs still fall short. Integrating the robust interactive capabilities of reinforcement learning with the strong logical reasoning of LLMs, and using LLMs to guide policy search, could therefore accelerate policy search significantly. In this letter, we introduce a Cognitive Reasoning Trajectory Optimization method. The approach uses Low-level Cognitive Control Tuning to enable LLMs with robust logical reasoning to make effective single-step decisions in Markov Decision Process (MDP) tasks. By fitting dynamics models with high-quality cognitive reasoning data and optimizing control strategies, the method constrains the policy search space and improves the efficiency of trajectory optimization. Experimental results on various manipulation tasks with the Sawyer robot in the MuJoCo simulator validate the effectiveness of the proposed algorithm.
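The abstract describes a three-stage pipeline: an LLM makes goal-directed single-step decisions in an MDP, the resulting high-quality transitions fit a dynamics model, and trajectory optimization then searches only near the LLM's prior. Below is a minimal NumPy sketch of that structure, not the paper's implementation: `llm_propose_action` (here a crude proportional rule standing in for an LLM decision), the linear dynamics fit, the quadratic cost, and the random-shooting planner are all illustrative assumptions.

```python
# Minimal sketch (assumptions, not the authors' code) of the pipeline the
# abstract outlines: prior-guided data collection -> dynamics fitting ->
# trajectory optimization constrained near the prior.
import numpy as np

STATE_DIM, ACTION_DIM, HORIZON = 4, 2, 10
rng = np.random.default_rng(0)

def true_step(s, a):
    """Stand-in for the real (unknown) environment dynamics."""
    return 0.9 * s + 0.1 * np.tanh(a).sum() * np.ones_like(s)

def llm_propose_action(state):
    """Hypothetical placeholder for an LLM's goal-directed single-step
    decision. Here: a proportional move toward the origin (the 'goal')."""
    return -np.clip(state[:ACTION_DIM], -1.0, 1.0)

# 1) Collect rollouts from the reasoning prior (the "cognitive" data).
states, actions, next_states = [], [], []
s = rng.normal(size=STATE_DIM)
for _ in range(200):
    a = llm_propose_action(s) + 0.05 * rng.normal(size=ACTION_DIM)
    s_next = true_step(s, a)
    states.append(s); actions.append(a); next_states.append(s_next)
    s = s_next

# 2) Fit a linear dynamics model s' ~= [s, a] @ W by least squares.
X = np.hstack([np.array(states), np.array(actions)])
Y = np.array(next_states)
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

def model_step(s, a):
    return np.concatenate([s, a]) @ W

# 3) Trajectory optimization by random shooting, sampling only small
#    perturbations around the prior -- this is what shrinks the search space.
def plan(s0, n_samples=256, prior_scale=0.2):
    best_cost, best_plan = np.inf, None
    for _ in range(n_samples):
        s, cost, seq = s0.copy(), 0.0, []
        for _ in range(HORIZON):
            a = llm_propose_action(s) + prior_scale * rng.normal(size=ACTION_DIM)
            s = model_step(s, a)
            cost += float(s @ s)          # quadratic cost: reach the origin
            seq.append(a)
        if cost < best_cost:
            best_cost, best_plan = cost, seq
    return best_plan, best_cost

plan_seq, cost = plan(rng.normal(size=STATE_DIM))
print(f"planned {len(plan_seq)} steps, model-predicted cost {cost:.3f}")
```

The design point the sketch illustrates is that action candidates are drawn as perturbations of the prior's proposals rather than uniformly over the action space, so the planner explores a far smaller region than an unguided policy search would.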
About the Journal
This journal publishes peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in robotics and automation.