Qianxi Li, Bao Pang, Yong Song, Hongze Fu, Qingyang Xu, Xianfeng Yuan, Xiaolong Xu, Chengjin Zhang
{"title":"大型语言模型辅助分层强化学习训练","authors":"Qianxi Li , Bao Pang , Yong Song , Hongze Fu , Qingyang Xu , Xianfeng Yuan , Xiaolong Xu , Chengjin Zhang","doi":"10.1016/j.ins.2025.122688","DOIUrl":null,"url":null,"abstract":"<div><div>Traditional reinforcement learning (RL) cannot solve complex long-sequence decision tasks, especially when the environment rewards are sparse. Large language models (LLMs) can perform well in long-sequence decision tasks by leveraging their powerful inference capabilities. Although LLMs possess a large amount of general knowledge, LLM-based agents lack expertise in solving specific target problems. Considering that reinforcement learning models are smaller than LLMs and can be trained specifically to perform well on specific tasks, this paper proposes a hierarchical reinforcement learning framework assisted by a large language model, called LLMHRL. In this framework, the LLM acts as a teacher agent to guide the exploration of high-level policy in hierarchical reinforcement learning. The low-level policy consists of a library of selection-based policies. The agent executes specific actions based on the low-level policy chosen by the high-level policy. Furthermore, to reduce the action space of high-level policy, this paper decomposes it into skill options and target options. The two types of options are combined to obtain a high-level policy. This paper evaluates LLMHRL against baseline methods using both public and custom-built harder tasks across three environments: MiniGrid for key-door pairing, ManiSkill for tabletop sorting, and real-world scenarios. The results show that LLMHRL outperforms existing methods in success rate, convergence speed, and average return.</div></div>","PeriodicalId":51063,"journal":{"name":"Information Sciences","volume":"723 ","pages":"Article 122688"},"PeriodicalIF":6.8000,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Large language model assisted hierarchical reinforcement learning training\",\"authors\":\"Qianxi Li , Bao Pang , Yong Song , Hongze Fu , Qingyang Xu , Xianfeng Yuan , Xiaolong Xu , Chengjin Zhang\",\"doi\":\"10.1016/j.ins.2025.122688\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Traditional reinforcement learning (RL) cannot solve complex long-sequence decision tasks, especially when the environment rewards are sparse. Large language models (LLMs) can perform well in long-sequence decision tasks by leveraging their powerful inference capabilities. Although LLMs possess a large amount of general knowledge, LLM-based agents lack expertise in solving specific target problems. Considering that reinforcement learning models are smaller than LLMs and can be trained specifically to perform well on specific tasks, this paper proposes a hierarchical reinforcement learning framework assisted by a large language model, called LLMHRL. In this framework, the LLM acts as a teacher agent to guide the exploration of high-level policy in hierarchical reinforcement learning. The low-level policy consists of a library of selection-based policies. The agent executes specific actions based on the low-level policy chosen by the high-level policy. Furthermore, to reduce the action space of high-level policy, this paper decomposes it into skill options and target options. The two types of options are combined to obtain a high-level policy. 
This paper evaluates LLMHRL against baseline methods using both public and custom-built harder tasks across three environments: MiniGrid for key-door pairing, ManiSkill for tabletop sorting, and real-world scenarios. The results show that LLMHRL outperforms existing methods in success rate, convergence speed, and average return.</div></div>\",\"PeriodicalId\":51063,\"journal\":{\"name\":\"Information Sciences\",\"volume\":\"723 \",\"pages\":\"Article 122688\"},\"PeriodicalIF\":6.8000,\"publicationDate\":\"2025-09-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Information Sciences\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0020025525008217\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"0\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Sciences","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0020025525008217","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"0","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Large language model assisted hierarchical reinforcement learning training
Traditional reinforcement learning (RL) struggles to solve complex long-sequence decision tasks, especially when environment rewards are sparse. Large language models (LLMs) can perform well in long-sequence decision tasks by leveraging their powerful inference capabilities. Although LLMs possess a large amount of general knowledge, LLM-based agents lack expertise in solving specific target problems. Considering that reinforcement learning models are smaller than LLMs and can be trained to perform well on specific tasks, this paper proposes a hierarchical reinforcement learning framework assisted by a large language model, called LLMHRL. In this framework, the LLM acts as a teacher agent that guides the exploration of the high-level policy in hierarchical reinforcement learning. The low-level policy consists of a library of selection-based policies, and the agent executes specific actions based on the low-level policy chosen by the high-level policy. Furthermore, to reduce the action space of the high-level policy, this paper decomposes it into skill options and target options; the two types of options are combined to form the high-level policy. This paper evaluates LLMHRL against baseline methods on both public tasks and harder custom-built tasks across three environments: MiniGrid for key-door pairing, ManiSkill for tabletop sorting, and real-world scenarios. The results show that LLMHRL outperforms existing methods in success rate, convergence speed, and average return.
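To make the framework described in the abstract concrete, the sketch below shows one plausible reading of the skill/target option decomposition and of an LLM teacher biasing high-level exploration. It is a minimal, hypothetical illustration: the skill and target names, the llm_teacher_suggest() interface, and the epsilon-style mixing rule are assumptions for demonstration, not the authors' implementation.

```python
# Illustrative sketch (assumptions, not the paper's code): a high-level option
# pairs a skill option with a target option, an LLM "teacher" suggests which
# option to explore, and the chosen pair indexes a low-level policy library.
import random
from itertools import product

SKILLS = ["navigate", "pick", "place"]            # hypothetical skill options
TARGETS = ["red_key", "red_door", "blue_key"]     # hypothetical target options
HIGH_LEVEL_OPTIONS = list(product(SKILLS, TARGETS))  # combined high-level options


def llm_teacher_suggest(observation_text: str) -> tuple[str, str]:
    """Stand-in for an LLM teacher call that maps a textual observation to a
    (skill, target) suggestion; the real prompt and interface are not specified here."""
    return ("pick", "red_key")


def select_high_level_option(observation_text: str, epsilon: float = 0.3):
    """Mix LLM guidance with random exploration over the factored option space."""
    if random.random() < epsilon:
        return random.choice(HIGH_LEVEL_OPTIONS)
    return llm_teacher_suggest(observation_text)


# Each (skill, target) pair selects a low-level policy from a policy library;
# the policies here are dummies that merely name the option they would execute.
LOW_LEVEL_POLICIES = {
    opt: (lambda obs, o=opt: f"primitive_action_for_{o[0]}_{o[1]}")
    for opt in HIGH_LEVEL_OPTIONS
}

skill, target = select_high_level_option("a red key and a locked red door are visible")
action = LOW_LEVEL_POLICIES[(skill, target)]("current_observation")
```

In the paper the decomposition is meant to shrink the high-level action space; a learned version would presumably output a skill and a target separately rather than enumerating their full product as this toy example does.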
About the journal:
Information Sciences (Informatics and Computer Science, Intelligent Systems Applications) is an esteemed international journal that focuses on publishing original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and survey contributions.
Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.