Large language model assisted hierarchical reinforcement learning training

IF 6.8 · CAS Tier 1 (Computer Science) · JCR category: COMPUTER SCIENCE, INFORMATION SYSTEMS
Qianxi Li, Bao Pang, Yong Song, Hongze Fu, Qingyang Xu, Xianfeng Yuan, Xiaolong Xu, Chengjin Zhang
{"title":"大型语言模型辅助分层强化学习训练","authors":"Qianxi Li ,&nbsp;Bao Pang ,&nbsp;Yong Song ,&nbsp;Hongze Fu ,&nbsp;Qingyang Xu ,&nbsp;Xianfeng Yuan ,&nbsp;Xiaolong Xu ,&nbsp;Chengjin Zhang","doi":"10.1016/j.ins.2025.122688","DOIUrl":null,"url":null,"abstract":"<div><div>Traditional reinforcement learning (RL) cannot solve complex long-sequence decision tasks, especially when the environment rewards are sparse. Large language models (LLMs) can perform well in long-sequence decision tasks by leveraging their powerful inference capabilities. Although LLMs possess a large amount of general knowledge, LLM-based agents lack expertise in solving specific target problems. Considering that reinforcement learning models are smaller than LLMs and can be trained specifically to perform well on specific tasks, this paper proposes a hierarchical reinforcement learning framework assisted by a large language model, called LLMHRL. In this framework, the LLM acts as a teacher agent to guide the exploration of high-level policy in hierarchical reinforcement learning. The low-level policy consists of a library of selection-based policies. The agent executes specific actions based on the low-level policy chosen by the high-level policy. Furthermore, to reduce the action space of high-level policy, this paper decomposes it into skill options and target options. The two types of options are combined to obtain a high-level policy. This paper evaluates LLMHRL against baseline methods using both public and custom-built harder tasks across three environments: MiniGrid for key-door pairing, ManiSkill for tabletop sorting, and real-world scenarios. The results show that LLMHRL outperforms existing methods in success rate, convergence speed, and average return.</div></div>","PeriodicalId":51063,"journal":{"name":"Information Sciences","volume":"723 ","pages":"Article 122688"},"PeriodicalIF":6.8000,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Large language model assisted hierarchical reinforcement learning training\",\"authors\":\"Qianxi Li ,&nbsp;Bao Pang ,&nbsp;Yong Song ,&nbsp;Hongze Fu ,&nbsp;Qingyang Xu ,&nbsp;Xianfeng Yuan ,&nbsp;Xiaolong Xu ,&nbsp;Chengjin Zhang\",\"doi\":\"10.1016/j.ins.2025.122688\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Traditional reinforcement learning (RL) cannot solve complex long-sequence decision tasks, especially when the environment rewards are sparse. Large language models (LLMs) can perform well in long-sequence decision tasks by leveraging their powerful inference capabilities. Although LLMs possess a large amount of general knowledge, LLM-based agents lack expertise in solving specific target problems. Considering that reinforcement learning models are smaller than LLMs and can be trained specifically to perform well on specific tasks, this paper proposes a hierarchical reinforcement learning framework assisted by a large language model, called LLMHRL. In this framework, the LLM acts as a teacher agent to guide the exploration of high-level policy in hierarchical reinforcement learning. The low-level policy consists of a library of selection-based policies. The agent executes specific actions based on the low-level policy chosen by the high-level policy. Furthermore, to reduce the action space of high-level policy, this paper decomposes it into skill options and target options. The two types of options are combined to obtain a high-level policy. 
This paper evaluates LLMHRL against baseline methods using both public and custom-built harder tasks across three environments: MiniGrid for key-door pairing, ManiSkill for tabletop sorting, and real-world scenarios. The results show that LLMHRL outperforms existing methods in success rate, convergence speed, and average return.</div></div>\",\"PeriodicalId\":51063,\"journal\":{\"name\":\"Information Sciences\",\"volume\":\"723 \",\"pages\":\"Article 122688\"},\"PeriodicalIF\":6.8000,\"publicationDate\":\"2025-09-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Information Sciences\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0020025525008217\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"0\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Sciences","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0020025525008217","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"0","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

Traditional reinforcement learning (RL) cannot solve complex long-sequence decision tasks, especially when environment rewards are sparse. Large language models (LLMs) can perform well in long-sequence decision tasks by leveraging their powerful inference capabilities. Although LLMs possess a large amount of general knowledge, LLM-based agents lack expertise in solving specific target problems. Considering that reinforcement learning models are smaller than LLMs and can be trained to perform well on specific tasks, this paper proposes a hierarchical reinforcement learning framework assisted by a large language model, called LLMHRL. In this framework, the LLM acts as a teacher agent to guide the exploration of the high-level policy in hierarchical reinforcement learning. The low-level policy consists of a library of selection-based policies, and the agent executes specific actions according to the low-level policy chosen by the high-level policy. Furthermore, to reduce the action space of the high-level policy, this paper decomposes it into skill options and target options; the two types of options are combined to obtain a high-level policy. This paper evaluates LLMHRL against baseline methods on both public tasks and harder custom-built tasks across three environments: MiniGrid for key-door pairing, ManiSkill for tabletop sorting, and real-world scenarios. The results show that LLMHRL outperforms existing methods in success rate, convergence speed, and average return.
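The abstract describes the architecture only in words. A minimal Python sketch of the idea it outlines, a high-level action space factored into skill options and target options, a library of low-level policies indexed by the chosen (skill, target) pair, and an LLM teacher that biases high-level exploration, might look as follows. All names (SKILLS, TARGETS, llm_teacher, low_level_policy), the hyperparameters, and the tabular Q-learning update over options are illustrative assumptions, not the authors' implementation.

# Minimal sketch (not the authors' code) of the LLMHRL idea from the abstract.
import random
from collections import defaultdict

SKILLS = ["pick_key", "open_door", "goto"]           # skill options (assumed)
TARGETS = ["red", "green", "blue"]                   # target options (assumed)
OPTIONS = [(s, t) for s in SKILLS for t in TARGETS]  # combined high-level actions

def low_level_policy(skill, target, state):
    """Placeholder for the selection-based low-level policy library:
    each (skill, target) pair maps to a primitive-action controller."""
    return f"{skill}->{target}"  # a real controller would return env actions

def llm_teacher(state):
    """Stand-in for the LLM teacher agent: given a state description it
    returns a suggested (skill, target) option. Here it is a random stub."""
    return random.choice(OPTIONS)

Q = defaultdict(float)  # tabular Q-values over (state, option), for illustration
ALPHA, GAMMA, EPS, TEACHER_PROB = 0.1, 0.99, 0.1, 0.3

def select_option(state):
    # With some probability defer to the LLM teacher to guide exploration,
    # otherwise act epsilon-greedily on the learned high-level Q-values.
    if random.random() < TEACHER_PROB:
        return llm_teacher(state)
    if random.random() < EPS:
        return random.choice(OPTIONS)
    return max(OPTIONS, key=lambda o: Q[(state, o)])

def update(state, option, reward, next_state):
    # One-step Q-learning update over high-level options.
    best_next = max(Q[(next_state, o)] for o in OPTIONS)
    Q[(state, option)] += ALPHA * (reward + GAMMA * best_next - Q[(state, option)])

# Example: one high-level decision step on a toy state description.
state = "agent at start; red key visible; red door locked"
skill, target = select_option(state)
print(low_level_policy(skill, target, state))

In this sketch the high-level action set is simply the Cartesian product of skills and targets; the decomposition the paper describes keeps each factor small rather than learning over one large monolithic option space.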
Source journal: Information Sciences (Engineering & Technology, Computer Science: Information Systems)
CiteScore: 14.00
Self-citation rate: 17.30%
Annual publications: 1322
Review time: 10.4 months
About the journal: Informatics and Computer Science Intelligent Systems Applications is an esteemed international journal that focuses on publishing original and creative research findings in the field of information sciences. It also features a limited number of timely tutorial and surveying contributions. The journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.