Adaptive Curriculum Learning: Optimizing Reinforcement Learning through Dynamic Task Sequencing

Impact Factor: 1.0 · JCR Q4 (Optics)
M. Nesterova, A. Skrynnik, A. Panov
{"title":"适应性课程学习:通过动态任务排序优化强化学习","authors":"M. Nesterova,&nbsp;A. Skrynnik,&nbsp;A. Panov","doi":"10.3103/S1060992X2470070X","DOIUrl":null,"url":null,"abstract":"<p>Curriculum learning in reinforcement learning utilizes a strategy that sequences simpler tasks in order to optimize the learning process for more complex problems. Typically, existing methods are categorized into two distinct approaches: one that develops a teacher (a curriculum strategy) policy concurrently with a student (a learning agent) policy, and another that utilizes selective sampling based on the student policy’s experiences across a task distribution. The main issue with the first approach is the substantial computational demand, as it requires simultaneous training of both the low-level (student) and high-level (teacher) reinforcement learning policies. On the other hand, methods based on selective sampling presuppose that the agent is capable of maximizing reward accumulation across all tasks, which may lead to complications when the primary mission is to master a specific target task. This makes those models less effective in scenarios requiring focused learning. Our research addresses a particular scenario where a teacher needs to train a new student in a new short episode. This constraint compels the teacher to rapidly master the curriculum planning by identifying the most appropriate tasks. We evaluated our framework across several complex scenarios, including a partially observable grid-world navigation environment, and in procedurally generated open-world environment Crafter.</p>","PeriodicalId":721,"journal":{"name":"Optical Memory and Neural Networks","volume":"33 3 supplement","pages":"S435 - S444"},"PeriodicalIF":1.0000,"publicationDate":"2025-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Adaptive Curriculum Learning: Optimizing Reinforcement Learning through Dynamic Task Sequencing\",\"authors\":\"M. Nesterova,&nbsp;A. Skrynnik,&nbsp;A. Panov\",\"doi\":\"10.3103/S1060992X2470070X\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Curriculum learning in reinforcement learning utilizes a strategy that sequences simpler tasks in order to optimize the learning process for more complex problems. Typically, existing methods are categorized into two distinct approaches: one that develops a teacher (a curriculum strategy) policy concurrently with a student (a learning agent) policy, and another that utilizes selective sampling based on the student policy’s experiences across a task distribution. The main issue with the first approach is the substantial computational demand, as it requires simultaneous training of both the low-level (student) and high-level (teacher) reinforcement learning policies. On the other hand, methods based on selective sampling presuppose that the agent is capable of maximizing reward accumulation across all tasks, which may lead to complications when the primary mission is to master a specific target task. This makes those models less effective in scenarios requiring focused learning. Our research addresses a particular scenario where a teacher needs to train a new student in a new short episode. This constraint compels the teacher to rapidly master the curriculum planning by identifying the most appropriate tasks. 
We evaluated our framework across several complex scenarios, including a partially observable grid-world navigation environment, and in procedurally generated open-world environment Crafter.</p>\",\"PeriodicalId\":721,\"journal\":{\"name\":\"Optical Memory and Neural Networks\",\"volume\":\"33 3 supplement\",\"pages\":\"S435 - S444\"},\"PeriodicalIF\":1.0000,\"publicationDate\":\"2025-01-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Optical Memory and Neural Networks\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://link.springer.com/article/10.3103/S1060992X2470070X\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"OPTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Optical Memory and Neural Networks","FirstCategoryId":"1085","ListUrlMain":"https://link.springer.com/article/10.3103/S1060992X2470070X","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"OPTICS","Score":null,"Total":0}
Citations: 0

Abstract

Curriculum learning in reinforcement learning uses a strategy that sequences simpler tasks in order to optimize the learning process for more complex problems. Existing methods typically fall into two distinct approaches: one develops a teacher (curriculum strategy) policy concurrently with a student (learning agent) policy, and the other uses selective sampling based on the student policy's experiences across a task distribution. The main issue with the first approach is its substantial computational demand, as it requires simultaneous training of both the low-level (student) and high-level (teacher) reinforcement learning policies. Methods based on selective sampling, on the other hand, presuppose that the agent should maximize reward accumulation across all tasks, which causes complications when the primary mission is to master a specific target task; this makes those models less effective in scenarios requiring focused learning. Our research addresses a particular scenario in which a teacher must train a new student within a short episode. This constraint compels the teacher to rapidly master curriculum planning by identifying the most appropriate tasks. We evaluated our framework across several complex scenarios, including a partially observable grid-world navigation environment and the procedurally generated open-world environment Crafter.
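
As a concrete illustration of the selective-sampling family of methods described above, the sketch below implements a minimal bandit-style teacher in Python. It scores each candidate task by the absolute change in the student's recent episode returns (an absolute-learning-progress heuristic from the teacher-student curriculum learning literature) and picks tasks epsilon-greedily. This is a sketch of the general technique, not the algorithm proposed in the paper; the BanditTeacher class, the window and eps parameters, and the student.train_on interface are all hypothetical.

import random
from collections import deque

class BanditTeacher:
    # Hypothetical illustration, not the paper's algorithm: score each
    # task by the absolute change in the student's recent returns and
    # sample tasks epsilon-greedily.
    def __init__(self, tasks, window=10, eps=0.1):
        self.tasks = list(tasks)
        self.eps = eps
        # Recent episode returns per task, used to estimate progress.
        self.returns = {t: deque(maxlen=window) for t in self.tasks}

    def _progress(self, task):
        r = list(self.returns[task])
        if len(r) < 2:
            return float("inf")  # force every task to be tried first
        half = len(r) // 2
        older = sum(r[:half]) / half
        newer = sum(r[half:]) / (len(r) - half)
        return abs(newer - older)  # absolute learning progress

    def pick_task(self):
        if random.random() < self.eps:
            return random.choice(self.tasks)  # occasional exploration
        return max(self.tasks, key=self._progress)

    def update(self, task, episode_return):
        self.returns[task].append(episode_return)

# Hypothetical usage, where student.train_on(task) runs one student
# episode on the given task and returns its episode return:
#     teacher = BanditTeacher(tasks=["easy", "medium", "target"])
#     for _ in range(1000):
#         task = teacher.pick_task()
#         teacher.update(task, student.train_on(task))

Note that this heuristic rewards progress on any task rather than progress toward one target task, which is precisely the limitation of selective-sampling methods that the abstract points out for focused-learning scenarios.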

Source journal: Optical Memory and Neural Networks
CiteScore: 1.50
Self-citation rate: 11.10%
Publication volume: 25
Journal description: The journal covers a wide range of issues in information optics such as optical memory, mechanisms for optical data recording and processing, photosensitive materials, optical, optoelectronic and holographic nanostructures, and many other related topics. Papers on memory systems using holographic and biological structures and concepts of brain operation are also included. The journal pays particular attention to research in the field of neural net systems that may lead to a new generation of computational technologies by endowing them with intelligence.