Sophisticated Learning: A novel algorithm for active learning during model-based planning.

ArXiv Pub Date: 2025-08-14
Rowan Hodson, Bruce Bassett, Charel van Hoof, Benjamin Rosman, Mark Solms, Jonathan P Shock, Ryan Smith
{"title":"计划学习:一种在基于模型的计划中进行主动学习的新算法。","authors":"Rowan Hodson, Bruce Bassett, Charel van Hoof, Benjamin Rosman, Mark Solms, Jonathan P Shock, Ryan Smith","doi":"","DOIUrl":null,"url":null,"abstract":"<p><p>We introduce Sophisticated Learning (SL), a planning-to-learn algorithm that embeds active parameter learning inside the Sophisticated Inference (SI) tree-search framework of Active Inference. Unlike SI -- which optimizes beliefs about hidden states -- SL also updates beliefs about model parameters within each simulated branch, enabling counterfactual reasoning about how future observations would improve subsequent planning. We compared SL with Bayes-adaptive Reinforcement Learning (BARL) agents as well as with its parent algorithm, SI. Using a biologically inspired seasonal foraging task in which resources shift probabilistically over a 10x10 grid, we designed experiments that forced agents to balance probabilistic reward harvesting against information gathering. In early trials, where rapid learning is vital, SL agents survive, on average, 8.2% longer than SI and 35% longer than Bayes-adaptive Reinforcement Learning. While both SL and SI showed equal convergence performance, SL reached this convergence 40% faster than SI. Additionally, SL showed robust out-performance of other algorithms in altered environment configurations. Our results show that incorporating active learning into multi-step planning materially improves decision making under radical uncertainty, and reinforces the broader utility of Active Inference for modeling biologically relevant behavior.</p>","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2025-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/8d/70/nihpp-2308.08029v1.PMC10462173.pdf","citationCount":"0","resultStr":"{\"title\":\"Sophisticated Learning: A novel algorithm for active learning during model-based planning.\",\"authors\":\"Rowan Hodson, Bruce Bassett, Charel van Hoof, Benjamin Rosman, Mark Solms, Jonathan P Shock, Ryan Smith\",\"doi\":\"\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>We introduce Sophisticated Learning (SL), a planning-to-learn algorithm that embeds active parameter learning inside the Sophisticated Inference (SI) tree-search framework of Active Inference. Unlike SI -- which optimizes beliefs about hidden states -- SL also updates beliefs about model parameters within each simulated branch, enabling counterfactual reasoning about how future observations would improve subsequent planning. We compared SL with Bayes-adaptive Reinforcement Learning (BARL) agents as well as with its parent algorithm, SI. Using a biologically inspired seasonal foraging task in which resources shift probabilistically over a 10x10 grid, we designed experiments that forced agents to balance probabilistic reward harvesting against information gathering. In early trials, where rapid learning is vital, SL agents survive, on average, 8.2% longer than SI and 35% longer than Bayes-adaptive Reinforcement Learning. While both SL and SI showed equal convergence performance, SL reached this convergence 40% faster than SI. Additionally, SL showed robust out-performance of other algorithms in altered environment configurations. 
Our results show that incorporating active learning into multi-step planning materially improves decision making under radical uncertainty, and reinforces the broader utility of Active Inference for modeling biologically relevant behavior.</p>\",\"PeriodicalId\":8425,\"journal\":{\"name\":\"ArXiv\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-08-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/8d/70/nihpp-2308.08029v1.PMC10462173.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ArXiv\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ArXiv","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Active Inference is a recently developed framework for modeling decision processes under uncertainty. Over the past several years, empirical and theoretical work has begun to evaluate the strengths and weaknesses of this approach and how it might be extended and improved. One recent extension is the "Sophisticated Inference" (SI) algorithm, which improves performance on multi-step planning problems through a recursive decision-tree search. However, to date, little work has compared SI to other established planning algorithms in reinforcement learning (RL). In addition, SI was developed with a focus on inference rather than learning. The present paper therefore has two aims. First, we compare the performance of SI to Bayesian RL schemes designed to solve similar problems. Second, we present and compare an extension of SI, Sophisticated Learning (SL), which more fully incorporates active learning into the planning process. SL maintains beliefs about how model parameters would change under the future observations expected under each policy. This allows a form of counterfactual retrospective inference in which the agent considers what could be learned from current or past observations given different possible future observations. To accomplish these aims, we make use of a novel, biologically inspired environment that requires an optimal balance between goal-seeking and active learning, and that was designed to highlight the problem structure for which SL offers a unique solution. Here, an agent must continually search an open environment for available (but ever-changing) resources in the presence of competing affordances for information gain. Our simulations show that SL outperforms all other algorithms in this context, most notably Bayes-adaptive RL and upper confidence bound (UCB) algorithms, which aim to solve multi-step planning problems using similar principles (i.e., directed exploration and counterfactual reasoning about belief updates given different possible actions/observations). These results provide added support for the utility of Active Inference in solving this class of biologically relevant problems and offer additional tools for testing hypotheses about human cognition.
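To make the counterfactual parameter-learning step concrete, the sketch below shows a toy recursive tree search in which Dirichlet counts over the observation likelihood are updated inside each simulated branch, so that imagined observations can improve planning deeper in the tree. This is a minimal illustration under assumed simplifications (a two-state, two-observation, two-action model, a crude novelty term, and rewards expressed as preferences over observations); it captures only the forward "learning in imagination" idea, not the retrospective revision of past beliefs, and it is not the authors' implementation of Sophisticated Learning.

```python
import numpy as np

def dirichlet_mean(counts):
    """Expected observation-likelihood matrix from Dirichlet counts
    (rows: observations, columns: hidden states)."""
    return counts / counts.sum(axis=0, keepdims=True)

def parameter_novelty(counts, q_next):
    """Crude stand-in for expected information gain about likelihood parameters:
    visiting states whose likelihood columns have low counts is rewarded."""
    return float(np.sum(q_next / counts.sum(axis=0)))

def plan(q, counts, B, C, depth):
    """Recursive tree search over actions and imagined observations.
    Unlike plain Sophisticated Inference, the Dirichlet counts are updated inside
    each simulated branch, so an action's value reflects how much the imagined
    observation would improve later planning ("planning to learn")."""
    if depth == 0:
        return 0.0
    A_hat = dirichlet_mean(counts)
    action_values = []
    for a in range(B.shape[0]):
        q_next = B[a] @ q                 # predicted hidden-state belief after action a
        p_obs = A_hat @ q_next            # predicted observation probabilities
        value = float(p_obs @ C)          # preference-weighted expected outcome
        value += parameter_novelty(counts, q_next)
        for o, p_o in enumerate(p_obs):   # branch on each imaginable observation
            if p_o < 1e-6:
                continue
            q_post = A_hat[o] * q_next    # Bayesian state update given imagined o
            q_post /= q_post.sum()
            counts_branch = counts.copy()
            counts_branch[o] += q_post    # "learn in imagination" from imagined o
            value += p_o * plan(q_post, counts_branch, B, C, depth - 1)
        action_values.append(value)
    return max(action_values)

# Toy model: 2 hidden states, 2 observations, 2 actions (stay / switch), known transitions.
B = np.stack([np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])])
counts = np.ones((2, 2))      # flat Dirichlet prior over the likelihood mapping
C = np.array([1.0, 0.0])      # the agent prefers observation 0
q0 = np.array([0.5, 0.5])
print(plan(q0, counts, B, C, depth=3))
```

Because the counts updated in one branch are carried into its sub-branches, an action whose imagined observation disambiguates the likelihood mapping receives credit for the better decisions it enables later, which is the effect the abstract attributes to SL.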

Sophisticated Learning: A novel algorithm for active learning during model-based planning.

We introduce Sophisticated Learning (SL), a planning-to-learn algorithm that embeds active parameter learning inside the Sophisticated Inference (SI) tree-search framework of Active Inference. Unlike SI, which optimizes beliefs about hidden states, SL also updates beliefs about model parameters within each simulated branch, enabling counterfactual reasoning about how future observations would improve subsequent planning. We compared SL with Bayes-adaptive Reinforcement Learning (BARL) agents as well as with its parent algorithm, SI. Using a biologically inspired seasonal foraging task in which resources shift probabilistically over a 10x10 grid, we designed experiments that forced agents to balance probabilistic reward harvesting against information gathering. In early trials, where rapid learning is vital, SL agents survive, on average, 8.2% longer than SI and 35% longer than Bayes-adaptive Reinforcement Learning. While SL and SI converge to equal final performance, SL reaches this convergence 40% faster than SI. Additionally, SL robustly outperformed the other algorithms in altered environment configurations. Our results show that incorporating active learning into multi-step planning materially improves decision making under radical uncertainty, and they reinforce the broader utility of Active Inference for modeling biologically relevant behavior.
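For concreteness, the toy class below gestures at the kind of seasonal foraging environment described in the abstract. Only the 10x10 grid comes from the text; the season length, reward probability, and resource-relocation rule are illustrative assumptions rather than the authors' task specification.

```python
import numpy as np

class SeasonalForagingGrid:
    """Toy stand-in for a seasonal foraging task: one rewarding cell on a grid
    that relocates probabilistically at season boundaries."""

    def __init__(self, size=10, season_length=20, reward_prob=0.8, seed=0):
        self.rng = np.random.default_rng(seed)
        self.size = size
        self.season_length = season_length
        self.reward_prob = reward_prob
        self.t = 0
        self._relocate_resource()

    def _relocate_resource(self):
        # At each season change the rewarding cell moves to a random location.
        self.resource = tuple(int(x) for x in self.rng.integers(0, self.size, size=2))

    def step(self, agent_pos):
        """Advance time and return 1 with probability reward_prob
        if the agent occupies the resource cell, else 0."""
        self.t += 1
        if self.t % self.season_length == 0:
            self._relocate_resource()
        on_resource = tuple(agent_pos) == self.resource
        return int(on_resource and self.rng.random() < self.reward_prob)

env = SeasonalForagingGrid()
print(env.step((0, 0)))  # 0 or 1, depending on the resource location and chance
```

In a setup like this, an agent that only harvests reward falls behind whenever the season changes, while an agent that also plans its information gathering (as SL is described as doing) can relocate the resource faster, which is the trade-off the experiments probe.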
