{"title":"用自由能最小化来提升MCTS。","authors":"Mawaba Pascal Dao, Adrian M Peter","doi":"10.1162/neco.a.31","DOIUrl":null,"url":null,"abstract":"<p><p>Active inference, grounded in the free energy principle, provides a powerful lens for understanding how agents balance exploration and goal-directed behavior in uncertain environments. Here, we propose a new planning framework that integrates Monte Carlo tree search (MCTS) with active inference objectives to systematically reduce epistemic uncertainty while pursuing extrinsic rewards. Our key insight is that MCTS, already renowned for its search efficiency, can be naturally extended to incorporate free energy minimization by blending expected rewards with information gain. Concretely, the cross-entropy method (CEM) is used to optimize action proposals at the root node, while tree expansions leverage reward modeling alongside intrinsic exploration bonuses. This synergy allows our planner to maintain coherent estimates of value and uncertainty throughout planning, without sacrificing computational tractability. Empirically, we benchmark our planner on a diverse set of continuous control tasks, where it demonstrates performance gains over both stand-alone CEM and MCTS with random rollouts.</p>","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":" ","pages":"1-30"},"PeriodicalIF":2.1000,"publicationDate":"2025-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Boosting MCTS With Free Energy Minimization.\",\"authors\":\"Mawaba Pascal Dao, Adrian M Peter\",\"doi\":\"10.1162/neco.a.31\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Active inference, grounded in the free energy principle, provides a powerful lens for understanding how agents balance exploration and goal-directed behavior in uncertain environments. Here, we propose a new planning framework that integrates Monte Carlo tree search (MCTS) with active inference objectives to systematically reduce epistemic uncertainty while pursuing extrinsic rewards. Our key insight is that MCTS, already renowned for its search efficiency, can be naturally extended to incorporate free energy minimization by blending expected rewards with information gain. Concretely, the cross-entropy method (CEM) is used to optimize action proposals at the root node, while tree expansions leverage reward modeling alongside intrinsic exploration bonuses. This synergy allows our planner to maintain coherent estimates of value and uncertainty throughout planning, without sacrificing computational tractability. 
Empirically, we benchmark our planner on a diverse set of continuous control tasks, where it demonstrates performance gains over both stand-alone CEM and MCTS with random rollouts.</p>\",\"PeriodicalId\":54731,\"journal\":{\"name\":\"Neural Computation\",\"volume\":\" \",\"pages\":\"1-30\"},\"PeriodicalIF\":2.1000,\"publicationDate\":\"2025-09-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neural Computation\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1162/neco.a.31\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Computation","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1162/neco.a.31","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract: Active inference, grounded in the free energy principle, provides a powerful lens for understanding how agents balance exploration and goal-directed behavior in uncertain environments. Here, we propose a new planning framework that integrates Monte Carlo tree search (MCTS) with active inference objectives to systematically reduce epistemic uncertainty while pursuing extrinsic rewards. Our key insight is that MCTS, already renowned for its search efficiency, can be naturally extended to incorporate free energy minimization by blending expected rewards with information gain. Concretely, the cross-entropy method (CEM) is used to optimize action proposals at the root node, while tree expansions leverage reward modeling alongside intrinsic exploration bonuses. This synergy allows our planner to maintain coherent estimates of value and uncertainty throughout planning, without sacrificing computational tractability. Empirically, we benchmark our planner on a diverse set of continuous control tasks, where it demonstrates performance gains over both stand-alone CEM and MCTS with random rollouts.
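The abstract compresses a concrete algorithmic recipe: score candidate action sequences by a blend of expected extrinsic reward and epistemic information gain, and use CEM to refine the action proposal distribution at the root. In active-inference terms, minimizing expected free energy G(π) corresponds (up to approximations) to maximizing E[reward] + β·E[information gain]. The sketch below is a minimal illustration of that blended objective with CEM at the root, under assumed details; it is not the authors' implementation. In particular, EnsembleModel, the disagreement-based info_gain_bonus, the weight beta, and cem_root_plan are hypothetical stand-ins, and the MCTS tree expansion itself is omitted.

```python
import numpy as np

# Illustrative sketch only: all names and modeling choices here are
# assumptions, not the paper's actual API or objective.

class EnsembleModel:
    """Toy ensemble dynamics model; member disagreement proxies information gain."""
    def __init__(self, n_members=5, state_dim=3, action_dim=1, seed=0):
        rng = np.random.default_rng(seed)
        # Each member is a random linear model: s' = A s + B a.
        self.A = rng.normal(0, 0.1, (n_members, state_dim, state_dim))
        self.B = rng.normal(0, 0.1, (n_members, state_dim, action_dim))

    def predict(self, s, a):
        # One next-state prediction per ensemble member, shape (n_members, state_dim).
        return np.einsum('mij,j->mi', self.A, s) + np.einsum('mij,j->mi', self.B, a)

def reward(s, a):
    # Toy extrinsic reward: stay near the origin with small actions.
    return -np.sum(s**2) - 0.01 * np.sum(a**2)

def info_gain_bonus(preds):
    # Epistemic bonus: ensemble disagreement (variance across members),
    # a common tractable surrogate for expected information gain.
    return preds.var(axis=0).sum()

def rollout_score(model, s, actions, beta=0.5, gamma=0.99):
    # Score an action sequence by discounted extrinsic reward plus a
    # beta-weighted information-gain term (a negative-free-energy surrogate).
    total, disc = 0.0, 1.0
    for a in actions:
        preds = model.predict(s, a)
        total += disc * (reward(s, a) + beta * info_gain_bonus(preds))
        s = preds.mean(axis=0)  # plan through the ensemble mean
        disc *= gamma
    return total

def cem_root_plan(model, s0, horizon=10, action_dim=1, iters=5,
                  pop=64, elites=8, seed=0):
    # CEM at the root: iteratively refit a Gaussian over action sequences
    # to the elite samples under the blended objective above.
    rng = np.random.default_rng(seed)
    mu = np.zeros((horizon, action_dim))
    sigma = np.ones((horizon, action_dim))
    for _ in range(iters):
        samples = rng.normal(mu, sigma, (pop, horizon, action_dim))
        scores = np.array([rollout_score(model, s0, seq) for seq in samples])
        elite = samples[np.argsort(scores)[-elites:]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    return mu[0]  # execute only the first action (MPC-style replanning)

if __name__ == "__main__":
    model = EnsembleModel()
    s0 = np.array([1.0, -0.5, 0.2])
    print("first action:", cem_root_plan(model, s0))
```

Ensemble disagreement is only one common proxy for information gain in model-based control; the paper's exact uncertainty estimate, and how it feeds the tree expansions and exploration bonuses, may differ.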
Journal Introduction:
Neural Computation is uniquely positioned at the crossroads between neuroscience and TMCS and welcomes the submission of original papers from all areas of TMCS, including: Advanced experimental design; Analysis of chemical sensor data; Connectomic reconstructions; Analysis of multielectrode and optical recordings; Genetic data for cell identity; Analysis of behavioral data; Multiscale models; Analysis of molecular mechanisms; Neuroinformatics; Analysis of brain imaging data; Neuromorphic engineering; Principles of neural coding, computation, circuit dynamics, and plasticity; Theories of brain function.