Learning Hidden Transition for Nonstationary Environments With Multistep Tree Search

IF 8.7; Zone 1, Computer Science; JCR Q1, AUTOMATION & CONTROL SYSTEMS
Yangqing Fu;Yue Gao
DOI: 10.1109/TSMC.2025.3578730
Journal: IEEE Transactions on Systems Man Cybernetics-Systems, vol. 55, no. 10, pp. 7012-7023
Publication date: 2025-07-10
Publication type: Journal Article
Open access: No
URL: https://ieeexplore.ieee.org/document/11076163/
Citations: 0

Abstract

Learning Hidden Transition for Nonstationary Environments With Multistep Tree Search
Deep reinforcement learning (DRL) algorithms have shown impressive results in various applications, but nonstationary environments, such as those with varying operating conditions and external disturbances, remain a significant challenge. To address this challenge, we propose the hidden transition inference (HTI) framework for learning nonstationary transitions in multistep tree search. Unlike previous methods that focus on single-step transition changes, the HTI framework improves decision-making by inferring multistep environmental variations. Specifically, the framework constructs a probabilistic graphical model for Monte Carlo tree search (MCTS) in latent space and uses the variational lower bound of hidden states for policy improvement. Furthermore, this work theoretically proves the convergence of the HTI framework. The proposed framework is integrated with Sampled MuZero, a state-of-the-art MCTS-based algorithm, and evaluated on multiple control tasks with different nonstationary transition dynamics. Experimental results show that the HTI framework can improve the inference capability of tree search in nonstationary environments, showcasing its potential for addressing control challenges in such settings.
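To make the problem setting in the abstract concrete, the following is a minimal toy sketch of a nonstationary environment: a 1-D system whose actuator gain depends on a hidden mode that can flip at any step. It is not the HTI framework and not one of the paper's benchmark tasks; the class `HiddenModeEnv`, the function `stationary_model_error`, and all parameter values are hypothetical names and numbers chosen only for illustration. The sketch shows why a model that assumes fixed dynamics accumulates prediction error once the hidden mode changes.

```python
import random


class HiddenModeEnv:
    """Toy 1-D system whose actuator gain depends on a hidden mode (illustrative only)."""

    def __init__(self, gains=(1.0, 0.5), switch_prob=0.1, seed=0):
        self.gains = gains              # hypothetical gain for each hidden mode
        self.switch_prob = switch_prob  # chance that the mode flips at each step
        self.rng = random.Random(seed)
        self.mode = 0                   # hidden from the agent
        self.x = 0.0                    # observed state

    def step(self, action):
        # The mode may flip before the action is applied; the agent never sees it.
        if self.rng.random() < self.switch_prob:
            self.mode = 1 - self.mode
        self.x += self.gains[self.mode] * action
        return self.x


def stationary_model_error(env, actions, assumed_gain=1.0):
    """Sum of one-step prediction errors of a model that assumes a fixed gain."""
    total = 0.0
    for action in actions:
        predicted = env.x + assumed_gain * action
        observed = env.step(action)
        total += abs(predicted - observed)
    return total


if __name__ == "__main__":
    actions = [1.0] * 10
    # No mode switches: the fixed-gain model matches the environment.
    print(stationary_model_error(HiddenModeEnv(switch_prob=0.0), actions))
    # Mode flips on every step: the fixed-gain model is wrong on half the steps.
    print(stationary_model_error(HiddenModeEnv(switch_prob=1.0), actions))
```

In this toy, the only way to predict well over several steps is to infer the hidden mode from past transitions, which is the kind of multistep inference the abstract attributes to HTI.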
Source journal: IEEE Transactions on Systems Man Cybernetics-Systems
Categories: AUTOMATION & CONTROL SYSTEMS; COMPUTER SCIENCE, CYBERNETICS
CiteScore: 18.50
Self-citation rate: 11.50%
Articles published: 812
Review time: 6 months
About the journal: The IEEE Transactions on Systems, Man, and Cybernetics: Systems encompasses the fields of systems engineering, covering issue formulation, analysis, and modeling throughout the systems engineering lifecycle phases. It addresses decision-making, issue interpretation, systems management, processes, and various methods such as optimization, modeling, and simulation in the development and deployment of large systems.