Uncertainty Quantification and Exploration for Reinforcement Learning

IF 0.7 4区 管理学 Q3 Engineering
Yi Zhu, Jing Dong, H. Lam
{"title":"Uncertainty Quantification and Exploration for Reinforcement Learning","authors":"Yi Zhu, Jing Dong, H. Lam","doi":"10.1287/opre.2023.2436","DOIUrl":null,"url":null,"abstract":"Quantify the uncertainty to decide and explore better In statistical inference, large-sample behavior and confidence interval construction are fundamental in assessing the error and reliability of estimated quantities with respect to the data noises. In the paper “Uncertainty Quantification and Exploration for Reinforcement Learning”, Dong, Lam, and Zhu study the large sample behavior in the classic setting of reinforcement learning. They derive appropriate large-sample asymptotic distributions for the state-action value function (Q-value) and optimal value function estimations when data are collected from the underlying Markov chain. This allows one to evaluate the assertiveness of performances among different decisions. The tight uncertainty quantification also facilitates the development of a pure exploration policy by maximizing the worst-case relative discrepancy among the estimated Q-values (ratio of the mean squared difference to the variance). This exploration policy aims to collect informative training data to maximize the probability of learning the optimal reward collecting policy, and it achieves good empirical performance.","PeriodicalId":49809,"journal":{"name":"Military Operations Research","volume":"12 1","pages":""},"PeriodicalIF":0.7000,"publicationDate":"2019-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Military Operations Research","FirstCategoryId":"91","ListUrlMain":"https://doi.org/10.1287/opre.2023.2436","RegionNum":4,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Engineering","Score":null,"Total":0}
引用次数: 1

Abstract

Quantify the uncertainty to decide and explore better In statistical inference, large-sample behavior and confidence interval construction are fundamental in assessing the error and reliability of estimated quantities with respect to the data noises. In the paper “Uncertainty Quantification and Exploration for Reinforcement Learning”, Dong, Lam, and Zhu study the large sample behavior in the classic setting of reinforcement learning. They derive appropriate large-sample asymptotic distributions for the state-action value function (Q-value) and optimal value function estimations when data are collected from the underlying Markov chain. This allows one to evaluate the assertiveness of performances among different decisions. The tight uncertainty quantification also facilitates the development of a pure exploration policy by maximizing the worst-case relative discrepancy among the estimated Q-values (ratio of the mean squared difference to the variance). This exploration policy aims to collect informative training data to maximize the probability of learning the optimal reward collecting policy, and it achieves good empirical performance.
强化学习的不确定性量化与探索
在统计推断中,大样本行为和置信区间构造是评估相对于数据噪声的估计量的误差和可靠性的基础。在论文“不确定性量化和探索强化学习”中,Dong, Lam和Zhu研究了经典强化学习环境下的大样本行为。当从底层马尔可夫链收集数据时,他们推导出适当的大样本渐近分布的状态-作用值函数(q值)和最优值函数估计。这允许人们在不同的决策中评估表现的自信。严格的不确定性量化还通过最大化估计q值(均方差与方差的比值)之间的最坏情况相对差异,促进了纯勘探策略的发展。该探索策略旨在收集信息丰富的训练数据,使学习到最优奖励收集策略的概率最大化,并取得了良好的经验性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Military Operations Research
Military Operations Research 管理科学-运筹学与管理科学
CiteScore
1.00
自引率
0.00%
发文量
0
审稿时长
>12 weeks
期刊介绍: Military Operations Research is a peer-reviewed journal of high academic quality. The Journal publishes articles that describe operations research (OR) methodologies and theories used in key military and national security applications. Of particular interest are papers that present: Case studies showing innovative OR applications Apply OR to major policy issues Introduce interesting new problems areas Highlight education issues Document the history of military and national security OR.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信