Uncertainty Quantification and Exploration for Reinforcement Learning

IF 0.7 | Zone 4 (Management) | JCR Q3 (Engineering)

Military Operations Research Pub Date : 2019-10-12 DOI:10.1287/opre.2023.2436

Yi Zhu, Jing Dong, H. Lam

Citations: 1

Abstract

Quantify the uncertainty to decide and explore better. In statistical inference, large-sample behavior and confidence interval construction are fundamental to assessing the error and reliability of estimated quantities with respect to noise in the data. In the paper “Uncertainty Quantification and Exploration for Reinforcement Learning”, Dong, Lam, and Zhu study large-sample behavior in the classic setting of reinforcement learning. They derive appropriate large-sample asymptotic distributions for the estimated state-action value function (Q-value) and the estimated optimal value function when data are collected from the underlying Markov chain. This allows one to evaluate how confidently the performances of different decisions can be compared. The tight uncertainty quantification also facilitates the development of a pure exploration policy that maximizes the worst-case relative discrepancy among the estimated Q-values (the ratio of the mean squared difference to the variance). This exploration policy aims to collect informative training data so as to maximize the probability of learning the optimal reward-collecting policy, and it achieves good empirical performance.
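To make the "relative discrepancy" criterion concrete, here is a minimal Python sketch. The abstract does not give the exact formula, so this rests on assumptions: the discrepancy for a state and a non-greedy action is taken to be the squared gap between the greedy action's estimated Q-value and that action's estimated Q-value, divided by a variance estimate of that gap, and the worst case is the minimum of this ratio over all such pairs. The function name and the `q_hat` and `diff_var` inputs are hypothetical. The sketch only evaluates the criterion; it is not the paper's exploration policy.

```python
def worst_case_relative_discrepancy(q_hat, diff_var):
    """Smallest (squared Q-value gap) / (variance of the gap) over all
    states and non-greedy actions.

    q_hat[s][a]    : estimated Q-value of action a in state s.
    diff_var[s][a] : assumed variance estimate of q_hat[s][best] - q_hat[s][a]
                     (hypothetical input; entries at the greedy action are ignored).
    """
    worst = float("inf")
    for q_row, var_row in zip(q_hat, diff_var):
        # Greedy action under the current estimates.
        best = max(range(len(q_row)), key=lambda a: q_row[a])
        for a, (q, var) in enumerate(zip(q_row, var_row)):
            if a == best:
                continue
            gap = q_row[best] - q
            worst = min(worst, gap * gap / var)
    return worst


# Two states, two actions each (illustrative numbers only).
q_hat = [[1.0, 0.5], [2.0, 3.0]]
diff_var = [[1.0, 0.25], [0.5, 1.0]]
result = worst_case_relative_discrepancy(q_hat, diff_var)
```

A small ratio marks a pair of actions whose ranking is still poorly resolved by the data, which is why an exploration scheme would want to raise the smallest such ratio.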

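Source journal

Military Operations Research (Management Science: Operations Research and Management Science)

CiteScore: 1.00

Self-citation rate: 0.00%

Articles published:

Review time: >12 weeks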

About the journal: Military Operations Research is a peer-reviewed journal of high academic quality. The journal publishes articles that describe operations research (OR) methodologies and theories used in key military and national security applications. Of particular interest are papers that present case studies showing innovative OR applications, apply OR to major policy issues, introduce interesting new problem areas, highlight education issues, or document the history of military and national security OR.