Robust Q-learning algorithm for Markov decision processes under Wasserstein uncertainty

Impact factor: 4.8 · CAS Zone 2 (Computer Science) · JCR Q1 (Automation & Control Systems)

Automatica Pub Date : 2024-08-02 DOI:10.1016/j.automatica.2024.111825

Citations: 0

Abstract

We present a novel $Q$-learning algorithm tailored to distributionally robust Markov decision problems in which the ambiguity set of transition probabilities for the underlying Markov decision process is a Wasserstein ball around a (possibly estimated) reference measure. We prove convergence of the algorithm and provide several examples, some of them based on real data, that illustrate both the tractability of the algorithm and the benefits of accounting for distributional robustness when solving stochastic optimal control problems, in particular when the estimated distributions turn out to be misspecified in practice.
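The abstract does not spell out the algorithm, so the following is only a minimal sketch of the underlying idea, not the authors' method. It assumes a finite state space, a Wasserstein-1 ball of radius `eps`, and the standard duality for worst-case expectations, $\inf_{P \in B_\varepsilon(\hat P)} E_P[v] = \sup_{\lambda \ge 0} \{ -\lambda \varepsilon + E_{\hat P}[\min_y ( v(y) + \lambda\, d(x, y) )] \}$, with the supremum approximated on a grid of $\lambda$ values. The backup is model-based rather than a sampled $Q$-learning update, and all names (`worst_case_expectation`, `robust_q_backup`, `lambdas`) are hypothetical.

```python
import numpy as np


def worst_case_expectation(values, ref_probs, cost, eps, lambdas):
    """Approximate the infimum of E_P[values] over a Wasserstein-1 ball.

    values:    (n,) function values on the n support points
    ref_probs: (n,) reference distribution on those points
    cost:      (n, n) ground distances, cost[i, j] = d(x_i, x_j)
    eps:       radius of the ball (assumption: order-1 Wasserstein distance)
    lambdas:   grid of non-negative dual multipliers (assumption: the grid
               is fine and wide enough to contain a near-optimal multiplier)
    """
    best = -np.inf
    for lam in lambdas:
        # For each reference atom x_i: min over y_j of values[j] + lam * d(x_i, y_j)
        transformed = np.min(values[None, :] + lam * cost, axis=1)
        best = max(best, -lam * eps + float(ref_probs @ transformed))
    return best


def robust_q_backup(Q, rewards, ref_kernel, cost, eps, gamma, lambdas):
    """One robust Bellman backup in Q-form on a finite MDP (illustrative only).

    Q:          (S, A) current Q-table (float)
    rewards:    (S, A) immediate rewards
    ref_kernel: (S, A, S) reference transition probabilities
    """
    n_states, n_actions = Q.shape
    v = Q.max(axis=1)  # greedy value of each next state
    new_Q = np.empty_like(Q)
    for s in range(n_states):
        for a in range(n_actions):
            worst = worst_case_expectation(v, ref_kernel[s, a], cost, eps, lambdas)
            new_Q[s, a] = rewards[s, a] + gamma * worst
    return new_Q
```

With `eps = 0` the backup reduces to the ordinary (non-robust) Bellman backup under the reference kernel, provided the grid contains a sufficiently large multiplier; increasing `eps` can only lower the resulting values, which is the conservatism the abstract refers to when the estimated distributions are misspecified.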

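Source journal

Automatica (Engineering & Technology: Electrical & Electronic Engineering)

CiteScore: 10.70
Self-citation rate: 7.80%
Articles published: 617
Review time: 5 months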

About the journal: Automatica is a leading archival publication in the field of systems and control. The field today encompasses a broad set of areas and topics, and is thriving not only within itself but also in terms of its impact on other fields, such as communications, computers, biology, energy and economics. Since its inception in 1963, Automatica has kept abreast of the evolution of the field and has emerged as a leading publication driving its trends. Automatica became a journal of the International Federation of Automatic Control (IFAC) in 1969. It features a characteristic blend of theoretical and applied papers of archival, lasting value, reporting cutting-edge research results by authors across the globe. It publishes articles in distinct categories, including regular, brief and survey papers, technical communiqués, correspondence items, and reviews of published books of interest to the readership. It occasionally publishes special issues on emerging new topics or established mature topics of interest to a broad audience. Automatica solicits original high-quality contributions in all the categories listed above, and in all areas of systems and control interpreted in a broad sense and evolving constantly. Submissions may be sent directly to a subject editor, or to the Editor-in-Chief if the author is unsure of the subject area. Editorial procedures in place assure careful, fair, and prompt handling of all submitted articles. Accepted papers appear in the journal in the shortest time feasible given production time constraints.
