关于一元后向归纳法的正确性

IF 1.1 3区 计算机科学 Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING
N. Brede, N. Botta
{"title":"关于一元后向归纳法的正确性","authors":"N. Brede, N. Botta","doi":"10.1017/S0956796821000228","DOIUrl":null,"url":null,"abstract":"Abstract In control theory, to solve a finite-horizon sequential decision problem (SDP) commonly means to find a list of decision rules that result in an optimal expected total reward (or cost) when taking a given number of decision steps. SDPs are routinely solved using Bellman’s backward induction. Textbook authors (e.g. Bertsekas or Puterman) typically give more or less formal proofs to show that the backward induction algorithm is correct as solution method for deterministic and stochastic SDPs. Botta, Jansson and Ionescu propose a generic framework for finite horizon, monadic SDPs together with a monadic version of backward induction for solving such SDPs. In monadic SDPs, the monad captures a generic notion of uncertainty, while a generic measure function aggregates rewards. In the present paper, we define a notion of correctness for monadic SDPs and identify three conditions that allow us to prove a correctness result for monadic backward induction that is comparable to textbook correctness proofs for ordinary backward induction. The conditions that we impose are fairly general and can be cast in category-theoretical terms using the notion of Eilenberg–Moore algebra. They hold in familiar settings like those of deterministic or stochastic SDPs, but we also give examples in which they fail. Our results show that backward induction can safely be employed for a broader class of SDPs than usually treated in textbooks. However, they also rule out certain instances that were considered admissible in the context of Botta et al. ’s generic framework. Our development is formalised in Idris as an extension of the Botta et al. framework and the sources are available as supplementary material.","PeriodicalId":15874,"journal":{"name":"Journal of Functional Programming","volume":" ","pages":""},"PeriodicalIF":1.1000,"publicationDate":"2020-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"On the correctness of monadic backward induction\",\"authors\":\"N. Brede, N. Botta\",\"doi\":\"10.1017/S0956796821000228\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract In control theory, to solve a finite-horizon sequential decision problem (SDP) commonly means to find a list of decision rules that result in an optimal expected total reward (or cost) when taking a given number of decision steps. SDPs are routinely solved using Bellman’s backward induction. Textbook authors (e.g. Bertsekas or Puterman) typically give more or less formal proofs to show that the backward induction algorithm is correct as solution method for deterministic and stochastic SDPs. Botta, Jansson and Ionescu propose a generic framework for finite horizon, monadic SDPs together with a monadic version of backward induction for solving such SDPs. In monadic SDPs, the monad captures a generic notion of uncertainty, while a generic measure function aggregates rewards. In the present paper, we define a notion of correctness for monadic SDPs and identify three conditions that allow us to prove a correctness result for monadic backward induction that is comparable to textbook correctness proofs for ordinary backward induction. The conditions that we impose are fairly general and can be cast in category-theoretical terms using the notion of Eilenberg–Moore algebra. They hold in familiar settings like those of deterministic or stochastic SDPs, but we also give examples in which they fail. Our results show that backward induction can safely be employed for a broader class of SDPs than usually treated in textbooks. However, they also rule out certain instances that were considered admissible in the context of Botta et al. ’s generic framework. Our development is formalised in Idris as an extension of the Botta et al. framework and the sources are available as supplementary material.\",\"PeriodicalId\":15874,\"journal\":{\"name\":\"Journal of Functional Programming\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":1.1000,\"publicationDate\":\"2020-08-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Functional Programming\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1017/S0956796821000228\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Functional Programming","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1017/S0956796821000228","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
引用次数: 3

摘要

摘要在控制理论中,解决有限时域序列决策问题(SDP)通常意味着在采取给定数量的决策步骤时,找到一系列决策规则,这些规则会产生最优的预期总回报(或成本)。SDP通常使用Bellman的反向归纳法求解。教科书作者(例如Bertsekas或Puterman)通常会给出或多或少的形式证明,以表明后向归纳算法作为确定性和随机SDP的求解方法是正确的。Botta、Jansson和Ionescu提出了一个有限时域、一元SDP的通用框架,以及用于求解此类SDP的一元后向归纳版本。在monadic SDP中,monad捕获了不确定性的通用概念,而通用度量函数聚合了奖励。在本文中,我们定义了一元SDP的正确性概念,并确定了三个条件,使我们能够证明一元后向归纳的正确性结果,该结果与普通后向归纳教科书的正确性证明相当。我们强加的条件是相当普遍的,可以使用Eilenberg–Moore代数的概念用范畴理论的术语来描述。它们适用于熟悉的设置,如确定性或随机SDP,但我们也给出了它们失败的例子。我们的结果表明,反向归纳法可以安全地用于比教科书中通常处理的更广泛的SDP类别。然而,他们也排除了在Botta等人的通用框架中被认为是可接受的某些情况。我们的发展在Idris中正式化,作为Botta等人框架的延伸,来源可作为补充材料。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
On the correctness of monadic backward induction
Abstract In control theory, to solve a finite-horizon sequential decision problem (SDP) commonly means to find a list of decision rules that result in an optimal expected total reward (or cost) when taking a given number of decision steps. SDPs are routinely solved using Bellman’s backward induction. Textbook authors (e.g. Bertsekas or Puterman) typically give more or less formal proofs to show that the backward induction algorithm is correct as solution method for deterministic and stochastic SDPs. Botta, Jansson and Ionescu propose a generic framework for finite horizon, monadic SDPs together with a monadic version of backward induction for solving such SDPs. In monadic SDPs, the monad captures a generic notion of uncertainty, while a generic measure function aggregates rewards. In the present paper, we define a notion of correctness for monadic SDPs and identify three conditions that allow us to prove a correctness result for monadic backward induction that is comparable to textbook correctness proofs for ordinary backward induction. The conditions that we impose are fairly general and can be cast in category-theoretical terms using the notion of Eilenberg–Moore algebra. They hold in familiar settings like those of deterministic or stochastic SDPs, but we also give examples in which they fail. Our results show that backward induction can safely be employed for a broader class of SDPs than usually treated in textbooks. However, they also rule out certain instances that were considered admissible in the context of Botta et al. ’s generic framework. Our development is formalised in Idris as an extension of the Botta et al. framework and the sources are available as supplementary material.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Journal of Functional Programming
Journal of Functional Programming 工程技术-计算机:软件工程
CiteScore
1.70
自引率
0.00%
发文量
9
审稿时长
>12 weeks
期刊介绍: Journal of Functional Programming is the only journal devoted solely to the design, implementation, and application of functional programming languages, spanning the range from mathematical theory to industrial practice. Topics covered include functional languages and extensions, implementation techniques, reasoning and proof, program transformation and synthesis, type systems, type theory, language-based security, memory management, parallelism and applications. The journal is of interest to computer scientists, software engineers, programming language researchers and mathematicians interested in the logical foundations of programming.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信