Discounted fully probabilistic design of decision rules

Impact factor: 8.1. Zone 1 (Computer Science). JCR category: COMPUTER SCIENCE, INFORMATION SYSTEMS
Miroslav Kárný, Soňa Molnárová
Information Sciences, vol. 690, Article 121578. Published 2024-10-22. DOI: 10.1016/j.ins.2024.121578. URL: https://www.sciencedirect.com/science/article/pii/S0020025524014920

Abstract

Axiomatic fully probabilistic design (FPD) of optimal decision rules strictly extends the decision-making (DM) theory represented by Markov decision processes (MDP): any MDP task can be approximated by an explicitly found FPD task, whereas many FPD tasks have no MDP equivalent. Both MDP and FPD model the closed loop (the coupling of an agent and its environment) via a joint probability density (pd) of the involved random variables, collectively referred to as the behaviour. Unlike MDP, FPD quantifies the agent's aims and constraints by an ideal pd, which is high on desired behaviours, small on undesired behaviours and zero on forbidden ones. FPD selects the optimal decision rules as the minimiser of the Kullback-Leibler divergence of the closed-loop-modelling pd to its ideal twin. This choice of proximity measure follows from the FPD axiomatics.
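The minimisation described above can be made concrete on a toy problem. The following is a minimal sketch, assuming a single decision step with finitely many actions and states; it is not the discounted algorithm of the paper. All names (`kl`, `fpd_one_step`, `closed_loop_kl`, `model`, `ideal_model`, `ideal_rule`) are hypothetical. For such a task, the closed-loop divergence equals the sum over actions of r(a) [ln(r(a)/r_i(a)) + ω(a)], where ω(a) is the divergence of the environment model m(·|a) to its ideal m_i(·|a). It follows that the minimising randomised rule is proportional to r_i(a) exp(−ω(a)).

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence of discrete pd p to discrete pd q."""
    total = 0.0
    for p_k, q_k in zip(p, q):
        if p_k == 0.0:
            continue  # convention: 0 * ln(0 / q) = 0
        if q_k == 0.0:
            return math.inf  # p puts mass on a behaviour the ideal forbids
        total += p_k * math.log(p_k / q_k)
    return total


def fpd_one_step(model, ideal_model, ideal_rule):
    """Return (optimal_rule, minimal_divergence) for a one-step toy task.

    model[a][s]       -- environment model m(s | a)      (assumed given)
    ideal_model[a][s] -- ideal pd of states m_i(s | a)   (assumed given)
    ideal_rule[a]     -- ideal pd of actions r_i(a)      (assumed given)
    """
    weights = [
        r_i * math.exp(-kl(m_a, mi_a))
        for m_a, mi_a, r_i in zip(model, ideal_model, ideal_rule)
    ]
    z = sum(weights)  # assumed positive: some action avoids forbidden states
    return [w / z for w in weights], -math.log(z)


def closed_loop_kl(rule, model, ideal_model, ideal_rule):
    """Divergence of the closed-loop pd r(a) m(s|a) to r_i(a) m_i(s|a)."""
    total = 0.0
    for r_a, m_a, mi_a, r_i in zip(rule, model, ideal_model, ideal_rule):
        if r_a == 0.0:
            continue
        if r_i == 0.0:
            return math.inf
        total += r_a * (math.log(r_a / r_i) + kl(m_a, mi_a))
    return total


# Illustrative numbers only: two actions, two states.
model = [[0.9, 0.1], [0.2, 0.8]]
ideal_model = [[0.8, 0.2], [0.8, 0.2]]
ideal_rule = [0.5, 0.5]
rule, min_kl = fpd_one_step(model, ideal_model, ideal_rule)
print(rule, min_kl, closed_loop_kl(rule, model, ideal_model, ideal_rule))
```

In this sketch an action whose model reaches a state with zero ideal probability gets infinite ω and therefore zero probability under the optimal rule, which mirrors the "zero on forbidden ones" property of the ideal pd.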
MDP minimises the expected total loss, which is usually the sum of discounted partial losses. The discounting reflects the decreasing importance of future losses. It also diminishes the influence of errors caused by (i) the imperfection of the employed environment model, (ii) roughly expressed aims, and (iii) approximate learning and decision-rule design.
The established FPD cannot currently account for these important features. The paper elaborates the missing discounted version of FPD. This non-trivial filling of the gap also employs an extension of dynamic programming, which is of independent interest.
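As a worked illustration of why discounting damps errors in the MDP setting, consider the standard discounted criterion below. The symbols are assumptions introduced here, not notation taken from the paper: γ is the discount factor, ℓ_t the partial loss at time t, T the horizon, and ε a uniform bound on the error |ℓ̃_t − ℓ_t| of each misjudged partial loss ℓ̃_t.

```latex
\mathsf{E}\Big[\sum_{t=1}^{T} \gamma^{\,t-1}\,\ell_t\Big],
\qquad \gamma \in (0,1),
\qquad
\Big|\sum_{t=1}^{T} \gamma^{\,t-1}\,(\tilde{\ell}_t - \ell_t)\Big|
\le \varepsilon \sum_{t=1}^{T} \gamma^{\,t-1}
= \varepsilon\,\frac{1-\gamma^{T}}{1-\gamma}
\le \frac{\varepsilon}{1-\gamma}.
```

Under these assumptions the accumulated error stays bounded by ε/(1−γ) for any horizon, whereas without discounting (γ = 1) the same bound grows linearly as εT.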
Source journal: Information Sciences (Engineering and Technology, Computer Science: Information Systems)
CiteScore: 14.00
Self-citation rate: 17.30%
Articles published: 1322
Review time: 10.4 months
Journal description: Informatics and Computer Science Intelligent Systems Applications is an esteemed international journal that focuses on publishing original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and surveying contributions. Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.