Robust Causal Bandits for Linear Models

Zirui Yan;Arpan Mukherjee;Burak Varıcı;Ali Tajer
Journal: IEEE Journal on Selected Areas in Information Theory, vol. 5, pp. 78-93
Published: 2024-03-06
DOI: 10.1109/JSAIT.2024.3373595
URL: https://ieeexplore.ieee.org/document/10460990/

Abstract

The sequential design of experiments for optimizing a reward function in causal systems can be effectively modeled by the sequential design of interventions in causal bandits (CBs). In the existing literature on CBs, a critical assumption is that the causal models remain constant over time. However, this assumption does not necessarily hold in complex systems, which constantly undergo temporal model fluctuations. This paper addresses the robustness of CBs to such model fluctuations. The focus is on causal systems with linear structural equation models (SEMs). The SEMs and the time-varying pre- and post-interventional statistical models are all unknown. Cumulative regret is adopted as the design criterion, based on which the objective is to design a sequence of interventions that incurs the smallest cumulative regret with respect to an oracle aware of the entire causal model and its fluctuations. First, it is established that the existing approaches fail to maintain regret sub-linearity with even a few instances of model deviation. Specifically, when the number of instances with model deviation is as few as $T^{\frac{1}{2L}}$, where $T$ is the time horizon and $L$ is the length of the longest causal path in the graph, the existing algorithms will have linear regret in $T$. For instance, when $T=10^{5}$ and $L=3$, model deviations in 6 out of $10^{5}$ instances result in a linear regret. Next, a robust CB algorithm is designed, and its regret is analyzed, where upper and information-theoretic lower bounds on the regret are established. Specifically, in a graph with $N$ nodes and maximum degree $d$, under a general measure of model deviation $C$, the cumulative regret is upper bounded by $\tilde{\mathcal{O}}\left(d^{L-\frac{1}{2}}(\sqrt{NT} + NC)\right)$ and lower bounded by $\Omega\left(d^{\frac{L}{2}-2}\max\{\sqrt{T},\, d^{2}C\}\right)$. Comparing these bounds establishes that the proposed algorithm achieves nearly optimal $\tilde{\mathcal{O}}(\sqrt{T})$ regret when $C$ is $o(\sqrt{T})$ and maintains sub-linear regret for a broader range of $C$.
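
The numerical example in the abstract can be checked directly. The following is a minimal sketch, not code from the paper: the function names are hypothetical, and the two bound functions evaluate only the leading terms stated above, dropping the constants and logarithmic factors hidden by $\tilde{\mathcal{O}}$ and $\Omega$, so their values indicate scaling only.

```python
import math


def deviation_threshold(T: float, L: int) -> float:
    """Return T**(1/(2L)), the number of deviating instances quoted in the abstract."""
    return T ** (1.0 / (2 * L))


def upper_bound_rate(T: float, N: int, d: int, L: int, C: float) -> float:
    """Leading term d^(L - 1/2) * (sqrt(N*T) + N*C) of the upper bound.

    Assumption: constants and log factors are dropped.
    """
    return d ** (L - 0.5) * (math.sqrt(N * T) + N * C)


def lower_bound_rate(T: float, d: int, L: int, C: float) -> float:
    """Leading term d^(L/2 - 2) * max(sqrt(T), d^2 * C) of the lower bound.

    Assumption: constants are dropped.
    """
    return d ** (L / 2 - 2) * max(math.sqrt(T), d ** 2 * C)


if __name__ == "__main__":
    T, L = 10**5, 3
    k = deviation_threshold(T, L)
    # 10^(5/6) is about 6.81, consistent with the "6 out of 10^5" example.
    print(f"T^(1/(2L)) = {k:.2f}, i.e. about {math.floor(k)} instances out of {T}")

    # Illustrative values for N, d, and C (chosen here, not taken from the paper).
    N, d, C = 5, 2, 10.0
    print("upper-bound leading term:", upper_bound_rate(T, N, d, L, C))
    print("lower-bound leading term:", lower_bound_rate(T, d, L, C))
```

In both leading terms, $C$ enters linearly while $T$ enters as $\sqrt{T}$, which is why $C = o(\sqrt{T})$ leaves the $\sqrt{T}$ term dominant.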