{"title":"Asymmetric Graph Error Control With Low Complexity in Causal Bandits","authors":"Chen Peng;Di Zhang;Urbashi Mitra","doi":"10.1109/TSP.2025.3564303","DOIUrl":null,"url":null,"abstract":"In this paper, the causal bandit problem is investigated, with the objective of maximizing the long-term reward by selecting an optimal sequence of interventions on nodes in an unknown causal graph. It is assumed that both the causal topology and the distribution of interventions are unknown. First, based on the difference between the two types of graph identification errors (false positives and negatives), a causal graph learning method is proposed. Numerical results suggest that this method has a much lower sample complexity relative to the prior art by learning <italic>sub-graphs</i>. However, we note that a sample complexity analysis for the new algorithm has not been undertaken, as of yet. Under the assumption of minimum-mean squared error weight estimation, a new uncertainty bound tailored to the causal bandit problem is derived. This uncertainty bound drives an upper confidence bound-based intervention selection to optimize the reward. Further, we consider a particular instance of non-stationary bandits wherein both the causal topology and interventional distributions can change. Our solution is the design of a sub-graph change detection mechanism that requires a modest number of samples. Numerical results compare the new methodology to existing schemes and show a substantial performance improvement in stationary and non-stationary settings. Averaged over 100 randomly generated causal bandits, the proposed scheme takes significantly fewer samples to learn the causal structure and achieves a reward gain of 85% compared to existing approaches.","PeriodicalId":13330,"journal":{"name":"IEEE Transactions on Signal Processing","volume":"73 ","pages":"1792-1807"},"PeriodicalIF":4.6000,"publicationDate":"2025-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Signal Processing","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10976570/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0
Abstract
In this paper, the causal bandit problem is investigated, with the objective of maximizing the long-term reward by selecting an optimal sequence of interventions on nodes in an unknown causal graph. It is assumed that both the causal topology and the distribution of interventions are unknown. First, based on the difference between the two types of graph identification errors (false positives and false negatives), a causal graph learning method is proposed. Numerical results suggest that this method has a much lower sample complexity relative to the prior art by learning sub-graphs. However, we note that a sample complexity analysis for the new algorithm has not yet been undertaken. Under the assumption of minimum mean-squared error weight estimation, a new uncertainty bound tailored to the causal bandit problem is derived. This uncertainty bound drives an upper confidence bound-based intervention selection to optimize the reward. Further, we consider a particular instance of non-stationary bandits wherein both the causal topology and the interventional distributions can change. Our solution is the design of a sub-graph change detection mechanism that requires a modest number of samples. Numerical results compare the new methodology to existing schemes and show a substantial performance improvement in stationary and non-stationary settings. Averaged over 100 randomly generated causal bandits, the proposed scheme takes significantly fewer samples to learn the causal structure and achieves a reward gain of 85% compared to existing approaches.
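To make the "upper confidence bound-based intervention selection" concrete, the sketch below shows a generic UCB loop over a set of candidate interventions. It is only an illustration under simplifying assumptions: it uses the standard UCB1 confidence radius rather than the paper's tailored uncertainty bound (which depends on MMSE weight estimates over the learned causal sub-graph and is not given in this abstract), and `simulate_reward` is a hypothetical stand-in for the causal-bandit environment.

```python
# Illustrative sketch only: a generic UCB-style intervention-selection loop.
# The paper's graph learning, tailored uncertainty bound, and change detector
# are NOT reproduced here; `simulate_reward` is a hypothetical environment.
import math
import random


def simulate_reward(intervention: int) -> float:
    """Hypothetical environment: noisy reward for the chosen intervention."""
    true_means = [0.3, 0.5, 0.7]  # unknown to the learner
    return random.gauss(true_means[intervention], 0.1)


def ucb_intervention_selection(n_interventions: int, horizon: int) -> list:
    counts = [0] * n_interventions   # times each intervention was selected
    means = [0.0] * n_interventions  # empirical mean reward per intervention
    choices = []
    for t in range(1, horizon + 1):
        if t <= n_interventions:
            arm = t - 1              # play each intervention once to initialize
        else:
            # Select the intervention with the largest upper confidence bound.
            ucb = [means[a] + math.sqrt(2.0 * math.log(t) / counts[a])
                   for a in range(n_interventions)]
            arm = max(range(n_interventions), key=lambda a: ucb[a])
        reward = simulate_reward(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running-mean update
        choices.append(arm)
    return choices


if __name__ == "__main__":
    # The selected interventions should concentrate on the best arm over time.
    print(ucb_intervention_selection(n_interventions=3, horizon=500)[-10:])
```

In the paper's setting, the confidence radius in the loop above would be replaced by the derived uncertainty bound, and the non-stationary variant would additionally restart or re-learn affected sub-graphs when the change detector fires.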
Journal Description
The IEEE Transactions on Signal Processing covers novel theory, algorithms, performance analyses and applications of techniques for the processing, understanding, learning, retrieval, mining, and extraction of information from signals. The term “signal” includes, among others, audio, video, speech, image, communication, geophysical, sonar, radar, medical and musical signals. Examples of topics of interest include, but are not limited to, information processing and the theory and application of filtering, coding, transmitting, estimating, detecting, analyzing, recognizing, synthesizing, recording, and reproducing signals.