{"title":"Asymmetric Graph Error Control With Low Complexity in Causal Bandits","authors":"Chen Peng;Di Zhang;Urbashi Mitra","doi":"10.1109/TSP.2025.3564303","DOIUrl":null,"url":null,"abstract":"In this paper, the causal bandit problem is investigated, with the objective of maximizing the long-term reward by selecting an optimal sequence of interventions on nodes in an unknown causal graph. It is assumed that both the causal topology and the distribution of interventions are unknown. First, based on the difference between the two types of graph identification errors (false positives and negatives), a causal graph learning method is proposed. Numerical results suggest that this method has a much lower sample complexity relative to the prior art by learning <italic>sub-graphs</i>. However, we note that a sample complexity analysis for the new algorithm has not been undertaken, as of yet. Under the assumption of minimum-mean squared error weight estimation, a new uncertainty bound tailored to the causal bandit problem is derived. This uncertainty bound drives an upper confidence bound-based intervention selection to optimize the reward. Further, we consider a particular instance of non-stationary bandits wherein both the causal topology and interventional distributions can change. Our solution is the design of a sub-graph change detection mechanism that requires a modest number of samples. Numerical results compare the new methodology to existing schemes and show a substantial performance improvement in stationary and non-stationary settings. Averaged over 100 randomly generated causal bandits, the proposed scheme takes significantly fewer samples to learn the causal structure and achieves a reward gain of 85% compared to existing approaches.","PeriodicalId":13330,"journal":{"name":"IEEE Transactions on Signal Processing","volume":"73 ","pages":"1792-1807"},"PeriodicalIF":4.6000,"publicationDate":"2025-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Signal Processing","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10976570/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0
Abstract
In this paper, the causal bandit problem is investigated, with the objective of maximizing the long-term reward by selecting an optimal sequence of interventions on nodes in an unknown causal graph. It is assumed that both the causal topology and the distribution of interventions are unknown. First, based on the difference between the two types of graph identification errors (false positives and false negatives), a causal graph learning method is proposed. Numerical results suggest that this method has a much lower sample complexity relative to the prior art by learning sub-graphs. However, we note that a sample complexity analysis for the new algorithm has not yet been undertaken. Under the assumption of minimum mean-squared error weight estimation, a new uncertainty bound tailored to the causal bandit problem is derived. This uncertainty bound drives an upper confidence bound-based intervention selection to optimize the reward. Further, we consider a particular instance of non-stationary bandits wherein both the causal topology and the interventional distributions can change. Our solution is the design of a sub-graph change detection mechanism that requires a modest number of samples. Numerical results compare the new methodology to existing schemes and show a substantial performance improvement in stationary and non-stationary settings. Averaged over 100 randomly generated causal bandits, the proposed scheme takes significantly fewer samples to learn the causal structure and achieves a reward gain of 85% compared to existing approaches.
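To make the "upper confidence bound-based intervention selection" concrete, the sketch below shows a generic UCB loop over a set of candidate interventions. It is only an illustration under simplifying assumptions: it uses the standard UCB1 confidence radius rather than the paper's tailored uncertainty bound (which depends on MMSE weight estimates over the learned causal sub-graph and is not given in this abstract), and `simulate_reward` is a hypothetical stand-in for the causal-bandit environment.

```python
# Illustrative sketch only: a generic UCB-style intervention-selection loop.
# The paper's graph learning, tailored uncertainty bound, and change detector
# are NOT reproduced here; `simulate_reward` is a hypothetical environment.
import math
import random


def simulate_reward(intervention: int) -> float:
    """Hypothetical environment: noisy reward for the chosen intervention."""
    true_means = [0.3, 0.5, 0.7]  # unknown to the learner
    return random.gauss(true_means[intervention], 0.1)


def ucb_intervention_selection(n_interventions: int, horizon: int) -> list:
    counts = [0] * n_interventions   # times each intervention was selected
    means = [0.0] * n_interventions  # empirical mean reward per intervention
    choices = []
    for t in range(1, horizon + 1):
        if t <= n_interventions:
            arm = t - 1              # play each intervention once to initialize
        else:
            # Select the intervention with the largest upper confidence bound.
            ucb = [means[a] + math.sqrt(2.0 * math.log(t) / counts[a])
                   for a in range(n_interventions)]
            arm = max(range(n_interventions), key=lambda a: ucb[a])
        reward = simulate_reward(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running-mean update
        choices.append(arm)
    return choices


if __name__ == "__main__":
    # The selected interventions should concentrate on the best arm over time.
    print(ucb_intervention_selection(n_interventions=3, horizon=500)[-10:])
```

In the paper's setting, the confidence radius in the loop above would be replaced by the derived uncertainty bound, and the non-stationary variant would additionally restart or re-learn affected sub-graphs when the change detector fires.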
Journal Description
The IEEE Transactions on Signal Processing covers novel theory, algorithms, performance analyses and applications of techniques for the processing, understanding, learning, retrieval, mining, and extraction of information from signals. The term “signal” includes, among others, audio, video, speech, image, communication, geophysical, sonar, radar, medical and musical signals. Examples of topics of interest include, but are not limited to, information processing and the theory and application of filtering, coding, transmitting, estimating, detecting, analyzing, recognizing, synthesizing, recording, and reproducing signals.