Causal discovery based on hierarchical reinforcement learning

IF 7.5, Zone 1 (Computer Science), JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Expert Systems with Applications, Pub Date: 2025-04-04, DOI: 10.1016/j.eswa.2025.127466

Jingchi Jiang, Rujia Shen, Chao Zhao, Yi Guan, Xuehui Yu, Xuelian Fu

Volume 279, Article 127466. Publisher page: https://www.sciencedirect.com/science/article/pii/S0957417425010887

Citations: 0

Abstract

In causal discovery, conditional independence (CI) tests can determine a set of Markov equivalence classes with respect to the observed data by checking whether each pair of variables is d-separated, under the faithfulness and Markov assumptions. However, CI tests become intractable when the conditioning set is high-dimensional. Motivated by the strengths of reinforcement learning in exploring a solution space, we first propose a causal discovery framework based on hierarchical reinforcement learning (CD-HRL). The framework trains two interdependent policies separately: a high-level policy that discovers the causal skeleton and a low-level policy that identifies edge directions. Dividing causal discovery into these two distinct subtasks improves exploration efficiency and reduces error accumulation. The high-level policy iteratively generates causal skeletons as subgoals for the low-level policy, which then identifies the causal direction for each individual pair of variables. Second, to avoid redundant exploration of familiar causal structures, we incorporate a memory module into the high-level agent and define an augmented reward that combines a causal score function with a curiosity term that encourages exploration of unknown causal structures. Finally, experiments on both synthetic and real datasets show that the proposed approach outperforms state-of-the-art methods under various data-generating procedures, including linear, nonlinear, and ordinary differential equation models with additive Gaussian noise. The code for our CD-HRL method is available online at https://github.com/HITshenrj/CD-HRL.
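The augmented reward described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the curiosity term or of the memory module, so everything here is an assumption made for illustration: a count-based novelty bonus 1/sqrt(n + 1), where n is the number of times a skeleton has already been proposed, a trade-off weight `beta`, and the hypothetical names `SkeletonMemory`, `skeleton_key`, and `augmented_reward`. It is not the CD-HRL implementation; see the repository linked above for the authors' code.

```python
import math
from collections import Counter
from typing import FrozenSet, Iterable, Tuple

Edge = Tuple[int, int]  # an undirected edge between two variable indices


def skeleton_key(edges: Iterable[Edge]) -> FrozenSet[Edge]:
    """Canonical, hashable form of an undirected skeleton.

    Each edge is stored as (min, max) so that (1, 0) and (0, 1) coincide.
    """
    return frozenset((min(u, v), max(u, v)) for u, v in edges)


class SkeletonMemory:
    """Hypothetical memory module: counts how often each skeleton was proposed."""

    def __init__(self) -> None:
        self.visits: Counter = Counter()

    def curiosity(self, key: FrozenSet[Edge]) -> float:
        # Assumed count-based bonus: 1.0 for an unseen skeleton,
        # decaying as 1/sqrt(n + 1) after n earlier visits.
        return 1.0 / math.sqrt(self.visits[key] + 1)

    def record(self, key: FrozenSet[Edge]) -> None:
        self.visits[key] += 1


def augmented_reward(
    score: float,
    edges: Iterable[Edge],
    memory: SkeletonMemory,
    beta: float = 0.5,  # assumed weight of the curiosity term
) -> float:
    """Causal score plus a weighted curiosity bonus; updates the memory."""
    key = skeleton_key(edges)
    reward = score + beta * memory.curiosity(key)
    memory.record(key)
    return reward


if __name__ == "__main__":
    memory = SkeletonMemory()
    # The same skeleton proposed twice (edge order and orientation differ).
    first = augmented_reward(-1.0, [(0, 1), (2, 1)], memory)
    second = augmented_reward(-1.0, [(1, 0), (1, 2)], memory)
    print(first, second)  # the second reward is lower: the bonus has decayed
```

Under these assumptions, a skeleton that has been proposed before earns a smaller bonus than a new one with the same causal score, which is the behaviour the memory module is meant to encourage in the high-level agent.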




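Source journal

Expert Systems with Applications (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 13.80

Self-citation rate: 10.60%

Articles published: 2045

Review time: 8.7 months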

Journal description: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.