Hierarchical Causal Discovery From Large-Scale Observed Variables

IF 8.9 | Zone 2, Computer Science | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Knowledge and Data Engineering | Pub Date: 2025-02-06 | DOI: 10.1109/TKDE.2025.3539788

Rujia Shen; Muhan Li; Chao Zhao; Boran Wang; Yi Guan; Jie Liu; Jingchi Jiang

Volume 37, Issue 5, pages 2626-2639

Citations: 0

Abstract

Discovering causal relations from observed variables is a long-standing question in many empirical sciences. However, current causal discovery methods are inefficient when dealing with large-scale observed variables, owing to challenges in conditional independence (CI) tests or the complex computation of acyclicity, and may even fail altogether. To address this efficiency issue, we propose a Hierarchical Causal Discovery (HCD) framework with a bilevel policy that boosts existing models. Specifically, the high-level policy first finds a causal cut set to partition the observed variables into several causal clusters and releases the clusters to the low-level policy. The low-level policy applies any causal discovery method to process these causal clusters in parallel and obtains intra-cluster structures, which the high-level policy subsequently merges across clusters. To avoid missing inter-cluster edges, we theoretically demonstrate the feasibility of causal cluster cut and inter-cluster structure merging. We also prove the completeness and correctness of HCD for causal discovery. Experiments on both synthetic and real-world datasets demonstrate that HCD consistently and significantly enhances the efficiency and effectiveness of existing advanced methods.
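The split, parallel-learn, merge pattern described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' HCD algorithm: the abstract does not say how the causal cut set is found or how inter-cluster structures are merged. The sketch therefore assumes the cut set is given, takes the clusters to be the connected components of a caller-supplied undirected graph (here an assumed moral graph) after the cut-set variables are removed, runs a pluggable base learner on each cluster together with the cut set, and merges by plain edge union. The names `split_by_cut_set`, `hierarchical_discovery`, `oracle_learner`, `TRUE_EDGES` and `MORAL_GRAPH` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # directed edge (parent, child)


def split_by_cut_set(adjacency: Dict[str, Set[str]], cut_set: Set[str]) -> List[Set[str]]:
    """Return the connected components of `adjacency` once the cut-set nodes are deleted."""
    remaining = set(adjacency) - cut_set
    clusters: List[Set[str]] = []
    while remaining:
        start = remaining.pop()
        component, stack = {start}, [start]
        while stack:  # iterative depth-first search
            node = stack.pop()
            for neighbour in adjacency[node]:
                if neighbour in remaining:  # skips cut-set nodes and visited nodes
                    remaining.remove(neighbour)
                    component.add(neighbour)
                    stack.append(neighbour)
        clusters.append(component)
    return clusters


def hierarchical_discovery(
    adjacency: Dict[str, Set[str]],
    cut_set: Set[str],
    base_learner: Callable[[List[str]], Iterable[Edge]],
    max_workers: int = 4,
) -> Set[Edge]:
    """Split on a given cut set, learn each cluster in parallel, merge by edge union."""
    clusters = split_by_cut_set(adjacency, cut_set)
    # Assumption: each sub-problem is one cluster plus the whole cut set, so edges
    # between a cluster and the cut-set variables can still be recovered.
    jobs = [sorted(cluster | cut_set) for cluster in clusters]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_graphs = list(pool.map(base_learner, jobs))
    merged: Set[Edge] = set()
    for edges in partial_graphs:
        merged.update(edges)  # naive merge: no conflict resolution between sub-results
    return merged


# Toy example (hypothetical): true DAG A -> C <- B, C -> D -> F, C -> E.
TRUE_EDGES: Set[Edge] = {("A", "C"), ("B", "C"), ("C", "D"), ("C", "E"), ("D", "F")}

# Its moral graph: the skeleton plus the "married" co-parents A - B.
MORAL_GRAPH: Dict[str, Set[str]] = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D", "E"},
    "D": {"C", "F"},
    "E": {"C"},
    "F": {"D"},
}


def oracle_learner(variables: List[str]) -> Set[Edge]:
    """Stand-in for a real causal discovery method: returns the true edges among `variables`."""
    chosen = set(variables)
    return {(u, v) for (u, v) in TRUE_EDGES if u in chosen and v in chosen}


if __name__ == "__main__":
    # Cut set {C} yields clusters {A, B}, {D, F}, {E}; each is learned with C added.
    print(sorted(hierarchical_discovery(MORAL_GRAPH, {"C"}, oracle_learner)))
    # [('A', 'C'), ('B', 'C'), ('C', 'D'), ('C', 'E'), ('D', 'F')]
```

The oracle learner hides the hard part: a real method run on a subset of variables can return spurious or conflicting edges, and a plain union does nothing about them. The abstract states that HCD's merging step is backed by guarantees against missing inter-cluster edges; this sketch makes no such guarantee. For CPU-bound learners, `ProcessPoolExecutor` would usually be preferred over threads, at the cost of requiring picklable functions.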


Source journal

IEEE Transactions on Knowledge and Data Engineering (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 11.70
Self-citation rate: 3.40%
Articles published: 515
Review time: 6 months

Journal description: The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools, and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.