{"title":"基于弱先验知识的大规模层次因果发现","authors":"Xiangyu Wang;Taiyu Ban;Lyuzhou Chen;Derui Lyu;Qinrui Zhu;Huanhuan Chen","doi":"10.1109/TKDE.2025.3537832","DOIUrl":null,"url":null,"abstract":"Causal discovery faces significant challenges as the number of hypotheses grows exponentially with the number of variables. This complexity becomes particularly daunting when dealing with large sets of variables. We introduce a novel divide-and-conquer method that uniquely handles this challenge. The existing division strategies often rely on conditional independency (CI) tests or data-driven clustering to split variables, which can suffer from the typical data scarcity in large-scale settings, thus leading to inaccurate division results. The proposed method overcomes this by implementing a data-independent division strategy, which constructs a prior structure, informed by potential causal relationships identified using a Large Language Model (LLM), to guide recursively dividing variables into sub-sets. This approach avoids the impact of data insufficiency and is robust against potential incompleteness in the prior structure. In the merging phase, we adopt a score-based refinement strategy to address fake causal links caused by hidden variables in sub-sets, which eliminates edges in the intersected parts of sub-sets to optimize the score of local structures. While maintaining both correctness and completeness under the faithfulness assumption, this novel merging approach demonstrates enhanced performance than the conventional CI-test based merging strategy in practical scenarios. Empirical evaluations on various large-scale datasets demonstrate the proposed approach's superior accuracy and efficiency compared to existing causal discovery methods.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 5","pages":"2695-2711"},"PeriodicalIF":8.9000,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Large-Scale Hierarchical Causal Discovery via Weak Prior Knowledge\",\"authors\":\"Xiangyu Wang;Taiyu Ban;Lyuzhou Chen;Derui Lyu;Qinrui Zhu;Huanhuan Chen\",\"doi\":\"10.1109/TKDE.2025.3537832\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Causal discovery faces significant challenges as the number of hypotheses grows exponentially with the number of variables. This complexity becomes particularly daunting when dealing with large sets of variables. We introduce a novel divide-and-conquer method that uniquely handles this challenge. The existing division strategies often rely on conditional independency (CI) tests or data-driven clustering to split variables, which can suffer from the typical data scarcity in large-scale settings, thus leading to inaccurate division results. The proposed method overcomes this by implementing a data-independent division strategy, which constructs a prior structure, informed by potential causal relationships identified using a Large Language Model (LLM), to guide recursively dividing variables into sub-sets. This approach avoids the impact of data insufficiency and is robust against potential incompleteness in the prior structure. In the merging phase, we adopt a score-based refinement strategy to address fake causal links caused by hidden variables in sub-sets, which eliminates edges in the intersected parts of sub-sets to optimize the score of local structures. While maintaining both correctness and completeness under the faithfulness assumption, this novel merging approach demonstrates enhanced performance than the conventional CI-test based merging strategy in practical scenarios. Empirical evaluations on various large-scale datasets demonstrate the proposed approach's superior accuracy and efficiency compared to existing causal discovery methods.\",\"PeriodicalId\":13496,\"journal\":{\"name\":\"IEEE Transactions on Knowledge and Data Engineering\",\"volume\":\"37 5\",\"pages\":\"2695-2711\"},\"PeriodicalIF\":8.9000,\"publicationDate\":\"2025-02-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Knowledge and Data Engineering\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10872923/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Knowledge and Data Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10872923/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Large-Scale Hierarchical Causal Discovery via Weak Prior Knowledge
Causal discovery faces significant challenges as the number of hypotheses grows exponentially with the number of variables. This complexity becomes particularly daunting when dealing with large sets of variables. We introduce a novel divide-and-conquer method that uniquely handles this challenge. The existing division strategies often rely on conditional independency (CI) tests or data-driven clustering to split variables, which can suffer from the typical data scarcity in large-scale settings, thus leading to inaccurate division results. The proposed method overcomes this by implementing a data-independent division strategy, which constructs a prior structure, informed by potential causal relationships identified using a Large Language Model (LLM), to guide recursively dividing variables into sub-sets. This approach avoids the impact of data insufficiency and is robust against potential incompleteness in the prior structure. In the merging phase, we adopt a score-based refinement strategy to address fake causal links caused by hidden variables in sub-sets, which eliminates edges in the intersected parts of sub-sets to optimize the score of local structures. While maintaining both correctness and completeness under the faithfulness assumption, this novel merging approach demonstrates enhanced performance than the conventional CI-test based merging strategy in practical scenarios. Empirical evaluations on various large-scale datasets demonstrate the proposed approach's superior accuracy and efficiency compared to existing causal discovery methods.
期刊介绍:
The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools, and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.