Title: LLM-Driven Causal Discovery via Harmonized Prior
Authors: Taiyu Ban; Lyuzhou Chen; Derui Lyu; Xiangyu Wang; Qinrui Zhu; Huanhuan Chen
Journal: IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 4, pp. 1943-1960
Publication date: 2025-01-13
Publication type: Journal Article
DOI: 10.1109/TKDE.2025.3528461
URL: https://ieeexplore.ieee.org/document/10839116/
Impact factor: 8.9 (JCR Q1, Computer Science, Artificial Intelligence)
Citation count: 0
Abstract
Traditional domain-specific causal discovery relies on expert knowledge to guide the data-based structure learning process, thereby improving the reliability of recovered causality. Recent studies have shown promise in using Large Language Models (LLMs) as causal experts to construct autonomous expert-guided causal discovery systems through pairwise causal reasoning between variables. However, their performance is hampered by inaccuracies in aligning LLM-derived causal knowledge with the actual causal structure. To address this issue, this paper proposes a novel LLM-driven causal discovery framework that limits the LLM’s prior within a reliable range. Instead of pairwise causal reasoning, which requires output that is both precise and comprehensive, the LLM is directed to focus on each of these aspects separately. By combining the resulting causal insights, a unified set of structural constraints is created, termed a harmonized prior, which draws on their respective strengths to ensure prior accuracy. On this basis, we introduce plug-and-play integrations of the harmonized prior into mainstream categories of structure learning methods, thereby enhancing their applicability in practical scenarios. Evaluations on real-world data demonstrate the effectiveness of our approach.
About the journal:
The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.