变量排序对贝叶斯网络结构学习的影响

IF 2.8 3区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data Mining and Knowledge Discovery Pub Date : 2024-06-08 DOI:10.1007/s10618-024-01044-9

Neville K. Kitson, Anthony C. Constantinou

{"title":"变量排序对贝叶斯网络结构学习的影响","authors":"Neville K. Kitson, Anthony C. Constantinou","doi":"10.1007/s10618-024-01044-9","DOIUrl":null,"url":null,"abstract":"<p>Causal Bayesian Networks (CBNs) provide an important tool for reasoning under uncertainty with potential application to many complex causal systems. Structure learning algorithms that can tell us something about the causal structure of these systems are becoming increasingly important. In the literature, the validity of these algorithms is often tested for sensitivity over varying sample sizes, hyper-parameters, and occasionally objective functions, but the effect of the order in which the variables are read from data is rarely quantified. We show that many commonly-used algorithms, both established and state-of-the-art, are more sensitive to variable ordering than these other factors when learning CBNs from discrete variables. This effect is strongest in hill-climbing and its variants where we explain how it arises, but extends to hybrid, and to a lesser-extent, constraint-based algorithms. Because the variable ordering is arbitrary, any significant effect it has on learnt graph accuracy is concerning, and raises questions about the validity of both many older and more recent results produced by these algorithms in practical applications and their rankings in performance evaluations.</p>","PeriodicalId":55183,"journal":{"name":"Data Mining and Knowledge Discovery","volume":"44 1","pages":""},"PeriodicalIF":2.8000,"publicationDate":"2024-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"The impact of variable ordering on Bayesian network structure learning\",\"authors\":\"Neville K. Kitson, Anthony C. Constantinou\",\"doi\":\"10.1007/s10618-024-01044-9\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Causal Bayesian Networks (CBNs) provide an important tool for reasoning under uncertainty with potential application to many complex causal systems. Structure learning algorithms that can tell us something about the causal structure of these systems are becoming increasingly important. In the literature, the validity of these algorithms is often tested for sensitivity over varying sample sizes, hyper-parameters, and occasionally objective functions, but the effect of the order in which the variables are read from data is rarely quantified. We show that many commonly-used algorithms, both established and state-of-the-art, are more sensitive to variable ordering than these other factors when learning CBNs from discrete variables. This effect is strongest in hill-climbing and its variants where we explain how it arises, but extends to hybrid, and to a lesser-extent, constraint-based algorithms. Because the variable ordering is arbitrary, any significant effect it has on learnt graph accuracy is concerning, and raises questions about the validity of both many older and more recent results produced by these algorithms in practical applications and their rankings in performance evaluations.</p>\",\"PeriodicalId\":55183,\"journal\":{\"name\":\"Data Mining and Knowledge Discovery\",\"volume\":\"44 1\",\"pages\":\"\"},\"PeriodicalIF\":2.8000,\"publicationDate\":\"2024-06-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Data Mining and Knowledge Discovery\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s10618-024-01044-9\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data Mining and Knowledge Discovery","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10618-024-01044-9","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

因果贝叶斯网络（CBN）为不确定条件下的推理提供了一个重要工具，可应用于许多复杂的因果系统。能告诉我们这些系统因果结构的结构学习算法正变得越来越重要。在文献中，这些算法的有效性经常在不同的样本大小、超参数以及目标函数中进行敏感性测试，但从数据中读取变量的顺序所产生的影响却很少被量化。我们的研究表明，在从离散变量学习 CBN 时，许多常用算法（包括成熟算法和最新算法）对变量排序的敏感度要高于其他因素。这种影响在爬山算法及其变体中最为明显，我们解释了这种影响是如何产生的，但这种影响也延伸到了混合算法中，并在较小程度上延伸到了基于约束的算法中。由于变量排序是任意的，因此它对学习图准确性的任何显著影响都是令人担忧的，同时也对这些算法在实际应用中产生的许多较早和较新结果的有效性及其在性能评估中的排名提出了质疑。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

The impact of variable ordering on Bayesian network structure learning

查看原文本刊更多论文

The impact of variable ordering on Bayesian network structure learning

Causal Bayesian Networks (CBNs) provide an important tool for reasoning under uncertainty with potential application to many complex causal systems. Structure learning algorithms that can tell us something about the causal structure of these systems are becoming increasingly important. In the literature, the validity of these algorithms is often tested for sensitivity over varying sample sizes, hyper-parameters, and occasionally objective functions, but the effect of the order in which the variables are read from data is rarely quantified. We show that many commonly-used algorithms, both established and state-of-the-art, are more sensitive to variable ordering than these other factors when learning CBNs from discrete variables. This effect is strongest in hill-climbing and its variants where we explain how it arises, but extends to hybrid, and to a lesser-extent, constraint-based algorithms. Because the variable ordering is arbitrary, any significant effect it has on learnt graph accuracy is concerning, and raises questions about the validity of both many older and more recent results produced by these algorithms in practical applications and their rankings in performance evaluations.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Data Mining and Knowledge Discovery 工程技术-计算机：人工智能

CiteScore

10.40

自引率

4.20%

发文量

审稿时长

10 months

期刊介绍： Advances in data gathering, storage, and distribution have created a need for computational tools and techniques to aid in data analysis. Data Mining and Knowledge Discovery in Databases (KDD) is a rapidly growing area of research and application that builds on techniques and theories from many fields, including statistics, databases, pattern recognition and learning, data visualization, uncertainty modelling, data warehousing and OLAP, optimization, and high performance computing.