Modeling continuous distributions in hybrid Bayesian networks using mixtures of polynomials with tails

IF 1.6 · Zone 3 (Mathematics) · Q3 (Computer Science, Interdisciplinary Applications)
J.C. Luengo, D. Ramos-López, R. Rumí
Computational Statistics & Data Analysis, vol. 212, Article 108246. Journal article, published 2025-07-11.
DOI: 10.1016/j.csda.2025.108246
https://www.sciencedirect.com/science/article/pii/S0167947325001227
Citations: 0

Abstract

A new approach to modeling continuous distributions in hybrid Bayesian networks (BNs) is presented. It is based on Mixtures of Polynomials (MoPs) with tails, called tMoPs. The proposal is a variation of the usual MoP model that adds tails and several other improvements to the learning process. Adequate modeling of the tails of variable distributions is relevant both theoretically and in many real applications, where rare phenomena may have a great impact. The proposed approach is designed to exploit the flexibility of the tMoP model to fit different continuous data distributions. This is especially relevant for distributions with zones of density close to zero, where polynomial fitting may be difficult. In these situations, tMoPs allow a polynomial fit in the parts with higher density and the use of tails in the areas with lower density. This permits a better global fit, without loss of overall accuracy, and yields a relatively simple density function. Learning algorithms are developed for tMoP conditional probability distributions with up to two parents of any type. These tMoPs may be integrated into hybrid Bayesian networks to represent conditional probability distributions, thus enabling probabilistic reasoning such as causal inference, sensitivity analysis, and other decision-making operations. The suitability of tMoPs is evaluated in several ways, using a large set of real datasets of different natures. The experiments include an analysis of goodness-of-fit with several continuous and pseudo-continuous variables; the optimization of certain parameters and the effect of variable selection and graph structure when using tMoPs in BNs; and an evaluation of the predictive ability of tMoP-based hybrid BNs in classification and regression. Results show the good behavior of our proposal, with tMoP hybrid Bayesian networks being as accurate as or outperforming other techniques in most scenarios, in addition to providing a more informative and convenient probabilistic model.
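The idea of a polynomial body with tails in the low-density areas can be illustrated with a minimal sketch. It is not the authors' learning algorithm: the abstract does not state the functional form of the tails or how the polynomial is fitted, so the exponential tails, the quantile cut points, the histogram-based least-squares fit, and the function name `fit_tailed_poly_density` are all assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def fit_tailed_poly_density(x, degree=4, q_low=0.05, q_high=0.95, bins=30):
    """Illustrative density: polynomial on [a, b], exponential tails outside.

    Hypothetical helper; the tail form and the fitting procedure are assumed,
    not taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    a, b = np.quantile(x, [q_low, q_high])   # assumed cut points for the tails
    p_left = np.mean(x < a)                  # empirical mass of the left tail
    p_right = np.mean(x > b)                 # empirical mass of the right tail
    body = x[(x >= a) & (x <= b)]

    # Histogram estimate of the density on the body, scaled to the body's mass.
    heights, edges = np.histogram(body, bins=bins, range=(a, b), density=True)
    heights = heights * (body.size / x.size)
    centers = 0.5 * (edges[:-1] + edges[1:])

    def to_unit(v):
        # Map [a, b] to [-1, 1] so the least-squares fit is well conditioned.
        return (2.0 * v - (a + b)) / (b - a)

    coefs = P.polyfit(to_unit(centers), heights, degree)

    def body_pdf(v):
        # Clip at a tiny positive value so the density never goes negative.
        return np.maximum(P.polyval(to_unit(v), coefs), 1e-12)

    # Each tail starts at the polynomial's boundary value (continuity) and its
    # rate is chosen so that the tail carries the empirical tail mass.
    f_a, f_b = float(body_pdf(a)), float(body_pdf(b))
    rate_left = f_a / max(p_left, 1e-12)
    rate_right = f_b / max(p_right, 1e-12)

    # Normalising constant: trapezoid rule on the body plus analytic tail mass.
    grid = np.linspace(a, b, 2001)
    vals = body_pdf(grid)
    body_mass = np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(grid))
    total = body_mass + f_a / rate_left + f_b / rate_right

    def pdf(v):
        v = np.asarray(v, dtype=float)
        left = f_a * np.exp(-rate_left * np.maximum(a - v, 0.0))
        right = f_b * np.exp(-rate_right * np.maximum(v - b, 0.0))
        out = np.where(v < a, left, np.where(v > b, right, body_pdf(v)))
        return out / total

    return pdf, (a, b)


# Usage on synthetic data (not one of the datasets used in the paper).
rng = np.random.default_rng(0)
sample = rng.normal(size=2000)
pdf, (a, b) = fit_tailed_poly_density(sample)
print(pdf(np.array([a - 1.0, 0.0, b + 1.0])))
```

In this sketch the polynomial only has to fit the region where data are plentiful, while the two tails are each described by a single rate parameter, which keeps the resulting density function simple.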
Source journal
Computational Statistics & Data Analysis (Mathematics; Computer Science, Interdisciplinary Applications)
CiteScore: 3.70
Self-citation rate: 5.60%
Articles published: 167
Review time: 60 days
Journal description: Computational Statistics and Data Analysis (CSDA), an Official Publication of the network Computational and Methodological Statistics (CMStatistics) and of the International Association for Statistical Computing (IASC), is an international journal dedicated to the dissemination of methodological research and applications in the areas of computational statistics and data analysis. The journal consists of four refereed sections which are divided into the following subject areas: I) Computational Statistics - Manuscripts dealing with: 1) the explicit impact of computers on statistical methodology (e.g., Bayesian computing, bioinformatics, computer graphics, computer intensive inferential methods, data exploration, data mining, expert systems, heuristics, knowledge based systems, machine learning, neural networks, numerical and optimization methods, parallel computing, statistical databases, statistical systems), and 2) the development, evaluation and validation of statistical software and algorithms. Software and algorithms can be submitted with manuscripts and will be stored together with the online article. II) Statistical Methodology for Data Analysis - Manuscripts dealing with novel and original data analytical strategies and methodologies applied in biostatistics (design and analytic methods for clinical trials, epidemiological studies, statistical genetics, or genetic/environmental interactions), chemometrics, classification, data exploration, density estimation, design of experiments, environmetrics, education, image analysis, marketing, model free data exploration, pattern recognition, psychometrics, statistical physics, image processing, robust procedures. [...] III) Special Applications - [...] IV) Annals of Statistical Data Science [...]