Discovering optimal Markov blanket for high-dimensional streaming features

IF 8.1, Zone 1 (Computer Science), COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Sciences Pub Date : 2025-04-29 DOI:10.1016/j.ins.2025.122240

Waqar Khan, Brekhna Brekhna, Yajun Xie, Muhammad Sadiq Hassan Zada, Rasool Shah, Yifan Zheng

Information Sciences, Volume 716, Article 122240. https://www.sciencedirect.com/science/article/pii/S002002552500372X

Citations: 0

Abstract

Knowledge discovery on high-dimensional streaming features requires an online causal feature selection process that can significantly reduce the complexity of real-world feature spaces and enhance the learning process. This is achieved by mining causal features online to construct a Markov blanket (MB) for the class label, selecting highly relevant feature subsets, and minimizing the number of irrelevant and redundant features contained in the streaming feature space. However, the prevailing MB algorithms (e.g., offline and online methods) often fall short in discerning the causal relationships between the class label and the selected features, which renders them ineffective and inefficient on high-dimensional streaming feature spaces. To address these limitations, we propose a novel algorithm named Discovering Optimal - Markov blanket for high-dimensional Streaming Features (DO-MB$_{SF}$), which is tailored to optimally learn an MB online. First, DO-MB$_{SF}$ dynamically learns the parents (Ps), children (Cs), and spouses of the class label, thereby distinguishing parent-child (PC) relationships from spouses and Ps from Cs during the MB learning procedure. Second, learning the relevant PC and spousal relationships and accurately distinguishing them strikes a balance between prediction accuracy and computational efficiency, yielding a comprehensive online causal feature selection approach. Extensive experimental validation highlights the superiority of the DO-MB$_{SF}$ algorithm in terms of accuracy and efficiency. By identifying strongly relevant PC and spousal relationships and optimizing the tradeoff between accuracy and efficiency, DO-MB$_{SF}$ is a promising solution for online causal feature selection in high-dimensional streaming feature spaces. The code has been released at https://github.com/vickykhan89/DO-MBSF.
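To make the terminology concrete, the following is a minimal Python sketch of what a Markov blanket contains: the parents, children, and spouses (other parents of the children) of a target node. It is not the DO-MB$_{SF}$ algorithm, which learns these sets online from streaming data; here the causal graph is assumed to be already known, and the function name `markov_blanket`, the `toy_parents` graph, and the node names are all hypothetical.

```python
from typing import Dict, Set


def markov_blanket(parents: Dict[str, Set[str]], target: str) -> Dict[str, Set[str]]:
    """Read off the Markov blanket of `target` from a known DAG.

    `parents` maps every node to the set of its direct parents.
    This is an illustration of the definition only; it does no learning.
    """
    # Parents: direct causes of the target.
    ps = set(parents.get(target, set()))
    # Children: nodes that list the target among their parents.
    cs = {node for node, pa in parents.items() if target in pa}
    # Spouses: other parents of the target's children.
    sp: Set[str] = set()
    for child in cs:
        sp |= parents[child] - {target}
    # Report spouses separately from nodes already counted as parents or children.
    sp -= ps | cs
    return {"parents": ps, "children": cs, "spouses": sp, "mb": ps | cs | sp}


# Hypothetical toy graph: A -> T, B -> T, T -> C, D -> C, T -> E, E -> F, G isolated.
toy_parents: Dict[str, Set[str]] = {
    "A": set(),
    "B": set(),
    "D": set(),
    "G": set(),
    "T": {"A", "B"},
    "C": {"T", "D"},
    "E": {"T"},
    "F": {"E"},
}

result = markov_blanket(toy_parents, "T")
print(sorted(result["mb"]))
```

In this toy graph, F (a grandchild of T) and G (an isolated node) fall outside the blanket, which is the sense in which an MB-based selector discards redundant and irrelevant features.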


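Source journal

Information Sciences (Engineering & Technology, Computer Science: Information Systems)

CiteScore: 14.00

Self-citation rate: 17.30%

Articles published: 1322

Review time: 10.4 months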

Journal description: Informatics and Computer Science Intelligent Systems Applications is an esteemed international journal that focuses on publishing original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and surveying contributions. Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.