Discovering optimal Markov blanket for high-dimensional streaming features
Waqar Khan, Brekhna Brekhna, Yajun Xie, Muhammad Sadiq Hassan Zada, Rasool Shah, Yifan Zheng
Information Sciences, vol. 716, Article 122240. Published 2025-04-29. DOI: 10.1016/j.ins.2025.122240
https://www.sciencedirect.com/science/article/pii/S002002552500372X
Abstract
Conducting knowledge discovery on high-dimensional streaming features requires an online causal feature selection process that can significantly reduce the complexity of real-world feature spaces and enhance the learning process. This is achieved by mining online causal features to construct a Markov blanket (MB) for the class label, selecting highly relevant subsets, and minimizing the number of irrelevant and redundant features contained within the streaming feature space. However, the prevailing MB algorithms (e.g., offline and online methods) often fall short in discerning the causal relationship between a class label and the selected features, rendering them ineffective and inefficient for high-dimensional streaming feature spaces. We propose a novel algorithm named Discovering Optimal Markov Blanket for high-dimensional Streaming Features (DO-MB_SF) to address these limitations; the approach is tailored to optimally learn an MB online. First, DO-MB_SF dynamically learns the parents (Ps), children (Cs), and spouses of the class label, thereby distinguishing parents-and-children (PC) relationships from spouses and Ps from Cs during the MB learning procedure. Second, learning relevant PC and spousal relationships and accurately distinguishing them strikes a balance between prediction accuracy and computational efficiency, ensuring a comprehensive online causal feature selection approach. An extensive experimental validation highlights the superiority of the DO-MB_SF algorithm in terms of accuracy and efficiency. By identifying strongly relevant PC and spousal relationships and optimizing the tradeoff between accuracy and efficiency, DO-MB_SF is a promising solution for performing online causal feature selection in high-dimensional streaming feature spaces. The code has been released at https://github.com/vickykhan89/DO-MBSF.
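To make the structure of a Markov blanket concrete, the sketch below reads the parents, children, and spouses of a target node off a known directed acyclic graph. It is only an illustration of the definition, not the DO-MB_SF algorithm: DO-MB_SF has to infer these sets from streaming data, whereas this example assumes the graph is given. The function name `markov_blanket`, the edge-list representation, and the toy node names are all hypothetical.

```python
from collections import defaultdict


def markov_blanket(edges, target):
    """Return (parents, children, spouses) of `target`.

    `edges` is an iterable of (parent, child) pairs describing a known DAG.
    This is an illustrative assumption; the paper's method learns the sets
    from data rather than reading them from a given graph.
    """
    parents_of = defaultdict(set)
    children_of = defaultdict(set)
    for parent, child in edges:
        parents_of[child].add(parent)
        children_of[parent].add(child)

    parents = set(parents_of[target])
    children = set(children_of[target])

    # Spouses are the other parents of the target's children.
    spouses = set()
    for child in children:
        spouses |= parents_of[child]
    spouses.discard(target)

    return parents, children, spouses


# Toy graph (hypothetical): X -> A -> T -> C -> D, plus S -> C.
edges = [("X", "A"), ("A", "T"), ("T", "C"), ("S", "C"), ("C", "D")]
parents, children, spouses = markov_blanket(edges, "T")

# The Markov blanket is the union of the three sets.
print(sorted(parents | children | spouses))  # ['A', 'C', 'S']
```

In this toy graph, X (a grandparent) and D (a grandchild) fall outside the blanket of T, which is why selecting only the blanket can discard features that are correlated with the class label yet redundant given the blanket.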
About the journal:
Information Sciences: Informatics and Computer Science, Intelligent Systems, Applications is an international journal that publishes original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and survey contributions.
Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.