Waqar Khan, Brekhna Brekhna, Yajun Xie, Muhammad Sadiq Hassan Zada, Rasool Shah, Yifan Zheng
{"title":"Discovering optimal Markov blanket for high-dimensional streaming features","authors":"Waqar Khan , Brekhna Brekhna , Yajun Xie , Muhammad Sadiq Hassan Zada , Rasool Shah , Yifan Zheng","doi":"10.1016/j.ins.2025.122240","DOIUrl":"10.1016/j.ins.2025.122240","url":null,"abstract":"<div><div>Conducting knowledge discovery on high-dimensional streaming features requires an online causal feature selection process that can significantly reduce the complexity of real-world feature spaces and enhance the learning process. This is achieved by mining online causal features to construct a Markov blanket (MB) for the class label, select highly relevant subsets, and minimize the number of irrelevant and redundant features contained within the streaming feature space. However, the prevailing MB algorithms (e.g., offline and online methods) often fall short in terms of discerning the causal relationship between a class label and the selected features, rendering them ineffective and inefficient for addressing high-dimensional streaming feature spaces. We propose a novel algorithm named <u>D</u>iscovering <u>O</u>ptimal - <u>M</u>arkov <u>b</u>lanket for high-dimensional <u>S</u>treaming <u>F</u>eatures (DO-MB<span><math><msub><mrow></mrow><mrow><mi>S</mi><mi>F</mi></mrow></msub></math></span>) to address these limitations, and this approach is tailored to optimally learn an MB online. First, DO-MB<span><math><msub><mrow></mrow><mrow><mi>S</mi><mi>F</mi></mrow></msub></math></span> dynamically learns the parents (Ps), children (Cs), and spouses of class labels, thereby distinguishing PC relationships from spouses and Ps from Cs during the MB learning procedure. Second, learning relevant PC and spousal relationships and accurately distinguishing them enables a balance to be struck between prediction accuracy and computational efficiency, ensuring a comprehensive online causal feature selection approach. 
Extensive experimental validation highlights the superiority of the DO-MB<span><math><msub><mrow></mrow><mrow><mi>S</mi><mi>F</mi></mrow></msub></math></span> algorithm in terms of accuracy and efficiency. By identifying strongly relevant PC and spousal relationships and optimizing the tradeoff between accuracy and efficiency, DO-MB<span><math><msub><mrow></mrow><mrow><mi>S</mi><mi>F</mi></mrow></msub></math></span> is a promising solution for performing online causal feature selection in high-dimensional streaming feature spaces. The code has been released at <span><span>https://github.com/vickykhan89/DO-MBSF</span></span>.</div></div>","PeriodicalId":51063,"journal":{"name":"Information Sciences","volume":"716 ","pages":"Article 122240"},"PeriodicalIF":8.1,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143894990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Seungwan Park, Taewoong Ryu, Doyoon Kim, Doyoung Kim, Hanju Kim, Myungha Cho, Unil Yun
{"title":"Sliding window-based high utility occupancy pattern mining for data streams","authors":"Seungwan Park , Taewoong Ryu , Doyoon Kim , Doyoung Kim , Hanju Kim , Myungha Cho , Unil Yun","doi":"10.1016/j.ins.2025.122243","DOIUrl":"10.1016/j.ins.2025.122243","url":null,"abstract":"<div><div>High utility-based pattern mining has been proposed to analyze information by considering not only the frequency of items but also their quantity and profit. Among these, studies on high utility occupancy-based patterns have emerged, which consider the occupancy measure reflecting the share of a pattern belonging to transactions. Furthermore, as the necessity to process real-time stream data has become more critical, a method to discover high utility occupancy-based patterns in stream data has been presented recently. However, this recent method processes all accumulated data in data stream environments. Because all previously accumulated data are processed, the volume of data to be processed steadily increases over time, which degrades efficiency. In addition, it becomes difficult to emphasize recent data. Consequently, the method becomes less suitable for practical applications. To overcome these drawbacks, we introduce a novel approach for mining high utility occupancy patterns, employing a sliding window technique to efficiently process stream data. By focusing on the most recent data within a fixed-size window, our method effectively reflects the trends in the latest data while exhibiting improved efficiency compared to previous approaches. Extensive performance evaluations demonstrate the efficacy of the proposed method against prior methods regarding runtime, memory usage, scalability, and sensitivity. 
Moreover, statistical tests confirm that our approach accurately extracts the exact number of patterns without pattern loss or duplication.</div></div>","PeriodicalId":51063,"journal":{"name":"Information Sciences","volume":"716 ","pages":"Article 122243"},"PeriodicalIF":8.1,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143895091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Probability completion and consensus reaching based on kernel density estimation for incomplete probabilistic linguistic multi-attribute group decision making","authors":"Jinglin Xiao, Xinxin Wang, Ying Gao, Zeshui Xu","doi":"10.1016/j.ins.2025.122207","DOIUrl":"10.1016/j.ins.2025.122207","url":null,"abstract":"<div><div>Multi-attribute group decision-making is a hot topic in the study of uncertain decision-making processes, particularly when linguistic variables are employed to express evaluative information. However, incomplete information often arises due to cognitive disparities among decision-makers and their diverse evaluation preferences. To address these challenges, this paper proposes a novel multi-attribute group decision-making method that incorporates incomplete probabilistic linguistic term sets and considers nonlinear semantics. First, we introduce an innovative application of kernel density estimation to complete incomplete term sets, employing Gaussian kernel functions to model the nonlinear perceptual variations of decision-makers. The bandwidth and skewness parameters are utilized to reflect perceptual granularity and evaluation bias, respectively. Second, we modify the Kolmogorov-Smirnov distance measure and propose a novel comparison rule tailored to probabilistic linguistic term sets with semantic imbalance, enhancing the computational accuracy of attribute weight determination. Furthermore, two optimization models are developed to determine the bandwidths for completing incomplete information and aggregating individual evaluations. A dynamic adjustment mechanism is introduced to support decision-maker interaction in achieving consensus. The effectiveness of the proposed methods is demonstrated through a case study on gas meter selection. 
Sensitivity analysis and comparative experiments highlight its superior performance in handling incomplete information and managing uneven semantics.</div></div>","PeriodicalId":51063,"journal":{"name":"Information Sciences","volume":"715 ","pages":"Article 122207"},"PeriodicalIF":8.1,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143881408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lan Huang, Shuyu Guo, Tian Bai, Ruihong Zhao, Ke Tao
{"title":"Prompt-guided orthogonal multimodal fusion for cancer survival prediction","authors":"Lan Huang , Shuyu Guo , Tian Bai , Ruihong Zhao , Ke Tao","doi":"10.1016/j.ins.2025.122242","DOIUrl":"10.1016/j.ins.2025.122242","url":null,"abstract":"<div><div>Cancer survival prediction can assist clinicians in developing personalized treatment plans for patients. Comprehensive cancer diagnosis and treatment require integrating macroscopic and microscopic imaging. However, significant discrepancies in the spatial resolution and anatomical scale between imaging modalities hinder existing multimodal fusion methods from effectively learning correlated semantic features with limited datasets. In this work, we introduce a prompt-guided orthogonal multimodal fusion strategy (POMF) for fusing multimodal medical images across anatomical scales. POMF utilizes modality-specific prompts to fine-tune pretrained models, facilitating bias adaptation to medical imaging features while ensuring more computationally efficient training. A modality consistency-discrepancy prototype is designed as the modality-inherent prompt in POMF, disentangling the multimodal features and bridging the potential correlations across the orthogonal modalities. POMF is validated on a glioma survival prediction task using paired radiology and pathology images. The experimental results suggest that POMF achieves a superior C-index compared with existing full-tuning and prompt-tuning methods. 
Additionally, the ablation studies demonstrate that POMF is adaptable to various architectures of pretrained encoders and multiple multimodal fusion strategies on cross-scale medical images.</div></div>","PeriodicalId":51063,"journal":{"name":"Information Sciences","volume":"715 ","pages":"Article 122242"},"PeriodicalIF":8.1,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143881409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Muhammad Ahmed Hassan Shah, Atif Rizwan, Muhammad Sardaraz, Muhammad Tahir, Nagwan Abdel Samee, Mona M. Jamjoom
{"title":"Multimodal cross-domain contrastive learning: A self-supervised generative and geometric framework for visual perception","authors":"S. Muhammad Ahmed Hassan Shah , Atif Rizwan , Muhammad Sardaraz , Muhammad Tahir , Nagwan Abdel Samee , Mona M. Jamjoom","doi":"10.1016/j.ins.2025.122239","DOIUrl":"10.1016/j.ins.2025.122239","url":null,"abstract":"<div><div>Self-Supervised Contrastive Representation Learning (SSCRL) has gained significant attention for its ability to learn meaningful representations from unlabeled data by leveraging contrastive learning principles. However, existing SSCRL approaches struggle with effectively handling heterogeneous data formats, particularly discrete and binary representations, limiting adaptability across multiple domains. This limitation hinders the generalization of learned representations, especially in applications requiring structured feature encoding and robust cross-domain adaptability. To address this, we propose the Modular QCB Learner, a novel algorithm designed to enhance representation learning for heterogeneous data types. This framework builds upon SSCRL by incorporating a Real Non-Volume Preserving transformation to optimize continuous representations, ensuring alignment with a Gaussian distribution. For discrete representation learning, vector quantization is utilized along with a Poisson distribution, while binary representations are modeled through nonlinear transformations and the Bernoulli distribution. Multi-Domain Mixture Optimization (MiDO) is introduced to facilitate joint optimization of different representation types by integrating multiple loss functions. To evaluate effectiveness, synthetic data generation is performed on extracted representations and compared with baselines. 
Experiments on CIFAR-10 confirm the Modular QCB Learner improves representation quality, demonstrating robustness across diverse data domains with applications in synthetic data generation, anomaly detection and multimodal learning.</div></div>","PeriodicalId":51063,"journal":{"name":"Information Sciences","volume":"715 ","pages":"Article 122239"},"PeriodicalIF":8.1,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143881411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mario Chahoud, Hani Sami, Rabeb Mizouni, Jamal Bentahar, Azzam Mourad, Hadi Otrok, Chamseddine Talhi
{"title":"Reward shaping in DRL: A novel framework for adaptive resource management in dynamic environments","authors":"Mario Chahoud , Hani Sami , Rabeb Mizouni , Jamal Bentahar , Azzam Mourad , Hadi Otrok , Chamseddine Talhi","doi":"10.1016/j.ins.2025.122238","DOIUrl":"10.1016/j.ins.2025.122238","url":null,"abstract":"<div><div>In edge computing environments, efficient computation resource management is crucial for optimizing service allocation to hosts in the form of containers. These environments experience dynamic user demands and high mobility, making traditional static and heuristic-based methods inadequate for handling such complexity and variability. Deep Reinforcement Learning (DRL) offers a more adaptable solution, capable of responding to these dynamic conditions. However, existing DRL methods face challenges such as high reward variability, slow convergence, and difficulties in incorporating user mobility and rapidly changing environmental configurations. To overcome these challenges, we propose a novel DRL framework for computation resource optimization at the edge layer. This framework leverages a customized Markov Decision Process (MDP) and Proximal Policy Optimization (PPO), integrating a Graph Convolutional Transformer (GCT). By combining Graph Convolutional Networks (GCN) with Transformer encoders, the GCT introduces a spatio-temporal reward-shaping mechanism that enhances the agent's ability to select hosts and assign services efficiently in real time while minimizing the overload. Our approach significantly enhances the speed and accuracy of resource allocation, achieving, on average across two datasets, a 30% reduction in convergence time, a 25% increase in total accumulated rewards, and a 35% improvement in service allocation efficiency compared to standard DRL methods and existing reward-shaping techniques. 
Our method was validated using two real-world datasets, MOBILE DATA CHALLENGE (MDC) and Shanghai Telecom, and was compared against standard DRL models, reward-shaping baselines, and heuristic methods.</div></div>","PeriodicalId":51063,"journal":{"name":"Information Sciences","volume":"715 ","pages":"Article 122238"},"PeriodicalIF":8.1,"publicationDate":"2025-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143881410","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yunlong Gao, Qinting Wu, Zhenghong Xu, Jinyan Pan, Guifang Shao, Qingyuan Zhu, Feiping Nie
{"title":"Diversity-induced fuzzy clustering with Laplacian regularization","authors":"Yunlong Gao , Qinting Wu , Zhenghong Xu , Jinyan Pan , Guifang Shao , Qingyuan Zhu , Feiping Nie","doi":"10.1016/j.ins.2025.122225","DOIUrl":"10.1016/j.ins.2025.122225","url":null,"abstract":"<div><div>Fuzzy clustering is a fundamental technique in unsupervised learning for exploring data structures. However, fuzzy c-means (FCM), as a representative fuzzy clustering algorithm, performs relatively poorly when handling noisy data and outliers since it only considers global data characteristics while ignoring the local information. Additionally, FCM overlooks data diversity, making it difficult to handle complex data and leading to cluster center overlapping. To address these challenges, this paper proposes a novel approach called diversity-induced fuzzy clustering with Laplacian regularization (DiFCMLR). DiFCMLR incorporates the Hilbert-Schmidt Independence Criterion (HSIC) to maximize the independence among clusters, thereby enhancing clustering diversity. In addition, DiFCMLR introduces Laplacian regularization to consider the local information of data and determine the affinity relationship between samples. Furthermore, it corrects the Euclidean distance between samples, thereby reducing the impact of the normal distribution prior assumption of FCM and improving the applicability of the algorithm to complex data or size-imbalance problems. During the optimization, DiFCMLR utilizes iterative reweighting and the alternating direction method of multipliers, which enhance robustness against noise and outliers and achieve faster convergence towards better solutions. 
The effectiveness of DiFCMLR is confirmed through theoretical analysis and experimental evaluation.</div></div>","PeriodicalId":51063,"journal":{"name":"Information Sciences","volume":"715 ","pages":"Article 122225"},"PeriodicalIF":8.1,"publicationDate":"2025-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143879129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An analysis on the effects of evolving the Monte Carlo tree search upper confidence for trees selection policy on unimodal, multimodal and deceptive landscapes","authors":"Edgar Galván , Fred Valdez Ameneyro","doi":"10.1016/j.ins.2025.122226","DOIUrl":"10.1016/j.ins.2025.122226","url":null,"abstract":"<div><div>Monte Carlo Tree Search (MCTS) is a best-first sampling/planning method used to find optimal decisions. The effectiveness of MCTS depends on the construction of its statistical tree, with the selection policy playing a crucial role. A particularly effective selection policy in MCTS is the Upper Confidence Bounds for Trees (UCT). While MCTS/UCT generally performs well, there may be variants that outperform it, leading to efforts to evolve selection policies for use in MCTS. However, these efforts have often been limited in their ability to demonstrate when these evolved policies might be beneficial. They frequently rely on single, poorly understood problems or on new methods that are not fully comprehended. To address these limitations, we use three evolutionary-inspired methods: Evolutionary Algorithm (EA)-MCTS, Semantically-inspired EA (SIEA)-MCTS as well as Self-adaptive (SA)-MCTS, which evolve online selection policies to be used in place of UCT. We compare these three methods against five variants of the standard MCTS on ten test functions of varying complexity and nature, including unimodal, multimodal, and deceptive features. 
By using well-defined metrics, we demonstrate how the evolution of MCTS/UCT can yield benefits in multimodal and deceptive scenarios, while MCTS/UCT remains robust across all functions used in this work.</div></div>","PeriodicalId":51063,"journal":{"name":"Information Sciences","volume":"715 ","pages":"Article 122226"},"PeriodicalIF":8.1,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143885963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Geodesic fuzzy rough sets based on overlap functions and its applications in feature extraction","authors":"Chengxi Jian , Junsheng Qiao , Shan He","doi":"10.1016/j.ins.2025.122224","DOIUrl":"10.1016/j.ins.2025.122224","url":null,"abstract":"<div><div>As one of the current hot topics, feature extraction techniques have been widely studied, with the aim of selecting important and distinctive feature subsets from the original data to realize data dimensionality reduction. However, current feature extraction techniques do not consider the complex manifold structures in high-dimensional data, thus failing to fully exploit the information value of the data. To solve this problem, we introduce overlap functions (an emerging class of commonly used information aggregation functions with a wide range of applications) into the geodesic fuzzy rough set model and propose a new model named OKGFRS, which can effectively capture the potential manifold structures in high-dimensional data and handle imbalanced data. On this basis, we design a new discriminative feature extraction algorithm to improve the discriminative performance of feature extraction and to address problems such as the poor distinguishing ability of features. Experimental verification shows that the algorithm achieves good classification performance.</div></div>","PeriodicalId":51063,"journal":{"name":"Information Sciences","volume":"715 ","pages":"Article 122224"},"PeriodicalIF":8.1,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143873893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FDS: Fractal decomposition based direct search approach for continuous dynamic optimization","authors":"Arcadi Llanza , Nadiya Shvai , Amir Nakib","doi":"10.1016/j.ins.2025.122237","DOIUrl":"10.1016/j.ins.2025.122237","url":null,"abstract":"<div><div>Dynamic optimization problems (DOPs) are known to be challenging due to the variability of their objective functions and constraints over time. The complexity of these problems increases further when the frequency of landscape change and the dimensionality of the search space are high. In this work, we propose a novel fractal decomposition-based method designed for DOPs, called FDS. It is a new single-solution metaheuristic that introduces a new hypersphere-based space decomposition for efficient exploration, an archive for diversity control, and a pseudo-gradient-based local search (called GraILS) for fast exploitation. Extensive experiments on the well-known standard benchmark, the Moving Peak Benchmark (MPB), demonstrate that FDS consistently outperforms state-of-the-art competitors. Furthermore, FDS shows high robustness across diverse scenarios, maintaining superior performance despite variations in key benchmark parameters, such as the severity of landscape shifts, the number of peaks, the dimensionality of the problem, and the frequency of change. FDS achieves the highest average rank across all experiments and demonstrates dominant performance in 19 out of 23 scenarios. 
The implementation of FDS is available via the following GitHub repository: <span><span>https://github.com/alc1218/FDS</span></span>.</div></div>","PeriodicalId":51063,"journal":{"name":"Information Sciences","volume":"715 ","pages":"Article 122237"},"PeriodicalIF":8.1,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143885962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}