{"title":"Information-theoretic and Bayesian model selection for physics-based modeling: Balancing fit, complexity, and generalization","authors":"Xinyue Xu , Julian Wang","doi":"10.1016/j.ins.2025.122743","DOIUrl":"10.1016/j.ins.2025.122743","url":null,"abstract":"<div><div>Reliable model selection is a cornerstone of developing physics-based models of engineering systems. However, existing model selection criteria have not been investigated across a variety of calibration scenarios, where selection choices can be affected by (i) parameter dimensionality, (ii) model form, (iii) prior informativeness, (iv) reparameterization, and (v) data characteristics. Moreover, it remains unclear whether these criteria can reliably distinguish model fidelity that genuinely improves explanatory power. These limitations restrict the broader applicability of model selection criteria in physics-based modeling, where balancing goodness-of-fit, complexity, and generalization is critical. To address these gaps, this study systematically evaluates information-theoretic and Bayesian model selection criteria through two case studies. The first case study employs polynomial regression models to isolate the effects of calibration factors and investigate their influence on the selection behavior of criteria. The second case study extends the analysis to a hierarchy of thermal models for double-pane windows, examining the ability of selection criteria to differentiate effective complexity from superficial increases in model fidelity. Results indicate that classical information-theoretic criteria are sensitive to parameter dimensionality, while covariance-based criteria reflect changes in model form and data characteristics, and Bayesian criteria exhibit sensitivity to all examined calibration factors. Furthermore, both covariance-based and Bayesian criteria effectively identify secondary physical mechanisms as sources of ineffective complexity, penalizing redundant fidelity. 
These findings underscore that model selection is not a one-size-fits-all task, and the choice of model selection criteria should be informed by the calibration scenario and the modeling objective.</div></div>","PeriodicalId":51063,"journal":{"name":"Information Sciences","volume":"726 ","pages":"Article 122743"},"PeriodicalIF":6.8,"publicationDate":"2025-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145323312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Behavioral pattern clustering for thematic user segmentation in web interaction environments","authors":"Suma Srinath, Nagaraju Baydeti","doi":"10.1016/j.ins.2025.122745","DOIUrl":"10.1016/j.ins.2025.122745","url":null,"abstract":"<div><div>Clustering users based on their interests is a critical component of personalized content delivery. This paper proposes a novel multimodal framework that integrates semantic video classification, contextualized caption generation, and user behavior patterns. The system combines visual and audio features, computed using convolutional and transformer-based encoders, to robustly capture the complex content of the video description. User browsing profiles are modelled using probabilistic distributions to reflect realistic browsing behavior across six interest categories. These profiles are then clustered using KMeans, DBSCAN, and Agglomerative clustering to identify distinct user groups. Clustering quality is evaluated using the Silhouette Score, Davies-Bouldin Index, and Calinski-Harabasz Index, with PCA and t-SNE applied for visual validation of cluster coherence. The simulation framework addresses data privacy concerns and the scarcity of real-world data by producing controllable and realistic user behavior traces. Experimental results demonstrate that KMeans provides the best trade-off between clustering quality and computational cost. Together, these components offer a new perspective on personalized content delivery through fine-grained user segmentation and precise video understanding. 
Future work will focus on real-time adaptive learning, integration of additional data types, and deployment in large-scale multimedia applications.</div></div>","PeriodicalId":51063,"journal":{"name":"Information Sciences","volume":"724 ","pages":"Article 122745"},"PeriodicalIF":6.8,"publicationDate":"2025-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145223015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust deep network learning of nonlinear regression tasks by parametric leaky exponential linear units (LELUs) and a diffusion metric","authors":"Enda D.V. Bigarella","doi":"10.1016/j.ins.2025.122739","DOIUrl":"10.1016/j.ins.2025.122739","url":null,"abstract":"<div><div>This document proposes a parametric activation function (<em>ac.f</em>) aimed at improving multidimensional nonlinear data regression. It is established knowledge that nonlinear <em>ac.f</em>s are required for learning nonlinear datasets. This work shows that smoothness and gradient properties of the <em>ac.f</em> further impact the performance of large neural networks in terms of overfitting and sensitivity to model parameters. Smooth but vanishing-gradient <em>ac.f</em>s such as ELU or SiLU (Swish) have limited performance, and non-smooth <em>ac.f</em>s such as ReLU and Leaky-ReLU further impart discontinuity in the trained model. Improved performance is demonstrated with a smooth “Leaky Exponential Linear Unit”, with non-zero gradient that can be trained. A novel diffusion-loss metric is also proposed to gauge the performance of the trained models in terms of overfitting.</div></div>","PeriodicalId":51063,"journal":{"name":"Information Sciences","volume":"725 ","pages":"Article 122739"},"PeriodicalIF":6.8,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145271324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HP2: Hybrid and precision-guided filter pruning for CNN compression","authors":"Zhichao Zhao , Shangwei Guo , Jialing He , Yafei Li , Run Wang , Tao Xiang","doi":"10.1016/j.ins.2025.122741","DOIUrl":"10.1016/j.ins.2025.122741","url":null,"abstract":"<div><div>Filter pruning has emerged as a promising approach for compressing Convolutional Neural Network (CNN) models. However, existing methods often lack accuracy in evaluating filter importance and precision in on-demand filter pruning. In this paper, we address these limitations by proposing a novel Hybrid and Precision-guided filter Pruning method (HP<sup>2</sup>) for CNN compression, driven by two key observations. In particular, our method enhances filter importance evaluation and enables targeted filter pruning, allowing flexible reduction of computational complexity (FLOPs) or memory (parameters). We introduce the Hybrid Importance Score (HIS) to assess precise filter importance by leveraging both filter weights and activations. Moreover, we quantitatively analyze the intricate relationship between FLOPs and parameters, leading to an on-demand pruning strategy that further optimizes FLOPs or parameter reduction. Extensive experiments showcase the superiority of HP<sup>2</sup> over state-of-the-art CNN compression methods, particularly under high pruning ratios.</div></div>","PeriodicalId":51063,"journal":{"name":"Information Sciences","volume":"725 ","pages":"Article 122741"},"PeriodicalIF":6.8,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145271318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The envelope of one-dimensional discrete-time quantum walk","authors":"Yunguo Lin , Shuiying Cai","doi":"10.1016/j.ins.2025.122722","DOIUrl":"10.1016/j.ins.2025.122722","url":null,"abstract":"<div><div>A mathematical model is presented for a one-dimensional discrete-time quantum walk, which is initiated from a quantum initial state and governed by a coin operator. When the coin operator is a flip operator, a path analysis formula is employed to compute the position probability distribution. For a general coin operator, matrix decomposition is utilized to transform it into the equivalent flip operator. When a walker undergoes <span><math><mi>n</mi></math></span> steps of evolution, it is observed that the probability of the walker occupying any given position exhibits the existence of both maximum and minimum values, irrespective of the quantum initial state. By linking these extreme positions together, a confined region is delineated, the boundary of which is designated as the envelope of the quantum walk. Remarkably, the envelope is independent of the quantum initial state. To facilitate the computation of this envelope, the relevant formulas are transformed into rational expressions, wherein both the numerators and denominators are represented by polynomials with even integer coefficients. These polynomials are classified to determine the coefficients of the numerator polynomials. 
An analysis is conducted to identify the location of the maximum value of the envelope, thereby examining the maximum value of the position probability distribution.</div></div>","PeriodicalId":51063,"journal":{"name":"Information Sciences","volume":"726 ","pages":"Article 122722"},"PeriodicalIF":6.8,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145247885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3WD-DRT: A three-way decision enhanced dynamic routing transformer for cost-sensitive multimodal sentiment analysis","authors":"Haoyu Jiang , Xiaoliang Chen , Duoqian Miao , Hongyun Zhang , Xiaolin Qin , Shangyi Du , Peng Lu","doi":"10.1016/j.ins.2025.122704","DOIUrl":"10.1016/j.ins.2025.122704","url":null,"abstract":"<div><div>Accurately interpreting human emotion from language, facial expressions, and vocal tones remains a fundamental challenge in artificial intelligence. Current Multimodal Sentiment Analysis (MSA) models often struggle with two key issues. First, their static fusion strategies fail to handle conflicting modalities, such as sarcasm. Second, their standard loss functions ignore the asymmetric risks of severe misjudgments. To address these limitations, we propose the Three-Way Decision Enhanced Dynamic Routing Transformer (3WD-DRT), a framework operating on a \"quality-aware, decision-driven\" principle. It dynamically assesses each modality’s quality using a three-way decision gate, implemented via a dedicated MLP, to partition information into acceptance, deferment, or rejection pathways. This enables the model to amplify informative signals, moderately scale uncertain ones (deferment), and attenuate noisy or misleading ones. We also introduce a novel cost-sensitive loss function that imposes greater penalties on major semantic errors, such as polarity misclassifications. This approach better aligns the model’s training objective with human perception. Extensive experiments on CH-SIMS, CH-SIMSv2, MOSI, and MOSEI datasets show that 3WD-DRT consistently outperforms state-of-the-art methods, setting new benchmarks with F1-scores of 87.08 % on MOSI and 88.26 % on MOSEI. 
This work provides a robust solution for MSA, fostering more nuanced and reliable emotionally-aware AI systems.</div></div>","PeriodicalId":51063,"journal":{"name":"Information Sciences","volume":"725 ","pages":"Article 122704"},"PeriodicalIF":6.8,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145270808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Information suppression in large language models: Auditing, quantifying, and characterizing censorship in DeepSeek","authors":"Peiran Qiu , Siyi Zhou , Emilio Ferrara","doi":"10.1016/j.ins.2025.122702","DOIUrl":"10.1016/j.ins.2025.122702","url":null,"abstract":"<div><div>This study examines information suppression mechanisms in DeepSeek, an open-source large language model (LLM) developed in China. We propose an auditing framework to evaluate the censorship in the model through analyzing the response alignment with the corresponding chain of thought (CoT). By comparing model responses to 646 politically sensitive topics with those to non-politically sensitive topics, our audit unveils evidence of semantic-level information suppression in DeepSeek: sensitive content often appears within the model’s internal reasoning but is omitted or rephrased in the final output. Specifically, DeepSeek suppresses references to transparency, government accountability, and civic mobilization, while occasionally amplifying language aligned with state propaganda. This study underscores the need for systematic auditing of alignment, content moderation, information suppression, and censorship practices implemented in widely adopted AI models, to ensure transparency, accountability, and equitable access to unbiased information obtained by means of these systems.</div></div>","PeriodicalId":51063,"journal":{"name":"Information Sciences","volume":"724 ","pages":"Article 122702"},"PeriodicalIF":6.8,"publicationDate":"2025-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145222976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FedPure: Data poisoning attack detection and purification for federated skeleton-based action recognition","authors":"Min Hyuk Kim , Eun-Gi Lee , Seok Bong Yoo","doi":"10.1016/j.ins.2025.122733","DOIUrl":"10.1016/j.ins.2025.122733","url":null,"abstract":"<div><div>Skeleton-based action recognition (SAR) often requires centralized skeleton data, raising serious privacy concerns in deployment scenarios such as healthcare or surveillance. Federated learning (FL) allows SAR models to be trained without sharing raw data and has therefore become an attractive approach for privacy-sensitive, distributed applications such as camera-enabled devices, human–robot interaction, and security monitoring. However, FL-based SAR remains vulnerable to data poisoning attacks. We propose a data poisoning attack detection and purification method for federated SAR, called FedPure. FedPure introduces a fused transform prototype representation, which combines global perspective transforms with subregion transforms to capture spatiotemporal cues. This design enables precise inter-client correlation analysis for malicious client detection. Moreover, a detector comprising inter-client spatiotemporal matching is designed to analyze the correlation between pseudo skeleton data. Furthermore, FedPure improves model robustness by purifying the malicious clients using a disentangled feature-based purifier to maintain data diversity. The experimental results on diverse adversarial attacks, including FGSM, PGD, C&W, Bone Length Attack, and Hard No Box Attack, confirm that FedPure outperforms existing models in SAR accuracy. By providing an integrated detection-and-purification pipeline tailored to federated SAR, FedPure narrows a key gap in privacy-preserving training, enabling safer application of FL-based action recognition. 
Our code is publicly available at <span><span>https://github.com/alsgur0720/fedpure</span></span>.</div></div>","PeriodicalId":51063,"journal":{"name":"Information Sciences","volume":"725 ","pages":"Article 122733"},"PeriodicalIF":6.8,"publicationDate":"2025-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145271321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing usability in face privacy protection via vision-language guided diffusion model","authors":"Zhifeng Xu, Peiyao Yuan, Yiru Zhao, Lei Zhao","doi":"10.1016/j.ins.2025.122736","DOIUrl":"10.1016/j.ins.2025.122736","url":null,"abstract":"<div><div>With the development of the Internet, a large number of images containing faces are widely shared on social media, leading to increased risks of face-based identity tracking and privacy breaches. Face de-identification serves as a privacy protection technique that conceals identifiable personal information in images. Recent advancements in generative model-based face de-identification methods have made progress in ensuring privacy while preserving image usability. However, challenges remain in enhancing the usability. Specifically, current methods often generate images with noticeable artifacts or struggle to preserve the original semantic information, which can hinder the practical applications in various computer vision tasks. In this paper, we propose a vision-language understanding-guided diffusion model for face de-identification. Our method incorporates a semantic preservation module and an identity protection module to guide the diffusion model in generating de-identified images. The semantic preservation module leverages a vision-language model to retain the sentence-level semantic information of the original image. The identity protection module perturbs the identity representation to ensure privacy. 
We train and evaluate our method on different datasets, and the experimental results demonstrate that, while ensuring privacy protection, our method not only surpasses existing methods in image quality but also outperforms them across multiple fine-grained utility tasks.</div></div>","PeriodicalId":51063,"journal":{"name":"Information Sciences","volume":"725 ","pages":"Article 122736"},"PeriodicalIF":6.8,"publicationDate":"2025-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145270813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proactive mission-time-efficient coverage path planning using hierarchical heuristics","authors":"Junghwan Gong, Moses O. Oluma, Seunghwan Lee","doi":"10.1016/j.ins.2025.122696","DOIUrl":"10.1016/j.ins.2025.122696","url":null,"abstract":"<div><div>Ensuring efficient and reliable autonomous coverage in large-scale environments remains a persistent challenge, particularly owing to the battery limitations of robotic systems. To address this challenge, this study proposes a novel, proactive energy-aware coverage path planning (CPP) framework that considers traveling and charging durations in a unified manner. The proposed method explicitly models realistic battery dynamics, including nonlinear charging and discharging behaviors. To render the problem practically solvable, it is decomposed into a hierarchical two-stage structure. Each stage is addressed using a well-suited heuristic: Ant Colony Optimization (ACO) for generating coverage paths, and a Genetic Algorithm (GA) for scheduling recharging actions. In contrast to conventional reactive approaches that respond only after the battery level becomes critical, the proposed method schedules recharging actions in advance, aiming to reduce the overall mission time proactively and strategically. Extensive simulations in synthetic, real-world-acquired, and real-world-based obstacle-rich coverage environments validate the effectiveness of the proposed method. The results demonstrate a mission time reduction of up to 24.66 %, with consistent improvements in energy reliability across varying charging station densities. These findings highlight the practicality of the proposed method as a global scheduler for real-world deployment in energy-constrained environments. 
Furthermore, this framework lays the foundation for extensions to multi-robot systems, enabling scalable, adaptive, and mission-time-efficient coordination in large-scale autonomous missions.</div></div>","PeriodicalId":51063,"journal":{"name":"Information Sciences","volume":"725 ","pages":"Article 122696"},"PeriodicalIF":6.8,"publicationDate":"2025-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145227738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}