Nasir Saleem , Sami Bourouis , Hela Elmannai , Abeer D. Algarni
{"title":"CTSE-Net: Resource-efficient convolutional and TF-transformer network for speech enhancement","authors":"Nasir Saleem , Sami Bourouis , Hela Elmannai , Abeer D. Algarni","doi":"10.1016/j.knosys.2025.113452","DOIUrl":"10.1016/j.knosys.2025.113452","url":null,"abstract":"<div><div>Deep Neural Networks (DNNs) are powerful tools in real-time speech enhancement (SE) since they automatically learn high-level feature representations from raw audio, resulting in significant advancements. Consequently, demand for resource-efficient DNNs for speech enhancement is increasing, mainly for embedded systems. Still, designing a lightweight and resource-efficient DNN with optimal speech enhancement performance remains a challenging task. Dual-path attention-driven architectures have shown notable performance in SE, primarily because of their ability to capture time and frequency dependencies. This paper proposes a resource-efficient SE method using a codec-based dual-path time–frequency transformer (CTSE-Net) to improve noisy speech and apply it to speech recognition tasks. The proposed SE method employs a codec (coder–decoder) architecture with feature calibration in skip connections to obtain fine-grained frequency components. The codec is interconnected using a dual-path time–frequency transformer incorporating time and frequency attentions. The encoder encodes a time–frequency (T–F) representation derived from the distorted compressed speech spectrum, whereas the decoder estimates the compressed magnitude spectrum of enhanced speech. Further, dedicated speech activity detection (SAD) is employed to identify speech segments in the input signals. By distinguishing speech from background noise or silence, the SAD block provides important information to the decoder for target speech enhancement. The proposed resource-efficient approach ensures attention across time–frequency and distinguishes speech from background noise, leading to more effective denoising and enhancement. 
Experiments indicate that CTSE-Net shows robust noise reduction and contributes to accurate speech recognition. On the benchmark VCTK+DEMAND dataset, the proposed CTSE-Net demonstrates better SE performance, achieving notable improvements in ESTOI (33.69%), PESQ (1.05), and SDR (11.36 dB) over the noisy mixture.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"317 ","pages":"Article 113452"},"PeriodicalIF":7.2,"publicationDate":"2025-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143820499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Haifei Ma , Canlong Zhang , Enhao Ning , Chai Wen Chuah
{"title":"Temporal Motion and Spatial Enhanced Appearance with Transformer for video-based person ReID","authors":"Haifei Ma , Canlong Zhang , Enhao Ning , Chai Wen Chuah","doi":"10.1016/j.knosys.2025.113461","DOIUrl":"10.1016/j.knosys.2025.113461","url":null,"abstract":"<div><div>For video-based person Re-Identification (Re-ID), how to efficiently extract temporal motion features and spatial appearance features from video sequences is a key issue. Conventional approaches focus on modelling the spatio-temporal features of the entire video, ignoring the inherent differences between temporal motion features (e.g., gait) that change over time and spatial appearance features (e.g., clothing) that are stable over time in terms of attributes. Because of their different sensitivities in real-world scenarios, conventional approaches often lose critical fine-grained features. To address these issues, we propose a <strong>T</strong>emporal <strong>M</strong>otion and spatial <strong>E</strong>nhanced <strong>A</strong>ppearance with <strong>T</strong>ransformer-based (T<span><math><msup><mrow></mrow><mrow><mn>2</mn></mrow></msup></math></span>MEA) framework for modelling spatial–temporal video discriminative representations. Specifically, (1) Dual-Branch Architecture: the content branch emphasises extracting the overall structure of the video using the spatial–temporal aggregation (STA) module from a global view, whereas the fovea branch focuses on gaining local fine-grained spatio-temporal features. (2) Zero-Parameter Design: the [CLS] Token Channel Shift Interaction (TCSI) module captures the dynamic features and static features between adjacent frames without additional parameters; the Spatial Patches Shift Enhancing (SPSE) module is introduced to enhance appearance features within each frame to address occlusion and illumination changes without additional parameters. 
(3) Spatial–Temporal Interaction: The Cross-Attention Aggregation (CAA) module is proposed to interact between temporal and spatial features and further enrich the spatial–temporal feature representation for video sequences. Extensive experiments on three public Re-ID benchmarks (MARS, iLIDS-VID, and PRID-2011) demonstrate that the proposed framework outperforms several state-of-the-art methods.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"317 ","pages":"Article 113461"},"PeriodicalIF":7.2,"publicationDate":"2025-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143820500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mohammad Tubishat , Dina Tbaishat , Ala’ M. Al-Zoubi , Abed-Elalim Hraiz , Maria Habib
{"title":"Leveraging evolutionary algorithms with a dynamic weighted search space approach for fraud detection in healthcare insurance claims","authors":"Mohammad Tubishat , Dina Tbaishat , Ala’ M. Al-Zoubi , Abed-Elalim Hraiz , Maria Habib","doi":"10.1016/j.knosys.2025.113436","DOIUrl":"10.1016/j.knosys.2025.113436","url":null,"abstract":"<div><div>The healthcare industry has been suffering from fraud in many facets for decades, resulting in millions of dollars lost to fictitious claims at the expense of other patients who cannot afford appropriate care. As such, accurately identifying fraudulent claims is one of the most important factors in a well-functioning healthcare system. However, over time, fraud has become harder to detect because of increasingly complex and sophisticated fraud schemes, data unpreparedness, and data privacy concerns. Moreover, traditional methods are proving increasingly inadequate in addressing this issue. To solve this issue, the current study presents a novel evolutionary dynamic weighted search space approach (DW-WOA-SVM). The approach has different levels that work simultaneously, where the optimization algorithm is responsible for tuning the Support Vector Machine (SVM) parameters, applying the weighting procedure for the features, and using a dynamic search space to adjust the range values. Tuning the parameters benefits the performance of SVM, and the weighting technique keeps the feature importance updated and lets the algorithm focus on data structure in addition to optimization objectives. The dynamic search space enhances the search range during the process. Furthermore, large language models have been applied to generate the dataset to improve the quality of the data and address the lack of good dimensionality, helping to enhance the richness of the data. 
The experiments highlighted the superior performance of the proposed approach over other algorithms.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"317 ","pages":"Article 113436"},"PeriodicalIF":7.2,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143816276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yutong Yang , Xiaofeng Wang , Xin Lin , Honggang Chen
{"title":"DEMNet: A degradation difference enabled multi-stage network for multiple degradation image restoration","authors":"Yutong Yang , Xiaofeng Wang , Xin Lin , Honggang Chen","doi":"10.1016/j.knosys.2025.113426","DOIUrl":"10.1016/j.knosys.2025.113426","url":null,"abstract":"<div><div>Restoring images affected by multiple types of degradation is a challenging and active area, requiring simultaneous handling of various forms of corruption and maintaining semantic context and spatial details while effectively managing network complexity. To advance the performance of multiple degradation image restoration tasks, we introduce a Degradation Difference Enabled Multi-stage Network (DEMNet), which can efficiently restore images affected by various types of degradation, including noise, rain, and haze. Three key contributions are provided by the proposed method: First, we introduce a unique three-stage architecture in DEMNet to balance capturing fine spatial details and the preservation of crucial contextual information. Second, we present a degradation difference enabled contrastive loss as a guidance mechanism for a degradation difference encoder. This loss function facilitates the accurate extraction of degradation-specific information, thus enhancing restoration. Finally, to balance channel features and spatial details, we propose a spatial-channel integration block, which improves the overall representation of the restored image with improved comprehensiveness and accuracy. By integrating these innovative modules, our DEMNet performs favorably against the latest approaches in multiple degradation image restoration while significantly reducing the number of parameters. Furthermore, DEMNet also exhibits excellent performance in single degradation image restoration, showcasing its versatility and effectiveness across various degradation types. 
Extensive experimental evaluations confirm the superiority of the proposed method.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"318 ","pages":"Article 113426"},"PeriodicalIF":7.2,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143835389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing IoT data collection through federated learning and periodic scheduling","authors":"Darya AzharShokoufeh , Nahideh DerakhshanFard , Fahimeh RashidJafari , Ali Ghaffari","doi":"10.1016/j.knosys.2025.113526","DOIUrl":"10.1016/j.knosys.2025.113526","url":null,"abstract":"<div><div>The Internet of Things (IoT) describes a system of interlinked devices, sensors, and intelligent systems that facilitate intricate management in smart homes, industries, and cities. The devices constantly gather basic information like temperature, humidity, geographical location, and energy consumption to facilitate analytics and decision-making. However, traditional data collection methods, such as direct information transfer to a central server, face significant challenges regarding bandwidth use, energy efficiency, data security, reliability, and overall performance. These methods require robust communication infrastructures, often leading to network resource overexploitation due to raw data transmission. Although edge computing, fog computing, fedHGL, and centralized learning methods are considered modern techniques offering some advantages, they still require complex infrastructures and have the same difficulties processing heterogeneous or big datasets. Periodic scheduling is a new paradigm for federated learning, where the data will be processed locally, and only the updated model weights will be transferred to the central server. This approach significantly reduces bandwidth and energy consumption and facilitates faster model updates, enhancing the overall performance of IoT networks. Simulation results demonstrate that our proposed federated learning approach outperforms the other considered approaches on both MNIST and RT-IoT2022 datasets. 
It achieves on MNIST an accuracy improvement of 12 %, a reduction in convergence time of 22 %, and a bandwidth usage reduction of 21 %; and on RT-IoT2022, an accuracy enhancement of 9 %, a convergence time reduction of 18 %, and a bandwidth usage reduction of 25 %, confirming its overall superiority for IoT systems.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"317 ","pages":"Article 113526"},"PeriodicalIF":7.2,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143830200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning of Quasi-nonlinear Long-term Cognitive Networks using iterative numerical methods","authors":"Gonzalo Nápoles , Yamisleydi Salgueiro","doi":"10.1016/j.knosys.2025.113464","DOIUrl":"10.1016/j.knosys.2025.113464","url":null,"abstract":"<div><div>Quasi-nonlinear Long-term Cognitive Networks (LTCNs) are an extension of Fuzzy Cognitive Maps (FCMs) for simulation and prediction problems ranging from regression and pattern classification to time series forecasting. In this extension, the quasi-nonlinear reasoning allows the model to escape from unique fixed-point attractors, while the unbounded weights equip the network with improved approximation capabilities. However, training these neural systems continues to be challenging due to their recurrent nature. Existing error-driven learning algorithms (metaheuristic-based, regression-based, and gradient-based) are either computationally demanding, fail to fine-tune the recurrent connections, or suffer from vanishing/exploding gradient issues. To bridge this gap, this paper presents a learning procedure that employs numerical iterative optimizers to solve a regularized least squares problem, aiming to enhance the precision and generalization of LTCN models. These optimizers do not require analytical knowledge about the Jacobian or the Hessian and were carefully chosen to address the inherent challenges of training recurrent neural networks. They are devoted to solving nonlinear optimization problems using trust regions, linear or quadratic approximations, and interpolations between the Gauss–Newton and gradient descent methods. In addition, we explore the model’s performance for several activation functions including piecewise, sigmoid, and hyperbolic variants. 
The empirical studies indicate that the proposed learning procedure outperforms state-of-the-art algorithms to a significant extent.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"317 ","pages":"Article 113464"},"PeriodicalIF":7.2,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143816277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yonggang Li , Zhili Xiao , Ang Gao , Weinong Wu , Errong Pei
{"title":"Hierarchical influential node identification in multi-agent networks based on triangular recursive compression","authors":"Yonggang Li , Zhili Xiao , Ang Gao , Weinong Wu , Errong Pei","doi":"10.1016/j.knosys.2025.113434","DOIUrl":"10.1016/j.knosys.2025.113434","url":null,"abstract":"<div><div>The identification of influential nodes in complex networks is a core research topic in the field of network science. Existing methods often rely on the topological features of static networks, which struggle to fully capture the characteristics of nodes in dynamic networks. To address this issue, this paper proposes a hierarchical control node selection algorithm based on triangular recursive compression and control influence index (HTRCI) to accurately identify influential nodes and enhance network stability. First, the control influence index is introduced, integrating multiple attributes such as node energy level and neighbor variation rate to evaluate node importance comprehensively. Additionally, an efficient triangle detection algorithm based on intersection matrices is designed to improve the extraction efficiency of triangular features. To address the influence overlap caused by shared nodes and edges within triangular structures, this paper proposes a triangle control node selection method incorporating a conflict resolution mechanism. For non-triangular regions, this paper designs a non-triangular structure control node identification algorithm based on coverage maximization, which regulates the distribution of control nodes by introducing a repulsion mechanism. Furthermore, a hierarchical control node selection strategy is proposed to iteratively compress the management regions of control nodes, reducing the number of control nodes and improving the global control efficiency of the network. In NS3 simulations, the proposed algorithm is evaluated on six real-world networks and compared with eight state-of-the-art algorithms. 
The results demonstrate that the HTRCI algorithm exhibits significant advantages in terms of information transmission rate and network stability, validating its superiority and applicability in complex dynamic networks.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"317 ","pages":"Article 113434"},"PeriodicalIF":7.2,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143816780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Manisha Sanjay Sirsat , Diego Isla-Cernadas , Eva Cernadas , Manuel Fernández-Delgado
{"title":"Machine and deep learning for the prediction of nutrient deficiency in wheat leaf images","authors":"Manisha Sanjay Sirsat , Diego Isla-Cernadas , Eva Cernadas , Manuel Fernández-Delgado","doi":"10.1016/j.knosys.2025.113400","DOIUrl":"10.1016/j.knosys.2025.113400","url":null,"abstract":"<div><div>Nutrient deficiency in wheat plants can lead to diseases and important losses in yield. These diseases can be visually detected on wheat leaf images. We classify the images as nutrient controlled or deficient, using a collection of 57 machine learning classifiers programmed in 4 different programming languages, applied on color texture features extracted from the images. We also use 90 other methods under the Caret automated R classification framework on the same features. Furthermore, we use 62 deep learning networks under three frameworks applied on the leaf images in three settings: training from scratch, fine-tuning of pretrained networks, and classification of deep and shallow features extracted by deep networks. The radial basis function (RBF) neural network achieves the best performance, with kappa and accuracy of 57% and 81.2%, and with a low false positive rate (11.1%), while pretrained deep networks and classification of shallow features achieve 40% and 47%, respectively. 
Since nutrient deficiency is a continuous concept, ranging from 0% to 100%, and a sharp categorization into controlled and deficient may always be relative, these results identify the RBF network as an accurate approach for the detection of nutrient deficiency in wheat leaves.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"317 ","pages":"Article 113400"},"PeriodicalIF":7.2,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143816781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lizhe Wang , Qi Liang , Yu Che , Lanmei Wang , Guibao Wang
{"title":"IFDepth: Iterative fusion network for multi-frame self-supervised monocular depth estimation","authors":"Lizhe Wang , Qi Liang , Yu Che , Lanmei Wang , Guibao Wang","doi":"10.1016/j.knosys.2025.113467","DOIUrl":"10.1016/j.knosys.2025.113467","url":null,"abstract":"<div><div>Self-supervised monocular depth estimation has gained prominence due to its training efficiency and applicability in autonomous systems. However, existing methods often exhibit limitations in preserving depth relationships in texture-homogeneous scenes and recovering fine-grained structural details. We present IFDepth, an iterative multi-frame depth prediction framework that refines coarse depth estimates through synergistic integration of optical flow features and multi-scale contextual information. Our architecture introduces three key components: (1) a Motion Feature Encoder (MFE) for spatiotemporal motion pattern extraction, (2) a Feature-Depth Cross Attention Layer (FCAL) enabling cross-modal feature interaction, and (3) a Gated Recurrent Unit (GRU)-based refinement module that progressively enhances predictions without computationally expensive 3D volume operations. Through iterative feature fusion, IFDepth effectively recovers occluded regions and high-frequency details while maintaining geometrically consistent depth ordering. 
Extensive experiments on KITTI, Cityscapes, and Robotcar datasets demonstrate state-of-the-art performance, particularly in preserving scene details and accurate depth ordering, outperforming existing monocular training approaches.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"318 ","pages":"Article 113467"},"PeriodicalIF":7.2,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143835322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yishan Hu , Jun Zhao , Chen Qi , Yan Qiang , Juanjuan Zhao , Bo Pei
{"title":"VC-Mamba: Causal Mamba representation consistency for video implicit understanding","authors":"Yishan Hu , Jun Zhao , Chen Qi , Yan Qiang , Juanjuan Zhao , Bo Pei","doi":"10.1016/j.knosys.2025.113437","DOIUrl":"10.1016/j.knosys.2025.113437","url":null,"abstract":"<div><div>Recently, spatiotemporal representation learning based on deep learning has driven the advancement of video understanding. However, existing methods based on convolutional neural networks (CNNs) and Transformers still face limitations in understanding implicit information in complex scenes, particularly in capturing dynamic changes over long-range spatiotemporal data and inferring hidden contextual information in videos. To address these challenges, we propose VC-Mamba, a video implicit understanding model based on causal Mamba representation consistency. By segmenting explicit texture information into token features and leveraging the linear Mamba framework to capture long-range spatiotemporal interactions, we introduce the spatiotemporal motion Mamba block for motion perception. This block includes a multi-head temporal length Mamba to enhance cross-frame motion consistency and a bidirectional gated space Mamba to capture the inter-frame dependencies of feature tokens. Through the analysis of both explicit and implicit spatiotemporal interactions, VC-Mamba effectively captures long-range spatiotemporal representations. Additionally, we design an attention mask perturbation strategy based on causal invariance constraints to optimize the existing selective spatiotemporal mask mechanism. By progressively enhancing the causal strength of related features, this strategy analyzes implicit causal chains in videos, improving the model’s resistance to interference from weakly causal features and enhancing the robustness and stability of implicit information understanding. 
Finally, we conducted extensive experiments on several datasets, including short-term action recognition and long-term video reasoning tasks. The results demonstrate that VC-Mamba matches or surpasses state-of-the-art models, particularly in capturing long-range spatiotemporal interactions and causal reasoning, proving its effectiveness and generalization in video implicit understanding tasks.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"317 ","pages":"Article 113437"},"PeriodicalIF":7.2,"publicationDate":"2025-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143816776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}