RSDC-Net: Robust self-supervised dynamic collaboration network for infrared and visible image fusion
Authors: Yun Li, Ningyuan Zhao, Xue Yang, Liping Luo, Peiguang Jing
Knowledge-Based Systems, Vol. 330, Article 114541 (published 2025-09-28). DOI: 10.1016/j.knosys.2025.114541
Abstract: Infrared and visible image fusion (IVIF) aims to integrate complementary information from distinct sensors, yielding fused results that outperform the capabilities of either individual modality alone. Due to inherent modality bias, conventional fusion-reconstruction frameworks often struggle to effectively prioritize the representation of critical shared regions and diverse heterogeneous areas, while also showing deficiencies in shallow feature interactions. To address these challenges, we propose a robust self-supervised dynamic collaboration network (RSDC-Net), which adaptively and comprehensively selects complementary cues from both infrared and visible modalities. Specifically, we introduce a steady-state contrastive autoencoder that leverages a multi-task self-supervised strategy to enhance the robust representation of key shared cues in the mixed information flow. This strategy promotes deep cross-modal modeling of global dependencies across sources, thereby achieving semantic consistency. Furthermore, we design a latent inter-modal focus-guided module that integrates a bilateral transposed attention mechanism with a dynamic selection component to refine local-level heterogeneous cue allocation under the guidance of mutual global dependencies. Notably, a gated feed-forward unit is incorporated to filter outlier information flows across modalities. Quantitative results on the MSRS, TNO, and M3FD datasets demonstrate that RSDC-Net achieves the best performance on most of the eight evaluation metrics. Meanwhile, it also exhibits superior performance in qualitative visual assessments on these datasets as well as under challenging scenarios.
{"title":"MgHiSal: MLLM-guided hierarchical semantic alignment for multimodal knowledge graph completion","authors":"Jie Chen , Wuyang Zhang , Shu Zhao , Yunxia Yin","doi":"10.1016/j.knosys.2025.114552","DOIUrl":"10.1016/j.knosys.2025.114552","url":null,"abstract":"<div><div>Multimodal Knowledge Graph Completion (MMKGC) aims to improve link prediction and empower downstream applications like intelligent question answering, reasoning, and recommendation by integrating multimodal information. However, existing methods commonly face semantic fragmentation between visual and textual modalities. This issue largely stems from the reliance on pre-trained vision models, which often struggle to model deep entity semantics, resulting in noisy visual features and the neglect of key semantic information. To address this, we propose MgHiSal, an MLLM-guided Hierarchical Semantic Alignment framework for MMKGC. MgHiSal first generates context-aware visual descriptions by conditioning MLLM generation on existing entity text, ensuring low-noise, relevant representations for initial semantic alignment. The framework then utilizes a hierarchical gated attention mechanism that progressively unifies multimodal representations by dynamically selecting and optimizing key cross-modal features via regularization. Finally, a neighbor-aware module enhances entity representations by aggregating multimodal neighbor information. Experiments on DB15K and MKG-W show MgHiSal significantly improves MRR by approximately 13.1 % and 12.5 % over respective runner-ups. The source code is publicly available at <span><span>https://github.com/wyZhang016/MgHiSal</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"330 ","pages":"Article 114552"},"PeriodicalIF":7.6,"publicationDate":"2025-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145269356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A vision transformer-based hybrid neural architecture for automated handwritten Bangla character recognition and braille conversion
Authors: Touseef Saleh Bin Ahmed, Tawhidur Rahman, Shammo Biswas, Saifur Rahman Sabuj, Mohammed Belal Bhuian, Mohammad Ali Moni, Md Ashraful Alam
Knowledge-Based Systems, Vol. 330, Article 114546 (published 2025-09-27). DOI: 10.1016/j.knosys.2025.114546
Abstract: The rapid advancement of technology has led to notable changes in the current educational system. Nevertheless, there are still relatively few assistive aids for teaching individuals with disabilities, such as those who are blind or visually impaired. Braille is an effective teaching medium for blind and visually impaired learners. Although braille has been digitized into electronic versions, those versions do not account for handwritten characters. Studies on English character recognition report high accuracy, which is not yet the case for Bangla character recognition. We present an automated system that converts handwritten Bangla characters to braille using novel hybrid deep neural network architectures. Our approach begins with a Character Quality Assessment Framework (CQAF), which employs adaptive thresholds and comprehensive quality metrics designed explicitly for Bangla script characteristics. Building upon this foundation, we present two architectures: HybridNet-L is our initial multi-stream design, while HybridNet-S is a redesigned lightweight variant that reduces parameters and achieves superior accuracy, making it the primary contribution of this work. To complete the system, we implement a comprehensive accessibility solution featuring a real-time braille hardware interface and text-to-speech capabilities. The model processes all 84 Bangla character classes, including vowels, consonants, numerals, and compound characters. Extensive evaluation against seven baseline models demonstrates that HybridNet-S achieves superior performance with 95.80% validation accuracy while maintaining computational efficiency suitable for embedded deployment. Statistical validation and ablation studies confirm the robustness and effectiveness of our multi-stream architecture for practical assistive technology applications.
Dual attention focus network for few-shot skeleton-based action recognition
Authors: Jie Liu, Chongben Tao, Zhongwei Shen, Cong Wu, Tianyang Xu, Xizhao Luo, Feng Cao, Zhen Gao, Zufeng Zhang, Sai Xu
Knowledge-Based Systems, Vol. 330, Article 114549 (published 2025-09-27). DOI: 10.1016/j.knosys.2025.114549
Abstract: Few-shot action recognition is a challenging yet practically significant problem that involves developing a model capable of learning discriminative features from a small number of labeled samples to recognize new action categories. Current methods typically infer spatial relationships either within or across skeletons to learn action representations, but this often results in features with insufficient discriminability and ineffective attention to critical body parts. To address these limitations, we propose DAF-Net, a novel framework that employs focal attention to jointly model intra-skeleton and inter-skeleton relationships, enhancing discriminative feature learning in few-shot skeleton-based action recognition. Unlike traditional methods that focus solely on intra-skeleton dependencies or inter-skeleton structures, DAF-Net dynamically integrates both components via focal attention, enhancing key body part representation and refining features, particularly in data-scarce conditions. Furthermore, DAF-Net incorporates an enhanced prototype generation strategy, optimizing class prototype formation via cosine similarity weighting to further improve feature discriminability in multi-shot scenarios. In temporal matching, cosine similarity evaluates local feature similarity within skeleton sequences, capturing directional variations of specific joints over time. Extensive experiments on three benchmark datasets (NTU-T, NTU-S, and Kinetics-skeleton) confirm significant performance gains, validating the effectiveness of DAF-Net.
Toward more effective bag-of-functions architectures: Exploring initialization and sparse parameter representation
Authors: David Orlando Salazar Torres, Diyar Altinses, Andreas Schwung
Knowledge-Based Systems, Vol. 330, Article 114536 (published 2025-09-27). DOI: 10.1016/j.knosys.2025.114536
Abstract: Time series datasets often present complex temporal patterns that challenge both feature extraction and interpretability. The Bag-of-Functions (BoF) architecture has emerged as a promising approach to model such data by capturing diverse dynamics through functional components. However, its effectiveness is constrained by limitations in both interpretability and training stability. In this work, we address these challenges by introducing two complementary contributions: a regularization strategy that promotes sparse and interpretable parameter representations, and a tailored initialization scheme based on the Kaiming method adapted to the properties of BoF models. Our proposed initialization ensures improved convergence behavior and training stability, while the regularization enhances the clarity and semantic interpretability of the learned components. Evaluations on synthetic and real-world time series datasets demonstrate that these improvements preserve model performance and generalize well across varying signal complexities. Together, these strategies provide a more robust and interpretable foundation for Bag-of-Functions architectures in time series decomposition tasks.
TreeQA: Enhanced LLM-RAG with logic tree reasoning for reliable and interpretable multi-hop question answering
Authors: Xiangrui Zhang, Fuyong Zhao, Yutian Liu, Panfeng Chen, Yanhao Wang, Xiaohua Wang, Dan Ma, Huarong Xu, Mei Chen, Hui Li
Knowledge-Based Systems, Vol. 330, Article 114526 (published 2025-09-27). DOI: 10.1016/j.knosys.2025.114526
Abstract: Multi-Hop Question Answering (MHQA), crucial for complex information retrieval, remains challenging for current Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems, which often suffer from hallucination, reliance on incomplete knowledge, and opaque reasoning processes. Existing RAG methods, while beneficial, still struggle with the intricacies of multi-step inference and ensuring verifiable accuracy. This research introduces TreeQA, a novel framework designed to significantly enhance the reliability and interpretability of LLM-RAG systems in MHQA tasks. TreeQA addresses these limitations by decomposing complex multi-hop questions into a hierarchical logic tree of simpler, verifiable sub-questions, integrating evidence from both structured knowledge bases (e.g., Wikidata) and unstructured text (e.g., Wikipedia), and employing an iterative, evidence-based validation and self-correction mechanism at each reasoning step to dynamically rectify errors and prevent their accumulation. Extensive experiments on four benchmark datasets (WebQSP, QALD-en, AdvHotpotQA, and 2WikiMultiHopQA) demonstrate TreeQA's superior performance, achieving Hit@1 scores of 87%, 57%, 53%, and 59%, respectively, representing improvements of 4%-12% over state-of-the-art LLM-RAG methods. These findings highlight the significant impact of structured, verifiable reasoning pathways in developing more robust, accurate, and interpretable knowledge-intensive AI systems, thereby enhancing the practical utility of LLMs in complex reasoning scenarios. Our code is publicly available at https://github.com/ACMISLab/TreeQA.
{"title":"A topology-aware multiscale feature fusion network for EEG-based motor imagery decoding","authors":"Chaowen Shen, Akio Namiki","doi":"10.1016/j.knosys.2025.114540","DOIUrl":"10.1016/j.knosys.2025.114540","url":null,"abstract":"<div><div>Motor imagery electroencephalography (MI-EEG) decoding is a crucial component of brain-computer interface (BCI) systems, serving as a valuable tool for motor function rehabilitation and fundamental neuroscience research. However, the strong nonlinearity and non-stationarity of MI-EEG signals make achieving high-precision decoding a challenging task. Current deep learning methods primarily extract the spatiotemporal features of MI-EEG signals while neglecting their potential association with spectral-topological features, thereby limiting the ability to integrate multidimensional information. To address these limitations, this paper proposes a Topology-Aware Multiscale Feature Fusion network (TA-MFF network) for MI-EEG signal decoding. Specifically, we designed a Spectral-Topological Data Analysis-Processing (S-TDA-P) module that leverages persistent homology features to analyze the spatial topological relationships between EEG electrodes and the persistent patterns of neural activity. Then, the Inter Spectral Recursive Attention (ISRA) mechanism is employed to model the correlations between different frequency bands, enhancing critical spectral features while suppressing irrelevant noise. Finally, the Spectral-Topological and Spatio-Temporal Feature Fusion (SS-FF) Unit is employed to progressively integrate topological, spectral, and spatiotemporal features, capturing dependencies across different domains. The experimental results show that the classification accuracy of the proposed model in BCIC-IV-2a, BCIC-IV-2b, and BCIC-III-Iva is 85.87 %, 90.2 %, and 80.5 %, respectively, outperforming the most advanced methods.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"330 ","pages":"Article 114540"},"PeriodicalIF":7.6,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145222173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical reinforcement learning for dynamic collision avoidance of autonomous ships under uncertain scenarios","authors":"Sijin Yu, Yunbo Li, Jiaye Gong","doi":"10.1016/j.knosys.2025.114528","DOIUrl":"10.1016/j.knosys.2025.114528","url":null,"abstract":"<div><div>Autonomous ships hold substantial potential for enhancing navigational safety, improving collision avoidance efficiency, and increasing adaptability in complex maritime environments, thereby presenting broad prospects for intelligent shipping. This paper introduces a dynamic collision avoidance control method based on a hierarchical reinforcement learning framework for autonomous ships. By integrating high-level global intent planning with low-level fine-grained rudder control, the proposed approach markedly enhances the interpretability, stability, and behavioral consistency of the learned policy. Furthermore, a multidimensional uncertainty modeling mechanism is incorporated during training, systematically accounting for variations in initial states and obstacle behavior patterns, which effectively strengthens policy adaptability and generalization under uncertain conditions. To validate the method, simulations are conducted in representative encounter scenarios as well as in omnidirectional dynamic obstacle tests. A comprehensive evaluation is carried out using multiple control performance metrics, environmental adaptability analysis, policy consistency assessment, and equivalent energy consumption comparisons. The results confirm that the proposed approach achieves stable and reliable intelligent collision avoidance control in highly dynamic environments, offering a feasible and scalable solution for high-performance collision avoidance in intelligent maritime navigation.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"330 ","pages":"Article 114528"},"PeriodicalIF":7.6,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145269714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Few-shot hyperspectral image classification with mamba and manifold convolution fusion network
Authors: Heling Cao, Yanlong Guo, Yonghe Chu, Yun Wang, Junyi Duan, Peng Li
Knowledge-Based Systems, Vol. 330, Article 114531 (published 2025-09-26). DOI: 10.1016/j.knosys.2025.114531
Abstract: Efficient modeling of global-local features is crucial for hyperspectral image (HSI) classification. The mamba network demonstrates strong capability in capturing global dependencies in HSI classification tasks, primarily utilizing a state-space model to extract first-order statistical features of spectral-spatial information in Euclidean space, providing an initial representation of data characteristics. However, under few-shot conditions, fully exploiting effective features from limited samples and overcoming challenges such as class overlap and feature space sparsity, caused by the insufficient extraction of second-order statistical features in Riemannian space, remain major research challenges. Therefore, we propose a dual-branch manifold convolution-mamba network (DBMCMamba) for HSI classification. Specifically, it adaptively fuses forward and backward information through the vision mamba (Vim) block and utilizes the S6 module to extract global information, thereby enhancing global feature extraction capability. Meanwhile, the manifold convolution module extracts first-order statistical features of spectral-spatial information through convolutional layers and learns second-order statistics via the SPD manifold to strengthen DBMCMamba's local feature representation under few-shot conditions. Finally, global and local features are fused for classification, effectively improving the accuracy and performance of HSI classification. On the Indian Pines, Pavia University, HongHu, and HanChuan datasets, DBMCMamba achieved classification accuracies of 95.23%, 95.80%, 95.58%, and 94.93%, respectively. Experimental results show that DBMCMamba delivers significant performance improvements over state-of-the-art classification models. The code will be available online at https://github.com/ASDFFGG121EAA/DBMCMamba.
Heterogeneous graph collaborative representation learning for drug-related microbe prediction with attentive fusion and reciprocal distillation
Authors: Yanbu Guo, Quanming Guo, Shengli Song, Yihan Wang, Jinde Cao
Knowledge-Based Systems, Vol. 330, Article 114548 (published 2025-09-25). DOI: 10.1016/j.knosys.2025.114548
Abstract: Microbes are microorganisms built from biological molecules and have significant therapeutic potential for treating diseases, underscoring the need for computational methods to screen microbes targeting disease-associated drugs. However, existing computational methods often consider only node-embedding or structural features between microbes and drugs, and they suffer from the severe class imbalance inherent in sparse association data. In this work, we propose a heterogeneous graph collaborative representation learning model that combines the merits of attentive fusion and reciprocal distillation for drug-related microbe prediction. First, we construct heterogeneous biological information graphs and meta-path-induced graphs of microbes and drugs. Then, a topological structure feature encoder extracts complex topological and semantic interaction patterns from the heterogeneous biological graphs, while an efficient transformer concurrently extracts discriminative semantic and structural information based on the graph position information of nodes. Next, a reciprocal distillation scheme is developed to mitigate the adverse effects of the data imbalance problem and to enforce distribution consistency between the model's topological and semantic information extraction. Moreover, we devise a dual collaborative feature fusion scheme that combines graph topological features and dual meta-path-based semantic features to obtain discriminative representations of microbes and drugs. Through reciprocal distillation, an efficient optimization function focuses on hard-to-classify samples of drug-related microbes via these discriminative features. Extensive experiments demonstrate that our model can handle the association sparsity problem and extract richer semantic and structural information. Case studies further indicate that our model can discover reliable candidate microbes associated with a specific drug.