Knowledge-Based Systems: Latest Articles

SCSNNet: Siamese convolutional spiking neural network for childhood medulloblastoma detection using microscopic images
IF 7.6 | Q1 | Computer Science
Knowledge-Based Systems | Pub Date: 2026-03-25 | Epub Date: 2026-01-16 | DOI: 10.1016/j.knosys.2026.115357
Ramesh Kumar Ramaswamy, Aruna Rajendiran, J. Jude Moses Anto Devakanth, Santhosh Kumar Balan
Abstract: Childhood medulloblastoma (CMB) is a highly dangerous brain tumor that predominantly affects children and carries a notable mortality rate. Histopathology has traditionally been the standard diagnostic approach, but it is complex, time-consuming, and demands specialized expertise, which increases the risk of misdiagnosis. To reduce misdiagnosis, a new model named the Siamese Convolutional Spiking Neural Network (SCSNNet) is implemented. Microscopic images sourced from IEEE DataPort serve as the input, and denoising is performed with a Wiener filter. The denoised image is then segmented by the EffiSegNet technique. The system then derives essential characteristics from the input images, employing the Location Directional Number (LDN) combined with Haar wavelet analysis and histogram-based descriptors. These extracted features are forwarded to the classification stage, where the proposed SCSNNet framework operates. SCSNNet integrates the strengths of a Siamese Convolutional Neural Network (SCNN) with those of a Deep Spiking Neural Network (DSNN), enabling robust identification of CMB. The model distinguishes between healthy tissue and CMB cases, achieving strong performance with an accuracy of 92.28%, a True Positive Rate (TPR) of 93.21%, and a True Negative Rate (TNR) of 91.48% when evaluated on k-group 9.
(Knowledge-Based Systems, vol. 337, Article 115357)
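As an illustration of the Wiener-filter denoising step named above, here is a minimal sketch using SciPy's local adaptive Wiener filter. The window size and the synthetic image are placeholders; the paper's exact filter settings are not given in the abstract.

```python
import numpy as np
from scipy.signal import wiener

rng = np.random.default_rng(0)

# Synthetic stand-in for a grayscale microscopy patch: smooth signal plus noise.
clean = np.outer(np.hanning(64), np.hanning(64))
noisy = clean + rng.normal(scale=0.1, size=clean.shape)

# Local adaptive Wiener filter over a 5x5 window (window size is an assumption).
denoised = wiener(noisy, mysize=5)

# The filter should reduce mean squared error against the clean image.
mse_noisy = float(np.mean((noisy - clean) ** 2))
mse_denoised = float(np.mean((denoised - clean) ** 2))
```

The filter estimates a local mean and variance in each window and attenuates pixels where the local variance is close to the noise floor, which is why it suppresses noise while largely preserving smooth structure.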
Citations: 0
Multiscale scattering forests: A domain-generalizing approach for fault diagnosis under data constraints
IF 7.6 | Q1 | Computer Science
Knowledge-Based Systems | Pub Date: 2026-03-25 | Epub Date: 2026-01-21 | DOI: 10.1016/j.knosys.2026.115389
Zhuyun Chen, Hongqi Lin, Youpeng Gao, Jingke He, Zehao Li, Weihua Li, Qiang Liu
Abstract: Deep learning-based intelligent fault diagnosis techniques are now widely used in the manufacturing industry. However, owing to various constraints, fault data for rotating machinery are often limited. Moreover, in real industrial environments the operating conditions of rotating machinery vary with task requirements, causing significant data variability across conditions. This variability poses a major challenge for few-shot fault diagnosis, especially when domain generalization across diverse operating conditions is required. To address this challenge, this paper proposes multiscale scattering forests (MSF), a domain-generalizing approach for fault diagnosis under data constraints. First, a multiscale wavelet scattering predefined layer is designed to extract robust invariant features from input samples; the scattering coefficients are concatenated and used as new samples, serving as a data enhancement of the originals. Then, a deep stacked ensemble of forests with skip connections handles the transformed multiscale samples, allowing earlier information to jump over layers and improving the model's feature representation capability. Finally, a similarity-metric-based weight-learning strategy combines the diagnostic results of each forest, integrating the weighted models into an ensemble framework to enhance domain generalization under varying operating conditions. The MSF model is comprehensively evaluated on a computer numerical control (CNC) machine-tool spindle-bearing dataset collected in an industrial environment. Experimental results demonstrate that the proposed approach not only exhibits strong diagnostic and generalization performance in few-shot scenarios without support from additional source domains but also outperforms other state-of-the-art few-shot fault diagnosis methods.
(Knowledge-Based Systems, vol. 337, Article 115389)
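The pipeline above (scattering-style multiscale features fed to a forest ensemble) can be sketched generically. This is not the paper's MSF architecture: the Haar-based "scattering" features, the toy signals, and all parameters below are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

def multiscale_modulus_features(x, scales=(2, 4, 8)):
    """Crude scattering-style features: take the modulus of Haar-filtered
    signals at dyadic scales, locally average, subsample, and concatenate.
    Illustrative only, not the paper's wavelet scattering layer."""
    feats = []
    for s in scales:
        haar = np.concatenate([np.ones(s), -np.ones(s)]) / (2 * s)
        detail = np.abs(np.convolve(x, haar, mode="same"))         # modulus nonlinearity
        smooth = np.convolve(detail, np.ones(s) / s, mode="same")  # local invariance
        feats.append(smooth[::s])                                  # subsample by scale
    return np.concatenate(feats)

def make_signal(freq, n=256):
    """Toy 'fault' signal: a noisy sinusoid at a class-specific frequency."""
    t = np.linspace(0.0, 1.0, n)
    return np.sin(2 * np.pi * freq * t) + 0.3 * rng.normal(size=n)

X = np.stack([multiscale_modulus_features(make_signal(f))
              for f in [5] * 40 + [20] * 40])
y = np.array([0] * 40 + [1] * 40)

idx = rng.permutation(len(y))                  # shuffled train/test split
train, test = idx[:60], idx[60:]
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X[train], y[train])
acc = forest.score(X[test], y[test])
```

The modulus-then-average step is what makes scattering-type features stable to small shifts and deformations, which is the property the abstract relies on for invariance across operating conditions.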
Citations: 0
Multi-contrast feature cross entanglement network for joint MR image reconstruction and super-resolution
IF 7.6 | Q1 | Computer Science
Knowledge-Based Systems | Pub Date: 2026-03-25 | Epub Date: 2026-01-19 | DOI: 10.1016/j.knosys.2026.115368
Guoqing Ge, Weisheng Li, Yucheng Shu, Xiaoyu Qiao
Abstract: Reconstruction and super-resolution (SR) provide effective solutions for accelerating multi-contrast magnetic resonance (MR) imaging by leveraging auxiliary contrast information to restore the target contrast from an undersampled counterpart. Although recent advances have explored the joint optimization of reconstruction and SR, most existing frameworks still adopt shallow concatenation or independent decoding branches, and thereby fail to fully exploit the inherent complementarity and hierarchical correlations between the two tasks. Additionally, auxiliary contrast information is typically integrated in an isotropic and coarse-grained manner, neglecting directional and structure-specific dependencies across anatomical regions and thus weakening its ability to provide discriminative guidance for target contrast reconstruction. To address these limitations, we propose a multi-contrast feature cross entanglement network (MFCE-Net) that facilitates comprehensive feature interaction across modalities and tasks. In detail, we first introduce a multi-branch feature guidance module to enable multi-scale and direction-aware feature transfer across modalities. Furthermore, within the designed top-down architecture, we incorporate a feature representation enhancement module with an attention mechanism that allows the SR branch to capture global structures while preserving fine textures. Finally, we design a feature entanglement interaction (FEI) module that employs a cross-weighting mechanism across spatial and channel dimensions to facilitate deep feature sharing and mutual reinforcement between the reconstruction and SR tasks. Extensive experiments with various advanced multi-contrast MR imaging methods on fastMRI, BraTS2019, and clinical in-house datasets demonstrate the superiority of our model. The code is released at https://github.com/coolggq/MFCE-Net.
(Knowledge-Based Systems, vol. 337, Article 115368)
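The cross-weighting idea in the FEI module (each branch modulating the other across channel and spatial dimensions) can be illustrated with a bare NumPy sketch. The actual FEI design is not reproduced here; the shapes, the mean-pooled attention, and the branch names are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def cross_weight(feat_a, feat_b):
    """Cross-weighting sketch for two task branches: channel weights derived
    from feat_a modulate feat_b, while spatial weights derived from feat_b
    modulate feat_a. Inputs are (C, H, W) feature maps."""
    chan_w = softmax(feat_a.mean(axis=(1, 2)))                               # (C,) from branch A
    spat_w = softmax(feat_b.mean(axis=0).ravel()).reshape(feat_b.shape[1:])  # (H, W) from branch B
    return feat_a * spat_w[None, :, :], feat_b * chan_w[:, None, None]

recon_feat = rng.normal(size=(8, 4, 4))  # hypothetical reconstruction-branch features
sr_feat = rng.normal(size=(8, 4, 4))     # hypothetical SR-branch features
a_out, b_out = cross_weight(recon_feat, sr_feat)
```

The point of crossing the weights (A's statistics gating B and vice versa) is that each task's representation is reinforced by what the other task found salient, rather than by its own statistics alone.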
Citations: 0
Mutual masked image consistency and feature adversarial training for semi-supervised medical image segmentation
IF 7.6 | Q1 | Computer Science
Knowledge-Based Systems | Pub Date: 2026-03-25 | Epub Date: 2026-01-23 | DOI: 10.1016/j.knosys.2026.115349
Wei Li, Linye Ma, Wenyi Zhao, Huihua Yang
Abstract: Semi-supervised medical image segmentation (SSMIS) aims to alleviate the burden of extensive pixel- or voxel-wise annotation by effectively leveraging unlabeled data. While prevalent approaches relying on pseudo-labeling or consistency regularization have shown promise, they are often prone to confirmation bias due to limited feature diversity. Furthermore, existing mixed-sampling strategies used to expand the training scale frequently generate synthetic data that deviates from real-world distributions, potentially misleading the learning process. To address these challenges, we introduce a novel framework called Mutual Masked Image Consistency and Feature Adversarial Training (MCFAT-Net). Our approach enhances model diversity through a multi-perspective strategy, fostering global-local consistency to improve generalization. Specifically, MCFAT-Net comprises a shared encoder and dual classifiers that leverage Mutual Feature Adversarial Training to inject perturbations, ensuring sub-network divergence and decision-boundary smoothness. Moreover, we integrate a dual-level data augmentation strategy: Cross-Set CutMix operates at the inter-sample level to capture global dataset structure, and Mutual Masked Image Consistency operates at the intra-sample level to refine fine-grained local representations. This combination enables the simultaneous capture of pairwise structure across the entire dataset and individual part-object relationships. Extensive experiments on three public datasets demonstrate that MCFAT-Net achieves superior performance compared to state-of-the-art methods.
(Knowledge-Based Systems, vol. 337, Article 115349)
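The inter-sample augmentation above builds on CutMix. A sketch of the standard CutMix operation (the Cross-Set variant's sampling of pairs across labeled/unlabeled sets is not reproduced; the rectangle placement below follows the original CutMix formulation):

```python
import numpy as np

rng = np.random.default_rng(3)

def cutmix(img_a, img_b, lam):
    """Paste a rectangular patch of img_b into img_a; the patch covers a
    (1 - lam) fraction of the image area, as in the original CutMix.
    Returns the mixed image and the actual area fraction kept from img_a,
    which serves as the label-mixing weight."""
    h, w = img_a.shape[:2]
    cut_h, cut_w = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = rng.integers(h), rng.integers(w)
    y1, y2 = np.clip(cy - cut_h // 2, 0, h), np.clip(cy + cut_h // 2, 0, h)
    x1, x2 = np.clip(cx - cut_w // 2, 0, w), np.clip(cx + cut_w // 2, 0, w)
    mixed = img_a.copy()
    mixed[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]
    lam_adjusted = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)
    return mixed, lam_adjusted

a = np.zeros((32, 32))  # stand-in for one training image
b = np.ones((32, 32))   # stand-in for the paired image
mixed, lam_adj = cutmix(a, b, lam=0.75)
```

Returning the clipped area fraction (rather than the requested `lam`) matters for segmentation-style consistency training: the supervision weight should match the pixels actually mixed.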
Citations: 0
FedCLIP-Distill: Heterogeneous federated cross-modal knowledge distillation for multi-domain visual recognition
IF 7.6 | Q1 | Computer Science
Knowledge-Based Systems | Pub Date: 2026-03-25 | Epub Date: 2026-01-22 | DOI: 10.1016/j.knosys.2026.115383
Yuankun Xia, Hui Wang, Yufeng Zhou
Abstract: Federated learning (FL) for multi-domain visual recognition confronts significant challenges from heterogeneous data distributions and domain shifts, which severely impair the semantic generalization capability of existing methods. To address these challenges, we propose FedCLIP-Distill, a novel framework that employs dual-domain knowledge distillation (KD) and contrastive relational distillation (CRD) to leverage the powerful visual-language alignment of CLIP in heterogeneous FL environments. Our approach employs a centralized CLIP teacher model to distill robust visual-textual semantics into lightweight client-side student models, thereby enabling effective local domain adaptation. We provide a theoretical convergence analysis proving that our distillation mechanism effectively mitigates domain gaps and facilitates robust convergence under non-IID settings. Extensive experiments on the Office-Caltech10 and DomainNet benchmarks show that FedCLIP-Distill outperforms other methods: it achieves an average cross-domain accuracy of 98.5% on Office-Caltech10 and 80.50% on DomainNet, and under different heterogeneity settings (e.g., Dirichlet α = 0.5, where it is 9.52% higher than FedCLIP) it demonstrates significant improvements in accuracy and generalization. The source code is available at https://github.com/Yuankun-Xia/FedCLIP-Distill.
(Knowledge-Based Systems, vol. 337, Article 115383)
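The teacher-to-student distillation step described above is usually implemented as a KL divergence between temperature-softened output distributions. A generic sketch of that loss (Hinton-style distillation with the T² scaling; this is the standard form, not necessarily FedCLIP-Distill's exact objective, and the logits below are made up):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax over a 1-D logit vector."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_kl(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 so gradients keep a consistent magnitude across T."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))

teacher = [3.0, 1.0, 0.2]                       # hypothetical CLIP-teacher logits
loss_far = distillation_kl(teacher, [0.1, 2.0, 1.0])   # student disagrees
loss_near = distillation_kl(teacher, [2.9, 1.1, 0.3])  # student nearly matches
```

Raising T flattens both distributions, so the student is also penalized for mismatching the teacher's ranking of non-top classes, which is where most of the distilled "dark knowledge" lives.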
Citations: 0
Omniscient bottom-up double-stream symmetric network for image captioning
IF 7.6 | Q1 | Computer Science
Knowledge-Based Systems | Pub Date: 2026-03-25 | Epub Date: 2026-01-21 | DOI: 10.1016/j.knosys.2026.115366
Jianchao Li, Wei Zhou, Kai Wang, Haifeng Hu
Abstract: Transformer-based image captioning models have achieved promising performance through various effective learning schemes. We contend that a truly comprehensive learning schema, defined as omniscient learning, encompasses two components: 1) a hierarchical knowledge base with low redundancy as input, and 2) a bottom-up, layer-wise network as architecture. Previous captioning models, however, focus primarily on network design and neglect the construction of the knowledge base. In this paper, our hierarchical knowledge base is constituted by personalized knowledge from real-time features and contextual knowledge of consensus. Simultaneously, we devise a bottom-up double-stream symmetric network (BuNet) to progressively learn layered features. Specifically, the hierarchical knowledge base includes single-image region and grid features from the local domain and contextual knowledge tokens from the broad domain. Correspondingly, BuNet is divided into a local-domain self-learning (LDS) stage and a broad-domain consensus-learning (BDC) stage. Besides, we explore noise-decoupling strategies for the extraction of contextual knowledge tokens. Furthermore, the knowledge disparity between region and grid features reveals that a purely symmetric network cannot effectively capture the additional spatial relationships present in the region stream; consequently, we design relative spatial encoding in the LDS stage of BuNet to learn regional spatial knowledge. In addition, we employ a lightweight backbone to reduce computational complexity while providing a simple paradigm for omniscient learning. Our method is extensively tested on MS-COCO and Flickr30K, where it achieves better performance than some captioning models.
(Knowledge-Based Systems, vol. 337, Article 115366)
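Relative spatial encoding for region features, as used in the LDS stage, is commonly built from pairwise box geometry. A generic sketch of such features (normalized center offsets plus log size ratios; this is a common formulation for region-relation encodings, not necessarily BuNet's exact design, and the boxes are made up):

```python
import numpy as np

def relative_spatial_features(boxes):
    """Pairwise relative geometry between region boxes (x1, y1, x2, y2):
    scale-normalized center offsets and log width/height ratios.
    Returns an (N, N, 4) tensor of relation features."""
    boxes = np.asarray(boxes, dtype=float)
    cx = (boxes[:, 0] + boxes[:, 2]) / 2
    cy = (boxes[:, 1] + boxes[:, 3]) / 2
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    dx = (cx[None, :] - cx[:, None]) / w[:, None]  # center offsets, normalized
    dy = (cy[None, :] - cy[:, None]) / h[:, None]  # by the reference box size
    dw = np.log(w[None, :] / w[:, None])           # log size ratios
    dh = np.log(h[None, :] / h[:, None])
    return np.stack([dx, dy, dw, dh], axis=-1)

boxes = [[0, 0, 10, 10], [10, 0, 30, 10], [0, 10, 10, 30]]
rel = relative_spatial_features(boxes)
```

Normalizing offsets by the reference box and taking log ratios makes the encoding invariant to image translation and (for the ratios) to uniform scaling, which is what lets the model reason about "left of", "above", "larger than" relations independently of absolute position.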
Citations: 0
RAATrack: Reliable appearance aggregation for video-level multimodel tracking
IF 7.6 | Q1 | Computer Science
Knowledge-Based Systems | Pub Date: 2026-03-25 | Epub Date: 2026-01-25 | DOI: 10.1016/j.knosys.2026.115414
Yingran Jin, Yun Gao, Qianyun Feng
Abstract: Multimodal tracking has attracted widespread attention due to its ability to mitigate the inherent limitations of conventional RGB-based tracking. However, most existing multimodal trackers focus primarily on spatial feature fusion and enhancement across modalities, or exploit only sparse temporal dependencies between video frames, making it difficult to systematically capture and use long-range temporal correlations and to effectively model target dynamics and motion information. To address this issue, we propose a novel context-aware video-level multimodal tracking framework based on reliable appearance aggregation, named RAATrack. During tracking, RAATrack continuously aggregates reliable target appearance information and, leveraging the hidden-state mechanism of Mamba, records and propagates rich contextual information across the entire video sequence, thereby enhancing tracking robustness. The core component of RAATrack is the appearance information aggregation (AIA) module, which consists of a cross-attention layer and a Mamba layer. The cross-attention layer periodically calibrates appearance information, while the Mamba layer continuously captures target appearance variations and establishes long-range temporal dependencies across video frames. Experiments on five diverse multimodal datasets (RGBT234, LasHeR, VisEvent, DepthTrack, and VOT-RGBD2022) demonstrate that RAATrack achieves state-of-the-art performance.
(Knowledge-Based Systems, vol. 337, Article 115414)
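The "hidden-state mechanism" referenced above rests on a linear state-space recurrence. The sketch below shows only that bare recurrence, h_t = A·h_{t-1} + B·x_t, y_t = C·h_t; Mamba additionally makes A, B, C input-dependent (selective), which is not reproduced here, and all matrices below are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(7)

def ssm_scan(x, A, B, C):
    """Linear state-space recurrence over a scalar input sequence:
    h_t = A h_{t-1} + B x_t,  y_t = C h_t.
    The hidden state h carries a decaying memory of all past inputs,
    which is how context propagates along the whole sequence."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:
        h = A @ h + B * x_t
        ys.append(C @ h)
    return np.array(ys)

A = 0.9 * np.eye(4)        # stable decay: bounded long-range memory
B = rng.normal(size=4)
C = rng.normal(size=4)

y = ssm_scan(rng.normal(size=20), A, B, C)

# An impulse at t=0 decays geometrically (factor 0.9) in the output,
# showing how far back the state "remembers".
impulse = np.zeros(20)
impulse[0] = 1.0
y_imp = ssm_scan(impulse, A, B, C)
```

The spectral radius of A sets the memory horizon: eigenvalues near 1 retain appearance context over many frames, eigenvalues near 0 forget quickly.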
Citations: 0
TransXV2S-NET: A novel hybrid deep learning architecture with dual-contextual graph attention for multi-class skin lesion classification
IF 7.6 | Q1 | Computer Science
Knowledge-Based Systems | Pub Date: 2026-03-25 | Epub Date: 2026-01-22 | DOI: 10.1016/j.knosys.2026.115407
Adnan Saeed, Khurram Shehzad, Muhammad Ghulam Abbas Malik, Saim Ahmed, Ahmad Taher Azar
Abstract: Accurate early-stage diagnosis of skin lesions remains challenging for dermatologists due to visual complexity and subtle inter-class differences. Traditional computer-assisted diagnostic tools struggle to capture detailed patterns and contextual relationships, especially under varying imaging conditions. In this study, we introduce TransXV2S-Net, a new hybrid, multi-branch deep learning model for automated skin lesion classification. The branches extract features from skin lesions separately at different stages and learn complex combinations between them; they comprise an EfficientNetV2S, a Swin Transformer, and a modified Xception architecture. A novel Dual-Contextual Graph Attention Network (DCGAN) enhances discriminative feature learning through dual-path attention mechanisms and graph-based operations that capture both local textural details and global contextual patterns, focusing the network on the discriminative parts of skin lesions. A Gray World Standard Deviation (GWSD) preprocessing algorithm improves lesion visibility and removes imaging artifacts. Benchmarking on an 8-class skin cancer dataset confirmed the model's efficacy, yielding 95.26% accuracy, 94.30% recall, and an AUC-ROC of 99.62%. Further validation on the HAM10000 dataset demonstrates strong performance with 95% accuracy, confirming the model's robustness and generalization capability.
(Knowledge-Based Systems, vol. 337, Article 115407)
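The GWSD preprocessing builds on the classic gray-world assumption for color normalization. Below is a sketch of plain gray-world channel balancing (scale each channel so its mean matches the global mean); the paper's standard-deviation variant estimates channel gains differently and is not reproduced here.

```python
import numpy as np

def gray_world(img):
    """Classic gray-world white balance: under the assumption that the
    average scene color is gray, scale each RGB channel so its mean equals
    the global mean intensity. Input: (H, W, 3) array in [0, 255]."""
    img = img.astype(float)
    channel_means = img.reshape(-1, 3).mean(axis=0)
    gains = channel_means.mean() / channel_means   # per-channel correction
    return np.clip(img * gains, 0, 255)

rng = np.random.default_rng(8)
img = rng.uniform(0, 200, size=(16, 16, 3))
img[..., 0] *= 1.25                 # simulate a red color cast
balanced = gray_world(img)
means = balanced.reshape(-1, 3).mean(axis=0)
```

After balancing, the three channel means coincide, which removes illumination-dependent color casts so that lesion color features compare consistently across dermoscopy devices.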
Citations: 0
OACI: Object-aware contextual integration for image captioning
IF 7.6 | Q1 | Computer Science
Knowledge-Based Systems | Pub Date: 2026-03-25 | Epub Date: 2026-01-22 | DOI: 10.1016/j.knosys.2026.115374
Shuhan Xu, Mengya Han, Wei Yu, Zheng He, Xin Zhou, Yong Luo
Abstract: Image captioning is a fundamental task in visual understanding, aiming to generate textual descriptions for given images. Current image captioning methods are gradually shifting towards a fully end-to-end paradigm that leverages pre-trained vision models to process images directly and generate captions, eliminating the need for separate object detectors. These methods typically rely on global features and neglect precise perception of local ones. The lack of fine-grained focus on objects can result in suboptimal prototype features contaminated by surrounding noise, which negatively affects the generation of object-related captions. To address this issue, we propose a novel method termed object-aware context integration (OACI), which captures the salient prototypes of individual objects and understands their relationships by leveraging the global context of the entire scene. Specifically, we propose an object-aware prototype learning (OAPL) module that focuses on regions containing objects to enhance object perception and selects the most confident regions for learning object prototypes. Moreover, a class affinity constraint (CAC) is designed to facilitate the learning of these prototypes. To understand the relationships between objects, we further propose an object-context integration (OCI) module that integrates global context with local object prototypes, enhancing the understanding of image content and improving the generated captions. We conduct extensive experiments on the popular MSCOCO, Flickr8k, and Flickr30k datasets; the results demonstrate that integrating global context with local object details significantly improves caption quality, validating the effectiveness of the proposed OACI method.
(Knowledge-Based Systems, vol. 337, Article 115374)
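Learning an object prototype from only the most confident regions, as OAPL does, can be approximated by masked average pooling: average feature vectors over high-confidence spatial positions only. A generic sketch (the module's actual selection rule, the class affinity constraint, and all shapes below are assumptions):

```python
import numpy as np

rng = np.random.default_rng(9)

def masked_prototype(features, confidence, top_frac=0.25):
    """Average feature vectors over the top-confidence spatial positions.
    features: (C, H, W) feature map; confidence: (H, W) objectness scores.
    Keeping only confident positions excludes background noise from the
    prototype."""
    c = features.shape[0]
    flat_conf = confidence.ravel()
    k = max(1, int(top_frac * flat_conf.size))
    keep = np.argsort(flat_conf)[-k:]          # most confident positions
    flat_feat = features.reshape(c, -1)
    return flat_feat[:, keep].mean(axis=1)     # (C,) object prototype

features = rng.normal(size=(16, 8, 8))
confidence = rng.uniform(size=(8, 8))
proto = masked_prototype(features, confidence)
```

With `top_frac=1.0` this degenerates to global average pooling, which is exactly the "prototype contaminated by surrounding noise" failure mode the abstract describes; shrinking the fraction trades coverage for purity.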
Citations: 0
Rethinking heterophilic graph learning via graph curvature
IF 7.6 | Q1 | Computer Science
Knowledge-Based Systems | Pub Date: 2026-03-25 | Epub Date: 2026-01-23 | DOI: 10.1016/j.knosys.2026.115409
Jian Wang, Xingcheng Fu, Qingyun Sun, Li-E Wang, Hao Peng, Jiting Li, Xianxian Li, Minglai Shao
Abstract: The performance of graph neural networks is limited on heterophilic graphs because heterophilic connections hinder the transport of supervision signals related to downstream tasks. In recent years, most existing works based on node-pair heterophily "transform" heterophilic graphs into special homophilic graphs, typically by increasing homophilic connectivity and removing heterophilic edges, thereby converting highly heterophilic graphs into highly homophilic ones. These works consider only the label difference between node pairs while overlooking the change in label distribution between their neighborhoods, and they require heuristic priors or complex designs to compensate for the lack of an underlying understanding of heterophilic information propagation, which leads to the issue of heterophily inconsistency. To address this issue, we build on optimal transport theory to extend the definition of curvature and propose the Heterophily Curvature Graph Representation Learning framework (HetCurv), which optimizes the information transport structure and learns better node representations simultaneously. HetCurv perceives the variation of supervision signals on heterophilic graphs through heterophily curvature and learns the optimal information transport pattern for a specific downstream task. Extensive experiments demonstrate the superiority of the proposed method over state-of-the-art baselines across various node classification benchmarks.
(Knowledge-Based Systems, vol. 337, Article 115409)
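HetCurv's heterophily curvature extends optimal-transport (Ollivier-Ricci) graph curvature, which requires solving a transport problem per edge. The cheaper combinatorial Forman-Ricci curvature conveys the same intuition: edges embedded in dense, triangle-rich neighborhoods are more positively curved than bridge-like edges. A sketch of the unweighted, triangle-augmented Forman curvature (an illustrative proxy, not the paper's definition):

```python
import numpy as np

def forman_curvature(adj):
    """Augmented Forman-Ricci curvature for an unweighted undirected graph:
    F(u, v) = 4 - deg(u) - deg(v) + 3 * #triangles through edge (u, v).
    adj: symmetric 0/1 NumPy adjacency matrix. Returns {(u, v): curvature}."""
    deg = adj.sum(axis=1)
    n = adj.shape[0]
    curv = {}
    for u in range(n):
        for v in range(u + 1, n):
            if adj[u, v]:
                triangles = int((adj[u] * adj[v]).sum())  # common neighbors
                curv[(u, v)] = 4 - int(deg[u]) - int(deg[v]) + 3 * triangles
    return curv

# Triangle 0-1-2 with a pendant bridge edge 2-3.
adj = np.zeros((4, 4), dtype=int)
for u, v in [(0, 1), (1, 2), (0, 2), (2, 3)]:
    adj[u, v] = adj[v, u] = 1
curv = forman_curvature(adj)
```

On this toy graph the triangle edge (0, 1) gets curvature 3 while the bridge edge (2, 3) gets 0, matching the intuition that curvature flags edges where neighborhood distributions diverge, precisely the structural signal curvature-based methods rewire or reweight.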
Citations: 0