Neurocomputing | Pub Date: 2024-10-26 | DOI: 10.1016/j.neucom.2024.128793
Xiong Pan, Xuemei Xie, Jianxiu Yang
{"title":"Mixed-scale cross-modal fusion network for referring image segmentation","authors":"Xiong Pan , Xuemei Xie , Jianxiu Yang","doi":"10.1016/j.neucom.2024.128793","DOIUrl":"10.1016/j.neucom.2024.128793","url":null,"abstract":"<div><div>Referring image segmentation aims to segment the target by a given language expression. Recently, the bottom-up fusion network utilizes language features to highlight the most relevant regions during the visual encoder stage. However, it is not comprehensive that establish only the relationship between pixels and words. To alleviate this problem, we propose a mixed-scale cross-modal fusion method that widens the interaction between vision and language. Specially, at each stage, pyramid pooling is used to augment visual perception and improve the interaction between visual and linguistic features, thereby highlighting relevant regions in the visual data. Additionally, we employ a simple multi-scale feature fusion module to effectively combine multi-scale aligned features. Experiments conducted on Standard RIS benchmarks demonstrate that the proposed method achieves favorable performance against state-of-the- art approaches. Moreover, we conducted experiments on different visual backbones respectively, and the proposed method yielded better and significantly improved performance results.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"614 ","pages":"Article 128793"},"PeriodicalIF":5.5,"publicationDate":"2024-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142573248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing | Pub Date: 2024-10-24 | DOI: 10.1016/j.neucom.2024.128730
Aiman Lameesa, Chaklam Silpasuwanchai, Md. Sakib Bin Alam
{"title":"VG-CALF: A vision-guided cross-attention and late-fusion network for radiology images in Medical Visual Question Answering","authors":"Aiman Lameesa , Chaklam Silpasuwanchai , Md. Sakib Bin Alam","doi":"10.1016/j.neucom.2024.128730","DOIUrl":"10.1016/j.neucom.2024.128730","url":null,"abstract":"<div><div>Image and question matching is essential in Medical Visual Question Answering (MVQA) in order to accurately assess the visual-semantic correspondence between an image and a question. However, the recent state-of-the-art methods focus solely on the contrastive learning between an entire image and a question. Though contrastive learning successfully model the global relationship between an image and a question, it is less effective to capture the fine-grained alignments conveyed between image regions and question words. In contrast, large-scale pre-training poses significant drawbacks, including extended training times, handling substantial data volumes, and necessitating high computational power. To address these challenges, we propose the Vision-Guided Cross-Attention based Late Fusion (VG-CALF) network, which integrates image and question features into a unified deep model without relying on pre-training for MVQA tasks. In our proposed approach, we use self-attention to effectively leverage intra-modal relationships within each modality and implement vision-guided cross-attention to emphasize the inter-modal relationships between image regions and question words. By simultaneously considering intra-modal and inter-modal relationships, our proposed method significantly improves the overall performance of MVQA without the need for pre-training on extensive image-question pairs. Experimental results on benchmark datasets, such as, SLAKE and VQA-RAD demonstrate that our proposed approach performs competitively with existing state-of-the-art methods.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"613 ","pages":"Article 128730"},"PeriodicalIF":5.5,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142537346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing | Pub Date: 2024-10-24 | DOI: 10.1016/j.neucom.2024.128729
Hongguang Fan, Kaibo Shi, Yi Zhao
{"title":"Synchronization of nonlinear neural networks with hybrid couplings and uncertain time-varying perturbations: A novel distributed-delay impulsive comparison principle","authors":"Hongguang Fan , Kaibo Shi , Yi Zhao","doi":"10.1016/j.neucom.2024.128729","DOIUrl":"10.1016/j.neucom.2024.128729","url":null,"abstract":"<div><div>This paper investigates the synchronization of nonlinear drive-response neural networks subject to uncertain time-varying perturbations, non-delayed coupling, and distributed delay coupling. To address the influence of distributed and discrete delays on the system, we establish a novel impulsive comparison principle, extending the Halanay inequality. By leveraging Lyapunov stability theory, we derive sufficient conditions for the exponential synchronization of the neural networks using a delayed impulsive controller with historical status information. This approach relaxes the conventional constraint that impulsive delays must be smaller than impulsive intervals, thereby generalizing existing synchronization results for distributed delay networks. Numerical simulations for chaotic neural networks validate the theoretical results and demonstrate the sensitivity of the control gain matrix.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"613 ","pages":"Article 128729"},"PeriodicalIF":5.5,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142537348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing | Pub Date: 2024-10-22 | DOI: 10.1016/j.neucom.2024.128691
Jian Dai, Hao Wu, Huan Liu, Liheng Yu, Xing Hu, Xiao Liu, Daoying Geng
{"title":"FedATA: Adaptive attention aggregation for federated self-supervised medical image segmentation","authors":"Jian Dai , Hao Wu , Huan Liu , Liheng Yu , Xing Hu , Xiao Liu , Daoying Geng","doi":"10.1016/j.neucom.2024.128691","DOIUrl":"10.1016/j.neucom.2024.128691","url":null,"abstract":"<div><div>Pre-trained on large-scale datasets has profoundly promoted the development of deep learning models in medical image analysis. For medical image segmentation, collecting a large number of labeled volumetric medical images from multiple institutions is an enormous challenge due to privacy concerns. Self-supervised learning with mask image modeling (MIM) can learn general representation without annotations. Integrating MIM into FL enables collaborative learning of an efficient pre-trained model from unlabeled data, followed by fine-tuning with limited annotations. However, setting pixels as reconstruction targets in traditional MIM fails to facilitate robust representation learning due to the medical image's complexity and distinct characteristics. On the other hand, the generalization of the aggregated model in FL is also impaired under the heterogeneous data distributions among institutions. To address these issues, we proposed a novel self-supervised federated learning, which combines masked self-distillation with adaptive attention federated learning. Such incorporation enjoys two vital benefits. First, masked self-distillation sets high-quality latent representations of masked tokens as the target, improving the descriptive capability of the learned presentation rather than reconstructing low-level pixels. Second, adaptive attention aggregation with Personalized federate learning effectively captures specific-related representation from the aggregated model, thus facilitating local fine-tuning performance for target tasks. We conducted comprehensive experiments on two medical segmentation tasks using a large-scale dataset consisting of volumetric medical images from multiple institutions, demonstrating superior performance compared to existing federated self-supervised learning approaches.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"613 ","pages":"Article 128691"},"PeriodicalIF":5.5,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142537352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing | Pub Date: 2024-10-22 | DOI: 10.1016/j.neucom.2024.128724
Sisi Peng, Dan Qu, Wenlin Zhang, Hao Zhang, Shunhang Li, Minchen Xu
{"title":"Easy and effective! Data augmentation for knowledge-aware dialogue generation via multi-perspective sentences interaction","authors":"Sisi Peng , Dan Qu , Wenlin Zhang , Hao Zhang , Shunhang Li , Minchen Xu","doi":"10.1016/j.neucom.2024.128724","DOIUrl":"10.1016/j.neucom.2024.128724","url":null,"abstract":"<div><div>In recent years, knowledge-based dialogue generation has garnered significant attention due to its capacity to produce informative and coherent responses through the integration of external knowledge into models. However, obtaining high-quality knowledge that aligns with the dialogue content poses a considerable challenge, necessitating substantial time and resources. To tackle the issue of limited dialogue data, a majority of research endeavors concentrate on data augmentation to augment the volume of training data. Regrettably, these methods overlook knowledge augmentation, leading to a restricted diversity in input data and yielding enhancements solely in specific metrics. Real-world conversations exhibit a spectrum of characteristics, including repetitions, reversals, and interruptions, demanding a heightened level of data diversity. In this study, we introduce a straightforward yet effective data augmentation technique known as Multi-perspective Sentence Interaction to bolster the connections among sentences from varied viewpoints. Through an examination of target responses from multiple dialogue perspectives, we enhance our comprehension of the relationships between dialogue sentences, thereby facilitating the expansion of knowledge-based dialogue data. Through experiments conducted on various knowledge-based dialogue datasets and utilizing different models, our findings illustrate a notable enhancement in the quality of model generation facilitated by our method. Specifically, we observed a 3.5% enhancement in reply accuracy and a 0.1506 increase in diversity (DIST-2). Moreover, there was a substantial improvement in knowledge selection accuracy by 19.04% and a reduction in model perplexity by 31.48%.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"614 ","pages":"Article 128724"},"PeriodicalIF":5.5,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142573247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing | Pub Date: 2024-10-22 | DOI: 10.1016/j.neucom.2024.128735
Chao Wang, Zihao Wang, Yang Zhou
{"title":"Answering, Fast and Slow: Strategy enhancement of visual understanding guided by causality","authors":"Chao Wang , Zihao Wang , Yang Zhou","doi":"10.1016/j.neucom.2024.128735","DOIUrl":"10.1016/j.neucom.2024.128735","url":null,"abstract":"<div><div>In his classic book <em>Thinking, Fast and Slow</em> (Daniel, 2017), Kahneman points out that human thinking can be categorized into two main modes of thinking: a system that displays intuition and emotion (i.e., System 1), and a system that is more planned and relies more on logic, defined as System 2. This idea explains both rational and irrational motivations. In this paper, we revisit visual comprehension tasks based on this idea. At the theoretical level, we focus on the relationship between intuitive thinking, prior knowledge, and environmental information, and build a causal graph between the three. Further, inspired by the constructed causal graph, an intuitive optimization strategy with clear interpretability is proposed. In the validation session, we provide conclusions consistent with the theoretical analyses through extensive experiments on public datasets based on a visual quizzing task. Excitingly, our scheme demonstrates strong competitiveness in terms of generalizability without adding new technologies.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"613 ","pages":"Article 128735"},"PeriodicalIF":5.5,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142534443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-level discriminator based contrastive learning for multiplex networks","authors":"Hongrun Wu , MingJie Zhang , Zhenglong Xiang , Yingpin Chen , Fei Yu , Xuewen Xia , Yuanxiang Li","doi":"10.1016/j.neucom.2024.128754","DOIUrl":"10.1016/j.neucom.2024.128754","url":null,"abstract":"<div><div>Graph embedding is a technique for obtaining low-dimensional representations of nodes across diverse networks, which may then be used for various downstream tasks and applications. When it applies to heterogeneous networks, it is hard to handle heterogeneous networks because they usually contain different types of nodes and edges with more semantic and structural information. Recently, contrastive learning has developed as the preferred strategy for dealing with unsupervised heterogeneous graph embedding to reduce the cost of human label annotation. However, most multi-view contrastive learning approaches calculate the model’s loss only based on the mutual dependence between the node representation and graph representation. These approaches ignore that both node attributes and node clustering contain discriminative content. To solve this issue, we propose a model called Multi-Level Discriminator-based Contrastive Learning for Multiplex Networks (MLDCL). This model adopts a multi-level multi-discriminator-based approach that can simultaneously learn the global-level structural information, node-level attribute information, and local-level clustering information. Moreover, an augmentation strategy in the contrast learning process from the spectral domain is proposed to improve the representation and discriminative ability of MLDCL. Numerous tests with node clustering and classification tasks on widely used datasets demonstrate the efficacy of the proposed approach.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"613 ","pages":"Article 128754"},"PeriodicalIF":5.5,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142537347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing | Pub Date: 2024-10-22 | DOI: 10.1016/j.neucom.2024.128787
Manman Fei, Xin Zhang, Dongdong Chen, Zhiyun Song, Qian Wang, Lichi Zhang
{"title":"Whole slide cervical cancer classification via graph attention networks and contrastive learning","authors":"Manman Fei , Xin Zhang , Dongdong Chen , Zhiyun Song , Qian Wang , Lichi Zhang","doi":"10.1016/j.neucom.2024.128787","DOIUrl":"10.1016/j.neucom.2024.128787","url":null,"abstract":"<div><div>Cervical cancer is one of the most common cancers among women, which seriously threatens women’s health. Early screening can reduce the incidence rate and mortality. Thinprep cytologic test (TCT) is one of the important means of cytological screening, which has high sensitivity and specificity, and has been widely used in the early screening of cervical cancer. The automatic diagnosis of whole slide images (WSIs) by computers can effectively improve the efficiency and accuracy of doctors’ diagnoses. However, current methods ignore the intrinsic relationships between cervical cells in WSIs and neglect contextual information from the surrounding suspicious areas, and therefore limit their robustness and generalizability. In this paper, we propose a novel two-stage method to implement the automatic diagnosis of WSIs, which constructs Graph Attention Networks (GAT) based on local and global fields respectively to capture their contextual information in a hierarchical manner. In the first stage, we extract representative patches from each WSI through suspicious cell detection, and then employ a Local GAT to classify cervical cells by capturing correlations between suspicious cells in image tiles. This classification process provides the confidence and feature vectors for each suspicious cell. In the second stage, we perform WSI classification using a Global GAT model. We construct graphs corresponding to top-<span><math><msub><mrow><mi>K</mi></mrow><mrow><mi>g</mi></mrow></msub></math></span> and bottom-<span><math><msub><mrow><mi>K</mi></mrow><mrow><mi>g</mi></mrow></msub></math></span> cells for each WSI based on results from Local GAT, and introduce a supervised contrastive learning strategy to enhance the discriminative power of the extracted features. Experimental results demonstrate that our proposed method outperforms conventional approaches and effectively showcases the benefits of supervised contrastive learning. Our source code and example data are available at https://github.com/feimanman/Whole-Slide-Cervical-Cancer-Classification.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"613 ","pages":"Article 128787"},"PeriodicalIF":5.5,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142534380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing | Pub Date: 2024-10-21 | DOI: 10.1016/j.neucom.2024.128723
Wenjie Yang, Pei Xu
{"title":"Learning differentiable categorical regions with Gumbel-Softmax for person re-identification","authors":"Wenjie Yang , Pei Xu","doi":"10.1016/j.neucom.2024.128723","DOIUrl":"10.1016/j.neucom.2024.128723","url":null,"abstract":"<div><div>Locating diverse body parts and perceiving part visibility are essential to person re-identification (re-ID). Most existing methods employ an extra model, <em>e.g.</em>, pose estimation or human parsing, to locate parts, or generate pseudo labels to train the part locator incorporated with the re-ID model. In this paper, we aim at learning diverse horizontal stripes with foreground refinement to pursue pixel-level part alignment via only using person identity labels. Specifically, we proposed a Gumbel-Softmax based Differential Categorical Region (DCR) learning method and make two contributions. (1) A stripe-wise regularization. Given an image, the part locator produce part probability maps. The continuous values in the probability maps are discretized into zero or <span><math><mrow><mi>arg</mi><mspace></mspace><mi>max</mi></mrow></math></span> value in the horizontal stripes by the Gumbel-Softmax. Gumbel-Softmax allows us to use the <span><math><mrow><mi>arg</mi><mspace></mspace><mi>max</mi></mrow></math></span> discrete value for part diversity regularization in the forward pass, but can still estimate gradients in the backward pass. (2) A self-refinement method to suppress the background noise in the stripes. We employ a lightweight foreground perception head to produce foreground probability map with only person identity labels supervision. Benefits from discretization of the categorical stripes, we can conveniently obtain the part pseudo label by element-wise multiplying the categorical stripes with foreground probability map. Finally, DCR can locate the body parts at pixel-level and extract part-aligned representation. Experimental results on both holistic and occluded re-ID datasets confirm that our approach significantly improves the learned representation and the achieved performance is on par with the state-of-the-art methods. The code is available at <span><span>https://github.com/deepalchemist/differentiable-categorical-region</span><svg><path></path></svg></span></div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"613 ","pages":"Article 128723"},"PeriodicalIF":5.5,"publicationDate":"2024-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142534417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing | Pub Date: 2024-10-21 | DOI: 10.1016/j.neucom.2024.128722
Zuohui Chen, Yao Lu, JinXuan Hu, Qi Xuan, Zhen Wang, Xiaoniu Yang
{"title":"Graph-Based Similarity of Deep Neural Networks","authors":"Zuohui Chen , Yao Lu , JinXuan Hu , Qi Xuan , Zhen Wang , Xiaoniu Yang","doi":"10.1016/j.neucom.2024.128722","DOIUrl":"10.1016/j.neucom.2024.128722","url":null,"abstract":"<div><div>Understanding the enigmatic black-box representations within Deep Neural Networks (DNNs) is an essential problem in the community of deep learning. An initial step towards tackling this conundrum lies in quantifying the degree of similarity between these representations. Various approaches have been proposed in prior research, however, as the field of representation similarity continues to develop, existing metrics are not compatible with each other and struggling to meet the evolving demands. To address this, we propose a comprehensive similarity measurement framework inspired by the natural graph structure formed by samples and their corresponding features within the neural network. Our novel Graph-Based Similarity (GBS) framework gauges the similarity of DNN representations by constructing a weighted, undirected graph based on the output of hidden layers. In this graph, each node represents an input sample, and the edges are weighted in accordance with the similarity between pairs of nodes. Consequently, the measure of representational similarity can be derived through graph similarity metrics, such as layer similarity. We observe that input samples belonging to the same category exhibit dense interconnections within the deep layers of the DNN. To quantify this phenomenon, we employ a motif-based approach to gauge the extent of these interconnections. This serves as a metric to evaluate whether the representation derived from one model can be accurately classified by another. Experimental results show that GBS gets state-of-the-art performance in the sanity check. We also extensively evaluate GBS on downstream tasks to demonstrate its effectiveness, including measuring the transferability of pretrained models and model pruning.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"614 ","pages":"Article 128722"},"PeriodicalIF":5.5,"publicationDate":"2024-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142573221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}