Neurocomputing, Pub Date: 2024-10-19, DOI: 10.1016/j.neucom.2024.128733
Title: Neighbor patches merging reduces spatial redundancy to accelerate vision transformer
Authors: Kai Jiang, Peng Peng, Youzao Lian, Weihui Shao, Weisheng Xu
Abstract: Vision Transformers (ViTs) deliver outstanding performance but often require substantial computational resources. Various token pruning methods have been developed to enhance throughput by removing redundant tokens; however, these methods do not reduce peak memory consumption, which remains equivalent to that of the unpruned networks. In this study, we introduce Neighbor Patches Merging (NEPAM), a method that significantly reduces the maximum memory footprint of ViTs while pruning tokens. NEPAM targets spatial redundancy within images and prunes redundant patches at the very start of the model, thereby achieving an optimal throughput-accuracy trade-off without fine-tuning. Experimental results demonstrate that NEPAM accelerates inference of the ViT-Base-Patch16-384 model by 25% with a negligible accuracy loss of 0.07% and a notable 18% reduction in memory usage. When applied to VideoMAE, NEPAM doubles the throughput with a 0.29% accuracy loss and a 48% reduction in memory usage. These findings underscore the efficacy of NEPAM in mitigating computational requirements while maintaining model performance.
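The patch-merging idea the NEPAM abstract describes can be illustrated with a minimal sketch. This is not the paper's algorithm: the cosine-similarity measure, the horizontal-neighbour rule, and the `threshold` value are all assumptions made here purely to show the general mechanism of merging spatially redundant patch tokens before the first attention block.

```python
import numpy as np

def cosine(a, b):
    # cosine similarity between two embedding vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def merge_neighbor_patches(tokens, grid_w, threshold=0.9):
    """Merge horizontally adjacent patch tokens whose cosine similarity
    exceeds `threshold`; merged pairs are averaged into one token.
    tokens: (N, D) patch embeddings, row-major on a grid with `grid_w`
    patches per row. Illustrative sketch, not NEPAM's exact rule."""
    out, i, n = [], 0, len(tokens)
    while i < n:
        has_right_neighbor = (i % grid_w) != grid_w - 1  # stay within the row
        if has_right_neighbor and i + 1 < n and cosine(tokens[i], tokens[i + 1]) > threshold:
            out.append((tokens[i] + tokens[i + 1]) / 2.0)  # merge the redundant pair
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return np.stack(out)
```

In a ViT this would run on the patch embeddings at the model input, so every subsequent layer sees fewer tokens and peak activation memory shrinks accordingly.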
Neurocomputing, Pub Date: 2024-10-18, DOI: 10.1016/j.neucom.2024.128721
Title: Research on application of knowledge graph in industrial control system security situation awareness and decision-making: A survey
Authors: Lixin Liu, Peihang Xu, Kefeng Fan, Mingyan Wang
Abstract: As a powerful tool for knowledge organization and representation, a knowledge graph can integrate scattered data into a unified knowledge network, enabling knowledge association and knowledge reasoning and thus improving the accuracy of security situation awareness. In the context of industrial control system security decision-making, knowledge graphs can provide comprehensive knowledge support, assisting decision-makers in making intelligent decisions. This paper reviews the current research status of knowledge graphs in the field of industrial control system security situation awareness and decision-making, covering data-driven techniques, rule-based methods, and knowledge graph-based approaches. Existing knowledge graph technologies and their practical applications in this field are discussed. Finally, a series of challenges in applying knowledge graphs to industrial control system security situation awareness and decision-making are summarized, such as data dispersion, correlation, and information security, and future prospects are outlined.
Neurocomputing, Pub Date: 2024-10-18, DOI: 10.1016/j.neucom.2024.128736
Title: Causal-relationship representation enhanced joint extraction model for elements and relationships
Authors: Hang Wang, Xiaoming Liu, Guan Yang, Sensen Dong, Xingang Hu, Jie Liu
Abstract: The joint extraction of elements such as entities, relations, events, and their arguments, along with their specific interrelationships, is a critical task in natural language processing. Existing research often handles interactions between tasks implicitly, through techniques like shared encodings or parameter sharing, and lacks explicit modeling of the specific relationships between tasks. This limitation hinders the full utilization of inter-task correlation information and impairs effective collaboration between tasks. To address this, we propose a model for the joint extraction of elements and relations based on causal-relationship representation enhancement (CRE). The model captures specific relationships between tasks in multiple stages, facilitating finer adjustment and optimization of subtasks and thereby enhancing overall model performance. Specifically, CRE comprises three key modules: feature adaptation, feature interaction, and feature fusion. The feature adaptation module selects and adjusts features from shared encodings based on the requirements of specific tasks to better accommodate semantic differences between tasks. The feature interaction module employs causal reasoning to comprehensively capture the causal relationships between tasks while mitigating the negative transfer caused by interfering semantic information. The feature fusion module further integrates features to obtain optimized task-specific representations. Ultimately, CRE exhibits a significant improvement in average performance across multiple information extraction tasks.
Neurocomputing, Pub Date: 2024-10-18, DOI: 10.1016/j.neucom.2024.128741
Title: A dual encoder LDCT image denoising model based on cross-scale skip connections
Authors: Lifang Wang, Yali Wang, Wenjing Ren, Jing Yu, Xiaoyan Chang, Xiaodong Guo, Lihua Hu
Abstract: LDCT image denoising is crucial in medical imaging, as it aims to minimize patient radiation exposure while maintaining diagnostic image quality. However, current convolutional neural network-based denoising methods struggle to incorporate global context, often focusing solely on local features, which poses a significant challenge. To address this, a dual encoder denoising model is introduced that exploits the Transformer's proficiency in capturing long-range dependencies and global context. The model integrates a Transformer branch and a convolutional branch in the encoder; by concatenating the features of these two branches, it captures both global and local image features, substantially enhancing denoising efficacy. A cross-scale skip connection mechanism integrates the encoder's low-level features with the decoder's high-level features, enriching contextual information and preserving image details. In addition, to meet the requirements of multi-scale feature fusion, the decoder is equipped with different multi-scale convolution modules to optimize feature processing; the number of layers in these modules gradually decreases as the depth of the decoder increases. To enhance the discriminative ability of the model, a multi-scale discriminator is also introduced, which improves image recognition by extracting features at four different scales. Consequently, our approach demonstrates remarkable performance in reducing noise and improving LDCT image quality, as evidenced by substantial improvements in PSNR (17.75%) and SSIM (7.31%).
Neurocomputing, Pub Date: 2024-10-18, DOI: 10.1016/j.neucom.2024.128727
Title: Audio feature enhancement based on quaternion filtering and deep hashing
Authors: Xun Jin, Bingkui Sun, De Li
Abstract: This paper aims to solve three problems: the difficulty of converging audio model training, the large amount of data required, and the high storage dimensionality of feature vectors generated from audio. To this end, we propose quaternion Gabor filtering to suppress background information in the spectrogram and reduce interference in the data, compensating for the imperfect alignment between audio data and image data when models are transferred across domains. In addition, window lengths and frame shifts at different scales are used to capture the connections between different vocal objects. To address the high dimensionality of the generated feature vectors, we use a deep hashing module to map high-dimensional features to low-dimensional ones, and a probability function to make the learned samples more consistent with the overall distribution. The proposed method was evaluated on an environmental sound classification dataset and a music genre classification dataset; using only a common backbone network, it improves the accuracy of audio recognition.
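The deep-hashing step described above, mapping high-dimensional features to compact low-dimensional binary codes, can be sketched as follows. A learned hash layer is replaced here by a random projection purely for illustration; `n_bits`, the sign thresholding, and the Hamming-distance retrieval are assumptions, not the paper's design.

```python
import numpy as np

def hash_codes(features, n_bits=16, seed=0):
    """Map (N, D) features to (N, n_bits) binary codes via a random
    projection followed by sign thresholding. In a trained deep-hashing
    module the projection W would be learned end to end."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((features.shape[1], n_bits))
    return (features @ W > 0).astype(np.uint8)

def hamming(a, b):
    # number of differing bits between two binary codes
    return int(np.count_nonzero(a != b))
```

At retrieval time, comparing 16-bit codes with `hamming` is far cheaper than comparing the original high-dimensional float vectors, which is the storage/speed benefit the abstract refers to.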
Neurocomputing, Pub Date: 2024-10-18, DOI: 10.1016/j.neucom.2024.128749
Title: ThyFusion: A lightweight attribute enhancement module for thyroid nodule diagnosis using gradient and frequency-domain awareness
Authors: Guanyuan Chen, Ningbo Zhu, Jianxin Lin, Bin Pu, Hongxia Luo, Kenli Li
Abstract: Accurate classification of thyroid nodules is a critical step in clinical diagnosis and plays a crucial role in guiding subsequent treatment planning. However, current deep learning methods lack the ability to effectively perceive the attributes of thyroid nodules. Specifically, convolutional neural network-based methods struggle to capture fine-grained attributes, such as echogenic foci, due to repeated downsampling operations, while Transformer-based methods partition the image into blocks, which hinders the capture of continuous attributes such as margins, and require a large number of parameters. To overcome these limitations, this paper proposes ThyFusion, a novel lightweight attribute enhancement module. Firstly, a multi-scale Gaussian filtering attention module is developed to accurately capture fine-grained information. Secondly, a frequency-domain global regulation module is proposed to regulate global features and capture coarse-grained attributes. Finally, a cross-domain mutual learning module is presented to fuse fine-grained and coarse-grained attributes. Extensive experiments comparing ThyFusion against 15 benchmark methods demonstrate its effectiveness: it achieves higher accuracy, F1 score, recall, and precision (87.18%, 87.12%, 87.36%, and 87.10%, respectively). ThyFusion is a plug-and-play lightweight module that can be embedded after any layer of a backbone network to enhance the accuracy of thyroid nodule classification, which can greatly streamline automated diagnosis and improve the accuracy of thyroid diagnosis.
Neurocomputing, Pub Date: 2024-10-18, DOI: 10.1016/j.neucom.2024.128747
Title: UCC: A unified cascade compression framework for vision transformer models
Authors: Dingfu Chen, Kangwei Lin, Qingxu Deng
Abstract: In recent years, Vision Transformer (ViT) and its variants have dominated many computer vision tasks. However, the high computational cost and training data requirements of ViT make it challenging to deploy directly on resource-constrained devices. Model compression is an effective approach to accelerating deep learning networks, but existing methods for compressing ViT models are limited in scope and struggle to strike a balance between performance and computational cost. In this paper, we propose a novel Unified Cascaded Compression Framework (UCC) to compress ViT in a more precise and efficient manner. Specifically, we first analyze the frequency information within tokens and prune them based on a joint score of their spatial and spectral characteristics. Subsequently, we propose a similarity-based token aggregation scheme that combines the abundant contextual information contained in all pruned tokens with the host tokens according to their weights. Additionally, we introduce a novel cumulative cascaded pruning strategy that performs bottom-up cascaded pruning of tokens based on cumulative scores, avoiding the information loss caused by the individual idiosyncrasies of blocks. Finally, we design a novel two-level distillation strategy, incorporating imitation and exploration, to ensure the diversity of knowledge and better performance recovery. Extensive experiments demonstrate that UCC outperforms most existing state-of-the-art approaches: it compresses the floating-point operations of the ViT-Base and DeiT-Base models by 22% and 54.1% while improving their recognition accuracy by 3.74% and 1.89%, respectively. UCC thus significantly reduces model computational consumption while enhancing performance, enabling efficient end-to-end training of compact ViT models.
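The abstract does not specify UCC's joint spatial-spectral score, so the sketch below is only one hedged reading of the idea: attention received by each token stands in for the spatial term, the high-frequency energy of each token embedding (via an FFT) stands in for the spectral term, and `alpha` weights the two. All three choices are assumptions for illustration.

```python
import numpy as np

def joint_score_prune(tokens, attn_scores, keep_ratio=0.5, alpha=0.5):
    """Keep the top `keep_ratio` fraction of tokens ranked by a joint
    score: normalised attention received (spatial term) plus normalised
    high-frequency energy of each embedding (spectral term). Both the
    weighting and the spectral proxy are illustrative, not UCC's
    published definition."""
    # mean magnitude of non-DC frequency bins as a crude spectral score
    spec = np.abs(np.fft.rfft(tokens, axis=1))[:, 1:].mean(axis=1)
    norm = lambda x: (x - x.min()) / (np.ptp(x) + 1e-8)  # scale to [0, 1]
    score = alpha * norm(np.asarray(attn_scores, float)) + (1 - alpha) * norm(spec)
    k = max(1, int(len(tokens) * keep_ratio))
    keep = np.sort(np.argsort(score)[-k:])  # top-k indices in original order
    return tokens[keep], keep
```

The pruned tokens (those not in `keep`) would then feed the similarity-based aggregation step rather than being discarded outright, as the abstract describes.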
Neurocomputing, Pub Date: 2024-10-18, DOI: 10.1016/j.neucom.2024.128744
Title: Domain adaptation for semantic segmentation of road scenes via two-stage alignment of traffic elements
Authors: Yuan Gao, Yaochen Li, Hao Liao, Tenweng Zhang, Chao Qiu
Abstract: Unsupervised domain adaptation has been used to reduce domain shift and thereby improve the performance of semantic segmentation on unlabeled real-world data. However, existing methodologies fall short in addressing the domain shift prevalent in traffic scenarios, leading to unsatisfactory segmentation results. In this paper, we propose a novel domain adaptation method for semantic segmentation via unsupervised alignment of traffic elements. Firstly, we introduce a two-stage self-training framework that leverages a blended set of training samples to enhance the training process. In the first stage, we use generated mixup training samples as inputs and develop corresponding loss functions for both the source and target domains to direct the training. Then, alignment modules for dynamic and static traffic elements are designed to achieve accurate matching between source and target domain images: cosine similarity maximization is applied to align dynamic traffic elements, while prototype learning is utilized for static traffic elements. Additionally, we present a new technique for reducing noise in pseudo labels by constructing thresholds that adapt to each class, and we formulate the associated target-domain loss function for vacant pseudo-label pixels. The experimental results demonstrate that the proposed method is superior to existing methods on five different domain adaptation tasks, making it well suited to semantic segmentation of road scenes.
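The class-adaptive pseudo-label thresholding described above can be sketched as follows. The scaling rule (base threshold times each class's mean top-1 confidence) is an assumption chosen for illustration; the paper's construction may differ.

```python
import numpy as np

def class_adaptive_thresholds(probs, base=0.9):
    """Per-class pseudo-label thresholds: scale a base threshold by each
    class's mean top-1 confidence, so harder classes get a lower bar.
    probs: (N, C) softmax outputs on target-domain pixels/samples."""
    pred, conf = probs.argmax(axis=1), probs.max(axis=1)
    thr = np.full(probs.shape[1], base)
    for c in range(probs.shape[1]):
        m = conf[pred == c]
        if m.size:
            thr[c] = base * m.mean()
    return thr

def pseudo_labels(probs, thr):
    """Assign the argmax class wherever confidence clears that class's
    threshold; -1 marks pixels excluded from self-training."""
    pred, conf = probs.argmax(axis=1), probs.max(axis=1)
    return np.where(conf >= thr[pred], pred, -1)
```

A fixed global threshold would starve rare or hard classes of pseudo labels; letting each class's bar track its own confidence distribution is what makes the denoising class-adaptive.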
Neurocomputing, Pub Date: 2024-10-17, DOI: 10.1016/j.neucom.2024.128686
Title: EATSA-GNN: Edge-Aware and Two-Stage attention for enhancing graph neural networks based on teacher–student mechanisms for graph node classification
Authors: Abdul Joseph Fofanah, Alpha Omar Leigh
Abstract: Graph Neural Networks (GNNs) have fundamentally transformed the way we handle and examine data originating from non-Euclidean domains. Traditional approaches to imbalanced node classification, such as resampling, are ineffective because they do not take into account the underlying network structure of the edges. The limited ability of existing methods to capture the intricate connections encoded in a graph's edges poses a significant challenge for GNNs in accurately classifying nodes. We propose EATSA-GNN, a model that enhances GNN node classification with Edge-Aware and Two-Stage Attention mechanisms. EATSA-GNN focuses its initial attention on edge traits, enabling the model to differentiate the variable significance of different connections between nodes, referred to as Teacher-Attention (TA). In the second stage, attention is directed towards the nodes, incorporating the knowledge obtained from the edge-level analysis, referred to as Student-Attention (SA). This dual strategy ensures a more sophisticated comprehension of the graph's structure, resulting in improved classification precision. The contribution of EATSA-GNN to the field lies in its ability to utilise both node and edge information in a cohesive manner, yielding more accurate node classifications. Comparisons of two variants of EATSA-GNN against state-of-the-art methods demonstrate its robustness on complex node classification problems, positioning it among the leading GNN architectures for complex networked systems. This performance underscores EATSA-GNN's potential to influence the future advancement of the GNN framework. An implementation of the proposed EATSA-GNN is available at https://github.com/afofanah/EATSA-GNN.
Neurocomputing, Pub Date: 2024-10-16, DOI: 10.1016/j.neucom.2024.128715
Title: Artificial immunofluorescence in a flash: Rapid synthetic imaging from brightfield through residual diffusion
Authors: Xiaodan Xing, Chunling Tang, Siofra Murdoch, Giorgos Papanastasiou, Yunzhe Guo, Xianglu Xiao, Jan Cross-Zamirski, Carola-Bibiane Schönlieb, Kristina Xiao Liang, Zhangming Niu, Evandro Fei Fang, Yinhai Wang, Guang Yang
Abstract: Immunofluorescent (IF) imaging is crucial for visualising biomarker expression, cell morphology, and the effects of drug treatments on sub-cellular components. However, IF imaging requires an extra staining process, often with cell fixation, and may therefore introduce artefacts and alter endogenous cell morphology; some IF stains are also expensive or not readily available, hindering experiments. Recent diffusion models, which synthesise high-fidelity IF images from easy-to-acquire brightfield (BF) images, offer a promising solution but are hindered by training instability and slow inference times due to the noise diffusion process. This paper presents a novel method for the conditional synthesis of IF images directly from BF images along with cell segmentation masks. Our approach employs a residual diffusion process that enhances stability and significantly reduces inference time. We performed a critical evaluation against other image-to-image synthesis models, including UNets, GANs, and advanced diffusion models. Our model demonstrates significant improvements in image quality (p < 0.05 in MSE, PSNR, and SSIM), inference speed (26 times faster than competing diffusion models), and segmentation accuracy for both nuclei and cell bodies (0.77 and 0.63 mean IoU for nuclei and cell true positives, respectively). This work provides robust and efficient tools for cell image analysis and represents a substantial advancement in the field.