Neurocomputing Pub Date: 2024-11-07 DOI: 10.1016/j.neucom.2024.128850
Abdulganiyu Abdu Yusuf , Chong Feng , Xianling Mao , Yunusa Haruna , Xinyan Li , Ramadhani Ally Duma
{"title":"Graph-enhanced visual representations and question-guided dual attention for visual question answering","authors":"Abdulganiyu Abdu Yusuf , Chong Feng , Xianling Mao , Yunusa Haruna , Xinyan Li , Ramadhani Ally Duma","doi":"10.1016/j.neucom.2024.128850","DOIUrl":"10.1016/j.neucom.2024.128850","url":null,"abstract":"<div><div>Visual Question Answering (VQA) has witnessed significant advancements recently due to the application of deep learning in the field of vision-language research. Most current VQA models focus on merging visual and text features, but it is essential for these models to also consider the relationships between different parts of an image and use question information to highlight important features. This study proposes a method to enhance neighboring image region features and learn question-aware visual representations. First, we construct a region graph to represent spatial relationships between objects in the image. Then, a graph convolutional network (GCN) is used to propagate information across neighboring regions, enriching each region’s feature representation by integrating contextual information. To capture long-range dependencies, the graph is enhanced with random walk with restart (RWR), enabling multi-hop reasoning across distant regions. Furthermore, a question-aware dual attention mechanism is introduced to refine region features at both region and feature levels, ensuring that the model emphasizes key regions that are critical for answering the question. The enhanced region representations are then combined with the encoded question to predict an answer. Through extensive experiments on VQA benchmarks, the study demonstrates state-of-the-art performance by leveraging regional dependencies and question guidance. 
The integration of GCNs and random walks in the graph helps capture contextual information to focus visual attention selectively, resulting in significant improvements over existing methods on VQA 1.0 and VQA 2.0 benchmark datasets.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"614 ","pages":"Article 128850"},"PeriodicalIF":5.5,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142660582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
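The random walk with restart (RWR) step mentioned in the abstract above can be illustrated with a small, generic sketch. This is not the authors' implementation: the function name, the restart probability of 0.15, and the toy path graph are assumptions chosen for illustration, and the paper's actual graph construction is not described in this listing.

```python
def random_walk_with_restart(adj, restart=0.15, tol=1e-10, max_iter=1000):
    """Return an n x n list where row s holds RWR scores for a walk seeded at node s.

    adj is a dense adjacency matrix (list of lists). All names and defaults
    here are hypothetical and used only for illustration.
    """
    n = len(adj)
    # Column-normalize the adjacency matrix so each non-empty column sums to 1.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    trans = [
        [adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    scores = []
    for seed in range(n):
        e = [1.0 if k == seed else 0.0 for k in range(n)]
        p = e[:]
        for _ in range(max_iter):
            # One step: follow an edge with prob. (1 - restart), jump back to the seed otherwise.
            nxt = [
                (1.0 - restart) * sum(trans[i][j] * p[j] for j in range(n)) + restart * e[i]
                for i in range(n)
            ]
            delta = sum(abs(a - b) for a, b in zip(nxt, p))
            p = nxt
            if delta < tol:
                break
        scores.append(p)
    return scores


# Toy path graph 0 - 1 - 2 - 3: nodes 0 and 3 share no edge.
path_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
rwr_scores = random_walk_with_restart(path_graph)
```

In this toy graph, node 3 receives a non-zero score from a walk seeded at node 0 even though the two are not adjacent, which is the multi-hop effect the abstract refers to. How such scores are combined with the GCN is not specified here.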
Neurocomputing Pub Date: 2024-11-07 DOI: 10.1016/j.neucom.2024.128889
Hao Song , Qingshan She , Feng Fang , Su Liu , Yun Chen , Yingchun Zhang
{"title":"Domain generalization through latent distribution exploration for motor imagery EEG classification","authors":"Hao Song , Qingshan She , Feng Fang , Su Liu , Yun Chen , Yingchun Zhang","doi":"10.1016/j.neucom.2024.128889","DOIUrl":"10.1016/j.neucom.2024.128889","url":null,"abstract":"<div><div>Electroencephalography (EEG)-based Motor Imagery (MI) brain-computer interface (BCI) systems play an essential role in motor function rehabilitation for post-stroke patients. Existing neural networks for decoding MI EEG face challenges due to nonstationary characteristics and subject-specific variations of EEG data. To address these challenges and improve generalization performance, this study proposes a domain generalization (DG) model that eliminates the need for user-specific calibration in real-life applications. Specifically, the proposed model comprises two branches: the first branch applies several independent decision-making networks to decode and classify subjects’ motor intentions, while the second branch adaptively assigns weights to classification results and fuses them into a comprehensive decision. Both branches utilize EEGNet and ShallowConvNet to extract time-frequency-spatial features. By implementing multiple classification networks, the model can learn a broad range of data distributions from source subjects, which contributes to improved generalization performance on target subjects. The proposed EEG-DG framework was evaluated on BCI Competition IV Dataset 2a, 2b and PhysioNet. Results show that the proposed framework significantly enhances the classification performance of MI EEG, outperforming several state-of-the-art models on all three datasets, underlining its superior efficacy in real-world scenarios and exceptional generalization performance. 
The source code can be accessed at https://github.com/DrugLover/Multibranch-DG-EEG.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"614 ","pages":"Article 128889"},"PeriodicalIF":5.5,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142660756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
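The two-branch idea in the abstract above, where several classifiers vote and a second branch weights their outputs, can be sketched as a softmax-weighted fusion. This is only an illustration under stated assumptions: the names `fuse_predictions` and `gate_logits` are hypothetical, and the abstract does not say how the weights are computed, so the gate values here are supplied by hand rather than produced by a network.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_predictions(branch_probs, gate_logits):
    """Fuse per-classifier class probabilities with softmax-normalized weights.

    branch_probs: one probability vector per classifier (hypothetical input).
    gate_logits:  one unnormalized weight per classifier (hypothetical input;
                  in a real model these would come from a learned weighting branch).
    """
    weights = softmax(gate_logits)
    n_classes = len(branch_probs[0])
    fused = [
        sum(w * probs[c] for w, probs in zip(weights, branch_probs))
        for c in range(n_classes)
    ]
    return fused, weights


# Two classifiers that disagree on a two-class problem, weighted equally.
fused_probs, fusion_weights = fuse_predictions(
    branch_probs=[[0.7, 0.3], [0.2, 0.8]],
    gate_logits=[0.0, 0.0],
)
```

Because the weights sum to one and each input is a probability vector, the fused output is also a valid probability vector. Raising one gate logit shifts the decision toward that classifier.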
Neurocomputing Pub Date: 2024-11-07 DOI: 10.1016/j.neucom.2024.128728
Yeongjoon Kim , Sunkyu Kwon , Donggoo Kang , Hyunmin Lee , Joonki Paik
{"title":"Enhancing video frame interpolation with region of motion loss and self-attention mechanisms: A dual approach to address large, nonlinear motions","authors":"Yeongjoon Kim , Sunkyu Kwon , Donggoo Kang , Hyunmin Lee , Joonki Paik","doi":"10.1016/j.neucom.2024.128728","DOIUrl":"10.1016/j.neucom.2024.128728","url":null,"abstract":"<div><div>Video frame interpolation is particularly challenging when dealing with large and non-linear object motions, often resulting in poor frame quality and motion artifacts. In this study, we introduce a novel dual-approach methodology for video frame interpolation that effectively addresses these complexities. Our method consists of two key components: a Region of Motion (RoM) loss and self-attention mechanisms. The RoM loss is designed to spotlight significant movements within frames. This is achieved by employing feature-matching techniques that assign tailored weights during the training process, ensuring that areas of intense motion are given priority. This is facilitated by the computation of optical flow, which identifies crucial feature points and highlights regions of significant motion for targeted enhancement. Our method incorporates self-attention mechanisms to maintain inter-frame continuity while emphasizing the unique attributes of individual frames. The self-attention scores reduce motion discrepancies and enhance the distinctiveness and texture quality of each frame. 
We validate the efficacy of our approach through extensive evaluations on benchmark datasets, including Vimeo-90K, Middlebury, UCF101, and SNU-Film.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"614 ","pages":"Article 128728"},"PeriodicalIF":5.5,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142660748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing Pub Date: 2024-11-07 DOI: 10.1016/j.neucom.2024.128821
Feng Cheng , Gaoliang Peng , Junbao Li , Benqi Zhao , Jeng-Shyang Pan , Hang Li
{"title":"A Transformer-based network with adaptive spatial prior for visual tracking","authors":"Feng Cheng , Gaoliang Peng , Junbao Li , Benqi Zhao , Jeng-Shyang Pan , Hang Li","doi":"10.1016/j.neucom.2024.128821","DOIUrl":"10.1016/j.neucom.2024.128821","url":null,"abstract":"<div><div>Single object tracking (SOT) in complex scenes presents significant challenges in computer vision. In recent years, the transformer has demonstrated its efficacy in visual object tracking tasks due to its capacity to capture the long-range dependencies between image pixels. However, two limitations hinder the performance improvement of transformer-based trackers. First, the transformer partitions the image into a sequence of patches, which disrupts the internal structural information of the object. Second, transformer-based trackers encode the target template and search region together, potentially leading to confusion between the target and background during feature interaction. To address the above issues, we propose a fully transformer-based tracking framework via learning structural prior information, called SPformer. Specifically, a self-attention spatial-prior generative network is established for simulating the spatial associations between features. Moreover, the cross-attention structural prior extractors based on Gaussian and arbitrary distributions are developed to seek the semantic interaction features between the object template and the search region, effectively mitigating feature confusion. Extensive experiments on eight prevailing benchmarks demonstrate that SPformer outperforms existing state-of-the-art (SOTA) trackers. 
We further analyze the effectiveness of the two proposed prior modules and validate their application in target tracking models.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"614 ","pages":"Article 128821"},"PeriodicalIF":5.5,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142660663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing Pub Date: 2024-11-07 DOI: 10.1016/j.neucom.2024.128752
Xi Li , Zhen Xiang , David J. Miller , George Kesidis
{"title":"Correcting the distribution of batch normalization signals for Trojan mitigation","authors":"Xi Li , Zhen Xiang , David J. Miller , George Kesidis","doi":"10.1016/j.neucom.2024.128752","DOIUrl":"10.1016/j.neucom.2024.128752","url":null,"abstract":"<div><div>Backdoor (Trojan) attacks represent a significant adversarial threat to deep neural networks (DNNs). In such attacks, the presence of an attacker’s backdoor trigger causes a test instance to be misclassified into the attacker’s chosen target class. Post-training mitigation methods aim to rectify these misclassifications, ensuring that poisoned models correctly classify backdoor-triggered samples. These methods require the defender to have access to a small, clean dataset and the potentially compromised DNN. However, most defenses rely on parameter fine-tuning, making their effectiveness dependent on the dataset size available to the defender. To overcome the limitations of existing approaches, we propose a method that rectifies misclassifications by correcting the altered distribution of internal layer activations of backdoor-triggered instances. Distribution alterations are corrected by applying simple transformations to internal activations. Notably, our method does not modify any trainable parameters of the DNN, yet it achieves generally good mitigation performance against various backdoor attacks and benchmarks. Consequently, our approach demonstrates robustness even with a limited amount of clean data, making it highly practical for real-world applications. The effectiveness of our approach is validated through both theoretical analysis and extensive experimentation. 
The appendix is provided as an electronic component and can be accessed via the link in the footnote.<span><span><sup>2</sup></span></span> The source codes can be found in the link<span><span><sup>3</sup></span></span> at the footnote.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"614 ","pages":"Article 128752"},"PeriodicalIF":5.5,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142660134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing Pub Date: 2024-11-06 DOI: 10.1016/j.neucom.2024.128820
Jun Nie , Guihua Zhang , Xiao Lu , Haixia Wang , Chunyang Sheng , Lijie Sun
{"title":"Reinforcement learning method based on sample regularization and adaptive learning rate for AGV path planning","authors":"Jun Nie , Guihua Zhang , Xiao Lu , Haixia Wang , Chunyang Sheng , Lijie Sun","doi":"10.1016/j.neucom.2024.128820","DOIUrl":"10.1016/j.neucom.2024.128820","url":null,"abstract":"<div><div>This paper proposes a proximal policy optimization (PPO) method based on sample regularization (SR) and adaptive learning rate (ALR) to address the issues of limited exploration ability and slow convergence speed in Autonomous Guided Vehicle (AGV) path planning using reinforcement learning algorithms in dynamic environments. First, a regularization term based on empirical samples is designed to solve the bias and imbalance issues of training samples, and the sample regularization is added to the objective function to improve the policy selectivity of the PPO algorithm, thereby increasing the AGV’s exploration ability during the training process in the working environment. Second, the Fisher information matrix of the Kullback-Leibler (KL) divergence approximation and the KL divergence constraint term are exploited to design a policy update mechanism based on a dynamically adjustable adaptive learning rate throughout training. The method considers the geometric structure of the parameter space and the change of the policy gradient, aiming to optimize the parameter update direction and enhance the convergence speed and stability of the algorithm. Finally, an AGV path planning scheme based on reinforcement learning is established for simulation verification and comparison in a two-dimensional raster map and a Gazebo 3D simulation environment. 
Simulation results verify the feasibility and superiority of the proposed method applied to the AGV path planning problem.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"614 ","pages":"Article 128820"},"PeriodicalIF":5.5,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142660754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
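The abstract above ties the learning rate to the KL divergence between successive policies. The sketch below shows a much simpler, widely used KL-threshold heuristic, not the Fisher-information-based mechanism the paper describes. The function names, thresholds, scaling factor, and bounds are all assumptions made for illustration.

```python
import math


def categorical_kl(p, q):
    """KL(p || q) for two discrete distributions; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def adapt_learning_rate(lr, kl, target_kl=0.01, factor=1.5, lr_min=1e-6, lr_max=1e-2):
    """Shrink the step size when the policy moved too far, grow it when it barely moved.

    This is a generic heuristic with hypothetical defaults, not the paper's update rule.
    """
    if kl > 2.0 * target_kl:
        lr = lr / factor
    elif kl < 0.5 * target_kl:
        lr = lr * factor
    # Clamp to a safe range so repeated adjustments cannot diverge.
    return min(max(lr, lr_min), lr_max)


# Action distributions of the old and new policy in one state (toy values).
old_policy = [0.5, 0.5]
new_policy = [0.9, 0.1]
step_kl = categorical_kl(old_policy, new_policy)
next_lr = adapt_learning_rate(1e-3, step_kl)
```

Here the toy policies differ substantially, so the measured KL exceeds twice the target and the learning rate is reduced for the next update. A natural-gradient method would instead rescale the update using curvature information from the Fisher matrix.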
Neurocomputing Pub Date: 2024-11-06 DOI: 10.1016/j.neucom.2024.128781
Xian Mo , Zihang Zhao , Xiaoru He , Hang Qi , Hao Liu
{"title":"Intelligible graph contrastive learning with attention-aware for recommendation","authors":"Xian Mo , Zihang Zhao , Xiaoru He , Hang Qi , Hao Liu","doi":"10.1016/j.neucom.2024.128781","DOIUrl":"10.1016/j.neucom.2024.128781","url":null,"abstract":"<div><div>Recommender systems are an important tool for information retrieval, which can aid in the solution of the issue of information overload. Recently, contrastive learning has shown remarkable performance in recommendation by data augmentation processes to address highly sparse data. Our paper proposes an <u>Int</u>elligible <u>G</u>raph <u>C</u>ontrastive <u>L</u>earning with attention-aware (IntGCL) for recommendation. Particularly, our IntGCL first introduces a novel attention-aware matrix into graph convolutional networks (GCN) to identify the importance between users and items, which is constructed to preserve the importance between users and items by a random walk with a restart strategy and can enhance the intelligibility of our model. Then, the attention-aware matrix is further utilised to guide the generation of a graph-generative model with attention-aware and a graph-denoising model for automatically generating two trainable contrastive views for data augmentation, which can de-noise and further enhance the intelligibility. Comprehensive experiments on four real-world datasets indicate the superiority of our IntGCL approach over multiple state-of-the-art methods. 
Our datasets and source code are available at https://github.com/restarthxr/InpGCL.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"614 ","pages":"Article 128781"},"PeriodicalIF":5.5,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142660580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing Pub Date: 2024-11-06 DOI: 10.1016/j.neucom.2024.128768
Xiaocheng Luo , Yanping Chen , Ruixue Tang , Caiwei Yang , Ruizhang Huang , Yongbin Qin
{"title":"A bi-consolidating model for joint relational triple extraction","authors":"Xiaocheng Luo , Yanping Chen , Ruixue Tang , Caiwei Yang , Ruizhang Huang , Yongbin Qin","doi":"10.1016/j.neucom.2024.128768","DOIUrl":"10.1016/j.neucom.2024.128768","url":null,"abstract":"<div><div>Current methods to extract relational triples directly make a prediction based on a possible entity pair in a raw sentence without depending on entity recognition. The task suffers from a serious semantic overlapping problem, in which several relation triples may share one or two entities in a sentence. In this paper, based on a two-dimensional sentence representation, a bi-consolidating model is proposed to address this problem by simultaneously reinforcing the local and global semantic features relevant to a relation triple. This model consists of a local consolidation component and a global consolidation component. The first component uses a pixel difference convolution to enhance semantic information of a possible triple representation from adjacent regions and mitigate noise from neighbors. The second component strengthens the triple representation based on channel attention and spatial attention, which helps to learn remote semantic dependencies in a sentence. Both components help to improve the performance of both entity identification and relation type classification in relation triple extraction. Evaluated on several public datasets, the bi-consolidating model achieves competitive performance. 
Analytical experiments demonstrate the effectiveness of our model for relational triple extraction and give motivation for other natural language processing tasks.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"614 ","pages":"Article 128768"},"PeriodicalIF":5.5,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142660753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing Pub Date: 2024-11-06 DOI: 10.1016/j.neucom.2024.128771
Miguel Abreu , Luís Paulo Reis , Nuno Lau
{"title":"Addressing imperfect symmetry: A novel symmetry-learning actor-critic extension","authors":"Miguel Abreu , Luís Paulo Reis , Nuno Lau","doi":"10.1016/j.neucom.2024.128771","DOIUrl":"10.1016/j.neucom.2024.128771","url":null,"abstract":"<div><div>Symmetry, a fundamental concept to understand our environment, often oversimplifies reality from a mathematical perspective. Humans are a prime example, deviating from perfect symmetry in terms of appearance and cognitive biases (e.g. having a dominant hand). Nevertheless, our brain can easily overcome these imperfections and efficiently adapt to symmetrical tasks. The driving motivation behind this work lies in capturing this ability through reinforcement learning. To this end, we introduce Adaptive Symmetry Learning (ASL) — a model-minimization actor-critic extension that addresses incomplete or inexact symmetry descriptions by adapting itself during the learning process. ASL consists of a symmetry fitting component and a modular loss function that enforces a common symmetric relation across all states while adapting to the learned policy. The performance of ASL is compared to existing symmetry-enhanced methods in a case study involving a four-legged ant model for multidirectional locomotion tasks. The results show that ASL can recover from large perturbations and generalize knowledge to hidden symmetric states. 
It achieves comparable or better performance than alternative methods in most scenarios, making it a valuable approach for leveraging model symmetry while compensating for inherent perturbations.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"614 ","pages":"Article 128771"},"PeriodicalIF":5.5,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142660750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing Pub Date: 2024-11-05 DOI: 10.1016/j.neucom.2024.128753
Yingjie Song , Li Yang , Wenming Luo , Xiong Xiao , Zhuo Tang
{"title":"Boosting multi-document summarization with hierarchical graph convolutional networks","authors":"Yingjie Song , Li Yang , Wenming Luo , Xiong Xiao , Zhuo Tang","doi":"10.1016/j.neucom.2024.128753","DOIUrl":"10.1016/j.neucom.2024.128753","url":null,"abstract":"<div><div>The input of the multi-document summarization task is usually long and has high redundancy. Encoding multiple documents is a challenge for the Seq2Seq architecture. The way of concatenating multiple documents into a sequence ignores the relation between documents. Attention-based Seq2Seq architectures have slightly improved the cross-document relation modeling for multi-document summarization. However, these methods ignore the relation between sentences, and there is little improvement that can be achieved through the attention mechanism alone. This paper proposes a hierarchical approach to leveraging the relation between words, sentences, and documents for abstractive multi-document summarization. Our model employs the Graph Convolutional Networks (GCN) for capturing the cross-document and cross-sentence relations. The GCN module can enrich semantic representations by generating high-level hidden features. Our model achieves significant improvement over the attention-based baseline, beating the Hierarchical Transformer by 3.4/1.64, 1.92/1.44 ROUGE-1/2 F1 points on the Multi-News and WikiSum datasets, respectively. 
Experimental results demonstrate that our delivered method brings substantial improvements over several strong baselines.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"614 ","pages":"Article 128753"},"PeriodicalIF":5.5,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142660749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}