Neurocomputing | Pub Date: 2025-06-11 | DOI: 10.1016/j.neucom.2025.130587
Liao Bingli, Danilo Vasconcellos Vargas
{"title":"Beyond KV caching: Shared attention for efficient LLMs","authors":"Liao Bingli, Danilo Vasconcellos Vargas","doi":"10.1016/j.neucom.2025.130587","DOIUrl":"10.1016/j.neucom.2025.130587","url":null,"abstract":"<div><div>The rapid scaling of Large Language Models (LLMs) necessitates advancements in computational and memory efficiency during inference. While methods like Multi-Query Attention (MQA), Grouped-Query Attention (GQA), and Cross-Layer Attention (CLA) reduce Key–Value (KV) cache size by sharing K/V pairs, strategies to further reduce computational load include reusing computed attention weights across layers, an idea explored previously (e.g., LazyFormer (Ying et al., 2021)). This paper provides an extensive empirical investigation into the phenomenon of attention weight isotropy—high similarity in attention distributions across layers—within diverse modern LLMs (7B-72B scale). We demonstrate how this isotropy develops during pretraining, offering a fundamental insight into LLM attention dynamics. Leveraging these findings, we systematically evaluate and validate a cross-layer weight sharing technique, termed Shared Attention (SA). SA selectively reuses computed attention weights in layer spans identified as isotropic through our analysis. Our experiments across multiple benchmarks show that strategically applied SA maintains comparable performance to baseline models, particularly in later layers where isotropy is pronounced, while significantly reducing computational FLOPs and key cache requirements associated with attention calculation. This work provides principled guidance for optimizing attention mechanisms based on empirically observed layer dynamics in contemporary LLMs. Code and resources are available at <span><span>https://github.com/metacarbon/shareAtt</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"648 ","pages":"Article 130587"},"PeriodicalIF":5.5,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144289207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing | Pub Date: 2025-06-11 | DOI: 10.1016/j.neucom.2025.130622
Wenjing Li, Xue Li, Ziyang Li, Jiong Yu, Pengcheng Chen
HI-Captioner: End-to-end image captioning based on hierarchical multi-scale encoding and cross-modal interactive decoding
Neurocomputing, Volume 647, Article 130622.
Abstract: The image captioning task aims to accurately describe images using natural language. It is widely applied in assisting visually impaired individuals, enhancing human–computer interaction, and image retrieval. However, existing end-to-end image captioning methods typically rely on single-scale feature extraction and use simplistic cross-modal interactions. This restricts the model's capacity to capture multi-level information and complex scenes, leading to captions that lack detailed expression and exhibit poor semantic coherence in context. In response, this paper proposes HI-Captioner, an innovative image captioning model based on the Swin Transformer that captures multi-scale features and hierarchical positional information and optimizes inter-modal information interaction. HI-Captioner integrates a Hierarchical Synergistic Attention Module (HSAM) and a Cascading Cross-Modal Interaction Decoder (CCMID). The former combines a hierarchical multi-scale attention mechanism with a hierarchical positional encoding method during the encoding stage, enabling more effective capture of multi-scale features and multi-level positional information of images. This improves the model's ability to capture intricate nuances and comprehend global semantics when generating captions. The latter adopts a novel cross-modal interaction approach to further enhance information exchange between the image and text modalities during the decoding stage, significantly improving the model's efficiency in integrating and representing information from different modalities. Experimental results demonstrate that our proposed model yields substantial performance enhancements on the MSCOCO and Flickr datasets, particularly excelling in evaluation metrics such as BLEU4 and CIDEr. This demonstrates the great potential of the method in generating accurate and natural image captions. The code will be released at: https://github.com/Lwj-cap/HI_Captioner_main
Neurocomputing | Pub Date: 2025-06-11 | DOI: 10.1016/j.neucom.2025.130616
Ce Shan, Lulu Guo, Hong Chen
{"title":"Knowledge guided controllable diffusion for enhanced autonomous driving scenarios generation","authors":"Ce Shan, Lulu Guo, Hong Chen","doi":"10.1016/j.neucom.2025.130616","DOIUrl":"10.1016/j.neucom.2025.130616","url":null,"abstract":"<div><div>Considering the shortcomings of the natural driving datasets composed by typical scenarios in the verification and evaluation of autonomous driving, how to improve the diversity, realism and controllability to satisfy the complex driving scenarios need is an urgent problem to be solved. Therefore, a knowledge-guided controllable diffusion (KGCD) scenarios generation framework is proposed, which allows for the generation behaviors that conform to expected attributes through a differentiable cost designed based on prior driving knowledge during the inference stage. Specifically, factorized attention is utilized for capturing and fusion of the spatio-temporal interaction features for the scene embeddings, based on which the accurate scene-level interactive joint motion prediction is achieved. In the inference stage, diversified guidance is designed based on prior knowledge and embedded into a generation framework to guide the generation of controllable and realistic driving scenarios, filling the gap in the existing scenario library. In response to the slow inference speed of typical diffusion model, a learnable motion pattern estimator (MPE) module is proposed to improve inference speed while ensuring generation quality. Based on the experiments with different datasets, KGCD proposed in this paper can satisfy the requirements of diverse scene generation. Compared with the best performance obtained by other baseline, the realism, controllability and stability metrics have been improved on average by 3.57%, 7.66% and 7.71% respectively evaluated by nuScenes dataset.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"648 ","pages":"Article 130616"},"PeriodicalIF":5.5,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144314053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing | Pub Date: 2025-06-10 | DOI: 10.1016/j.neucom.2025.130494
Dongjun Hwang, Seong Joon Oh, Junsuk Choe
Small object matters in weakly supervised object localization
Neurocomputing, Volume 648, Article 130494.
Abstract: Weakly-supervised object localization (WSOL) methods aim to capture the extent of the target object without full supervision such as bounding boxes or segmentation masks. Although numerous studies have been conducted in the research field of WSOL, we find that most existing methods are less effective at localizing small objects. In this paper, we first analyze why previous studies have overlooked this problem. Based on the analysis, we propose two remedies: (1) new evaluation metrics and a dataset to accurately measure localization performance for small objects, and (2) a novel consistency learning framework to zoom in on small objects so the model can perceive them more clearly. Our extensive experimental results demonstrate that the proposed method significantly improves small object localization on four different backbone networks and four different datasets, without sacrificing the performance of medium and large objects. In addition to these gains, our method can be easily applied to existing WSOL methods as it does not require any changes to the model architecture or data input pipeline. Code is available at https://github.com/dongjunhwang/small_object_wsol.
Neurocomputing | Pub Date: 2025-06-10 | DOI: 10.1016/j.neucom.2025.130562
Yaoyao Zhou, Gang Chen, Changli Pu, Keyu Wu, Zhenghua Chen
Distributed policy evaluation over multi-agent network with communication delays
Neurocomputing, Volume 648, Article 130562.
Abstract: This paper investigates the multi-agent policy evaluation problem for distributed reinforcement learning over a time-varying directed communication structure with communication delays. In a completely distributed setting, agents jointly learn the value of a given policy through private local evaluation and their neighbors' evaluations. First, we propose the Push-Sum Dual Averaging Algorithm (PS-DAA) to deal with the distributed policy evaluation problem under communication delays. By accounting for inevitable communication delays, a more general time-varying directed communication structure, and more realistic state constraints, PS-DAA still achieves sublinear convergence. Further, considering the case where full update information is unavailable, we extend PS-DAA to the bandit feedback setting, i.e., the values at sampled points are used instead of full gradient information. We prove that, compared to the full-information scheme, the bandit-feedback PS-DAA does not lead to performance degradation. Finally, we verify the effectiveness of the proposed algorithm through two simulation cases.
Neurocomputing | Pub Date: 2025-06-10 | DOI: 10.1016/j.neucom.2025.130621
Dan Niu, Chunlei Shi, Tianbao Zhang, Hongbin Wang, Zengliang Zang, Mingbo Jiang, Jun Yang
M4Caster: Multi-source, multi-spatial, multi-temporal modeling for precipitation nowcasting
Neurocomputing, Volume 648, Article 130621.
Abstract: Precipitation nowcasting, especially accurate rainstorm warning, is crucial for mitigating meteorological risks and enhancing public safety. Owing to the inherently chaotic evolution of precipitation systems, the problem poses significant challenges. These include the limited understanding of chaotic precipitation systems in mainstream single-source approaches, the difficulty pure convolutional methods have in capturing global information due to restricted receptive fields, and the training instability associated with ConvRNN-based models. To address these issues, we propose M4Caster, a novel Multi-source, Multi-spatial, and Multi-temporal feature extraction and fusion framework that uses satellite observations and ground-based radar data from the previous 0–1 h to predict radar echo sequences for the forthcoming 0–1 h. In particular, the meteorological satellite, by mining information on cloud cluster movements, is capable of capturing features of convective initiation (CI). M4Caster further integrates a Multi-spatial and Multi-temporal Aggregator (MMA) to adaptively extract and refine spatiotemporal features, where multi-path Cross-scale Perception Refinement (CPR) facilitates perception and communication across multiple pathways. Moreover, a bidirectional bridging technique aligns the representations of the two data sources, fully leveraging the advantages inherent in multi-source data. Experiments on a meteorological dataset from the Yangtze River Delta (YRD) region show that M4Caster outperforms PredRNN++ and Transformer-based methods, delivering better nowcasting performance, especially for rainstorm prediction and the prediction of convective initiation events.
{"title":"CSubBT: A modular execution framework with self-adjusting capability for mobile manipulation system","authors":"Huihui Guo, Huizhang Luo, Huilong Pi, Mingxing Duan, Kenli Li, Chubo Liu","doi":"10.1016/j.neucom.2025.130608","DOIUrl":"10.1016/j.neucom.2025.130608","url":null,"abstract":"<div><div>Embodied intelligence is advancing the capability of intelligent agents to transition from controlled environments, such as factories, to unstructured real-world settings by integrating perception, planning, and physical interaction. Task and Motion Planning (TAMP) can guide agents in completing complex tasks in these unstructured environments. However, the execution of plans by agents is not merely the implementation of pre-defined instructions. The planned actions often fail due to discrepancies between the perceptual information used in planning and the actual conditions encountered. Existing robust execution systems fall short of providing a universal solution at the execution level, making them unsuitable as actuators for upstream task planners. In this paper, we propose the Conditional Subtree (CSubBT), a modular, self-adjusting execution framework for mobile manipulation systems based on Behavior Trees (BTs). CSubBT decomposes planned actions into sub-actions and leverages BTs to control their execution, addressing potential anomalies without the need for intervention from high-level planners. CSubBT treats common anomalies as constraint non-satisfaction problems and continuously guides the robot in performing tasks by sampling new action parameters in the constraint space when anomalies are detected. We validate the robustness of our framework through extensive manipulation experiments conducted in both simulated and real-world environments across different platforms.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"647 ","pages":"Article 130608"},"PeriodicalIF":5.5,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144261950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing | Pub Date: 2025-06-10 | DOI: 10.1016/j.neucom.2025.130619
Dominik Olszewski
{"title":"Data vulnerability index for the “crowding problem” in nonlinear dimensionality reduction","authors":"Dominik Olszewski","doi":"10.1016/j.neucom.2025.130619","DOIUrl":"10.1016/j.neucom.2025.130619","url":null,"abstract":"<div><div>We propose a data vulnerability index measuring the intensity and harmfulness level of the “crowding problem” in nonlinear dimensionality reduction. The index is useful in supporting nonlinear dimensionality reduction by increasing its robustness to this problem. The index informs about the necessity of using the methods secured from the problem and justifies their employment. The vulnerability index provides auxiliary preliminary information that is helpful in conducting and guiding further dimensionality reduction and data visualization. The introduced index is formulated on the basis of the <span><math><mi>k</mi></math></span>-Nearest Neighbors (<span><math><mi>k</mi></math></span>-NN) graph of the data. The graph allows for estimating the intrinsic dimensionality of the low-dimensional manifold embedded in the input high-dimensional linear Euclidean space, which is required during our index computation. The experiments on thirteen real datasets confirm the usefulness of our index in nonlinear dimensionality reduction and its ability to detect the “crowding problem” and determine its gravity. The index values ranged from 2 to 26 corresponding to an increase in superiority of the methods using the <span><math><mi>t</mi></math></span>-distribution over those not using it. Moreover, we conducted additional experiments on tuning the neighborhood width parameter in Neighborhood Preserving Projections (NPPs). For most datasets, an improvement was achieved based on Adjusted Mutual Information (AMI) and silhouette values. The highest increase in AMI was obtained for <span><math><mi>t</mi></math></span>-NeRV (0.9410 vs. 0.8169) and in silhouette for <span><math><mi>t</mi></math></span>-SNE (0.8124 vs. 0.6992).</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"648 ","pages":"Article 130619"},"PeriodicalIF":5.5,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144270501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurocomputing | Pub Date: 2025-06-10 | DOI: 10.1016/j.neucom.2025.130598
Gaoxin Ma, Xingquan Zhu, Zhen Tian, Yangdong Ye, Zhenfeng Zhu
Few-Shot Object Counting with frequency attention and multi-perception head
Neurocomputing, Volume 648, Article 130598.
Abstract: Few-Shot Object Counting (FSC) is a critical technique in computer vision, which focuses on estimating the number of exemplar objects in target tasks. This technique is highly versatile and applicable in diverse domains, such as crowd monitoring, traffic management, and wildlife tracking. The primary challenge in FSC is achieving robust feature matching despite the gap between the diversity of targets and the scarcity of exemplars. In this research, we propose the Few-shot Object Counting Network with Frequency Attention and Multi-Perception Head (FFMP), which aims to enhance the limited examples by identifying additional instances within query images. The FFMP framework comprises three core components: Frequency Domain Feature Fusion (FDF), Self-Adaptive Feature Enhancement (SFE), and Multi-Perception Head (MP). The FDF component fuses features from both the spatial and frequency domains to generate more precise similarity maps. The SFE component identifies and focuses on recurrent target features within query images, enriching the initial set of examples and providing a detailed understanding of the target category. Additionally, the MP component integrates counting and detection tasks, thereby improving overall performance. Extensive experiments on the FSC-147 dataset and various class-specific counting datasets demonstrate that FFMP achieves competitive counting performance compared to state-of-the-art methods. Code is available at https://github.com/dsl161/FFMP.
Neurocomputing | Pub Date: 2025-06-10 | DOI: 10.1016/j.neucom.2025.130605
Qing Yang, Xiaobing Hu, Jiali Yu, Qixun Sun, Lan Shu, Zhang Yi, Yong Liao
var-nmODE: Model with L2-stability based on nmODE for defending against adversarial attacks
Neurocomputing, Volume 648, Article 130605.
Abstract: Deep neural networks (DNNs) have demonstrated remarkable performance in various applications. However, their performance is significantly affected by a wide range of perturbations, particularly adversarial perturbations, which are difficult to recognize with the naked eye but cause the network to produce incorrect classifications. Some studies have shown that ordinary differential equation (ODE) networks are inherently more robust to adversarial perturbations than general deep networks. nmODE (Neural Memory Ordinary Differential Equation) is a recently proposed artificial neural network model with strong nonlinearity. Despite its potential, nmODE still faces challenges in adversarial defense. In this paper, we propose a variant of the neural memory ordinary differential equation (var-nmODE) to defend against adversarial attacks. On a theoretical basis, var-nmODE has an L2-stable mapping, which corresponds to a certified defense against L2 adversarial perturbations. Further, we conduct adversarial training on the proposed model and show experimentally that var-nmODE performs better than nmODE. In addition, through adversarial training, the performance of var-nmODE is significantly improved, indicating that the proposed model can resist adversarial perturbations. It is worth mentioning that var-nmODE provides inherent and certified stability, making it a valuable addition to deep learning defense research.