Neural Networks: Latest Publications

Dynamic scale position embedding for cross-modal representation learning
IF 6.3 · CAS Zone 1 · Computer Science
Neural Networks Pub Date: 2025-09-08 DOI: 10.1016/j.neunet.2025.108087
Jungkyoo Shin, Sungmin Kang, Yoonsik Cho, Eunwoo Kim
In this paper, we introduce a novel approach to capture temporal information in videos across multiple scales for cross-modal learning. As videos naturally encapsulate semantic information of diverse durations, existing methods that primarily depend on fine- and coarse-grained contrastive learning may fail to fully capture the inherent semantic information. To bridge this gap, we propose Dynamic Scale Position Embedding (DSPE), a novel approach that enables a single transformer to interpret videos at various temporal scales through dynamic adjustment of temporal position embedding. In contrast to conventional multi-scale methods that aggregate video clips, DSPE maintains the distinct features of each clip, thus preserving semantic integrity and enhancing semantic content comprehension. Based on this, we present an efficient multi-scale temporal encoder designed to adeptly capture temporal information across a broad spectrum from fine to coarse granularity. Comprehensive experiments on four datasets (MSR-VTT, LSMDC, MSVD, and ActivityNet-Captions) and two distinct tasks (text-video retrieval and video captioning) show consistent performance improvements, highlighting the significance of the presented multi-scale approach.
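As one way to picture the dynamic adjustment of temporal position embeddings described above, the sketch below interpolates a single base embedding table to different clip counts, so one transformer could consume the same video at several temporal scales. The function name and the linear-interpolation scheme are illustrative assumptions, not the paper's actual DSPE implementation:

```python
import numpy as np

def scaled_position_embedding(base_pe: np.ndarray, num_positions: int) -> np.ndarray:
    """Linearly interpolate a base temporal position-embedding table
    (shape [T, D]) down or up to `num_positions` rows, one per clip
    at the chosen temporal scale."""
    T, _ = base_pe.shape
    # Map the target positions onto the base index range [0, T-1].
    targets = np.linspace(0.0, T - 1, num_positions)
    lo = np.floor(targets).astype(int)
    hi = np.minimum(lo + 1, T - 1)
    frac = (targets - lo)[:, None]
    return (1.0 - frac) * base_pe[lo] + frac * base_pe[hi]

base = np.random.default_rng(0).standard_normal((16, 8))  # 16 positions, 8-dim
coarse = scaled_position_embedding(base, 4)   # coarse scale: 4 clips
fine = scaled_position_embedding(base, 16)    # finest scale: every position
```

With this kind of scheme, the same transformer weights see consistent positional geometry at every scale; only the embedding table is resampled per forward pass.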
Citations: 0
Disentangled self-supervised video camouflaged object detection and salient object detection
Neural Networks Pub Date: 2025-09-08 DOI: 10.1016/j.neunet.2025.108077
Haoke Xiao, Lv Tang, Bo Li, Zhiming Luo, Shaozi Li
Video tasks play an important role in multimedia fields. In various video tasks, such as video camouflaged/salient object detection (VCOD/VSOD), motion and context information are two important aspects. Although many existing works have achieved promising results on VCOD and VSOD tasks, they still have limitations in leveraging motion and context information. In this paper, we propose a new disentangled perspective to treat motion and context information in VCOD and VSOD tasks. Our proposed model utilizes context and motion information in ContextNet and MotionNet, respectively, without the two conflicting with each other, as there can be biases between these two types of information in certain circumstances. Moreover, we further explore how to apply this disentangled perspective in a self-supervised manner, which reduces annotation costs. Specifically, we first design a self-supervised adaptive frame routing mechanism to determine whether each video frame belongs to ContextNet or MotionNet. We then design a cross-supervision scheme for ContextNet and MotionNet to train these two segmentation networks in a self-supervised fashion. In experiments, our proposed self-supervised disentangled model consistently outperforms state-of-the-art unsupervised methods on VCOD and VSOD datasets.
Citations: 0
Adaptive behavior with stable synapses
Neural Networks Pub Date: 2025-09-06 DOI: 10.1016/j.neunet.2025.108082
Cristiano Capone, Luca Falorsi
Behavioral changes in animals and humans, triggered by errors or verbal instructions, can occur extremely rapidly. While learning theories typically attribute improvements in performance to synaptic plasticity, recent findings suggest that such fast adaptations may instead result from dynamic reconfiguration of the networks involved, without changes to synaptic weights. Recently, similar capabilities have been observed in transformers, a foundational architecture in machine learning widely used in applications such as natural language and image processing. Transformers are capable of in-context learning: the ability to adapt and acquire new information dynamically within the context of the task or environment they are currently engaged in, without changing their parameters. We argue that this property may stem from gain modulation, a feature widely observed in biological networks, for example in pyramidal neurons through input segregation and dendritic amplification. We propose a constructive approach to induce in-context learning in an architecture composed of recurrent networks with gain modulation, demonstrating abilities inaccessible to standard networks. In particular, we show that such an architecture can dynamically implement standard gradient-based learning by encoding weight changes in the activity of another network. We argue that, while these algorithms are traditionally associated with synaptic plasticity, their reliance on non-local terms suggests that they may be more naturally realized in the brain at the level of neural circuits. We demonstrate that our approach extends to temporal tasks and reinforcement learning. We further validate it in a MuJoCo ant navigation task, showcasing a neuromorphic control paradigm via real-time network reconfiguration.
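The idea that behavior can change while synapses stay fixed can be illustrated with a toy gain-modulated rate unit: the weight matrix `W` never changes, yet a per-neuron gain signal reconfigures the computation. This is a minimal sketch of the general principle only, not the authors' recurrent architecture:

```python
import numpy as np

def gain_modulated_step(W: np.ndarray, x: np.ndarray, gain: np.ndarray) -> np.ndarray:
    """One step of a rate unit whose recurrent drive is multiplicatively
    scaled by a per-neuron gain; W and x are fixed, only gain varies."""
    return np.tanh(gain * (W @ x))

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))   # "stable synapses": never updated below
x = rng.standard_normal(4)

# Same synapses, two behaviors: modulation alone reconfigures the output.
baseline = gain_modulated_step(W, x, gain=np.ones(4))
modulated = gain_modulated_step(W, x, gain=np.array([2.0, 0.5, 1.0, 0.0]))
```

Note how setting one gain to zero silences that unit entirely, a qualitative change in the circuit's computation achieved with no weight update.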
Citations: 0
Graph convolutional network with adaptive grouping aggregation strategy
Neural Networks Pub Date: 2025-09-06 DOI: 10.1016/j.neunet.2025.108086
Ruixiang Wang, Chunxia Zhang, Chunhong Pan
The performance of graph convolutional networks (GCNs) with naive aggregation functions on nodes has reached a bottleneck, leaving a gap between practice and theoretical expressivity. Some learning-based aggregation strategies have been proposed to improve performance; however, few of them focus on how these strategies affect expressivity, or evaluate their performance in an equal experimental setting. In this paper, we point out that the generated features lack discrimination because naive aggregation functions cannot retain sufficient node information, which largely accounts for the performance gap. Accordingly, a novel Adaptive Grouping Aggregation (AGA) strategy is proposed to remedy this drawback. Inspired by the label histogram in the Weisfeiler-Lehman (WL) test, this strategy assigns each node to a unique group to retain more node information, and is proven to be strictly more expressive. In this setting, the nodes are grouped according to a modified Student's t-distribution between node features and a set of learnable group labels, where the Gumbel-Softmax is employed to implement this strategy in an end-to-end trainable pipeline. As a result, such a design can generate more discriminative features and serves as a plug-in module for most architectures. Extensive experiments have been conducted on several benchmarks to compare our method with other aggregation strategies. The proposed method improves performance in all control groups of all benchmarks and achieves the best result in most cases. Additional ablation studies and comparisons with state-of-the-art methods on a large-scale benchmark also indicate the superiority of our method.
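A minimal sketch of the grouping mechanism this abstract describes, assuming (as the text states) a Student's t-style similarity between node features and learnable group labels, followed by a Gumbel-Softmax relaxation of the assignment. The function names and the exact kernel form are illustrative, not AGA's implementation:

```python
import numpy as np

def t_similarity(features: np.ndarray, groups: np.ndarray, df: float = 1.0) -> np.ndarray:
    """Student's t-kernel similarity between node features [N, D] and
    learnable group labels [K, D], row-normalized to a distribution."""
    d2 = ((features[:, None, :] - groups[None, :, :]) ** 2).sum(-1)
    sim = (1.0 + d2 / df) ** (-(df + 1.0) / 2.0)
    return sim / sim.sum(1, keepdims=True)

def gumbel_softmax_assign(probs: np.ndarray, tau: float = 0.5, rng=None) -> np.ndarray:
    """Differentiable (soft) near-one-hot group assignment via Gumbel-Softmax."""
    rng = rng or np.random.default_rng(0)
    g = -np.log(-np.log(rng.uniform(1e-9, 1.0, probs.shape)))  # Gumbel noise
    z = (np.log(probs + 1e-9) + g) / tau
    z = z - z.max(1, keepdims=True)   # stabilize the softmax
    e = np.exp(z)
    return e / e.sum(1, keepdims=True)

feats = np.random.default_rng(1).standard_normal((6, 3))   # 6 nodes
labels = np.random.default_rng(2).standard_normal((2, 3))  # 2 learnable groups
assign = gumbel_softmax_assign(t_similarity(feats, labels))
```

Lowering `tau` pushes each row of `assign` toward a hard one-hot choice while keeping the pipeline differentiable end to end.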
Citations: 0
Quantity versus diversity: Influence of data on detecting EEG pathology with advanced ML models
Neural Networks Pub Date: 2025-09-06 DOI: 10.1016/j.neunet.2025.108073
Martyna Poziomska, Marian Dovgialo, Przemysław Olbratowski, Paweł Niedbalski, Paweł Ogniewski, Joanna Zych, Jacek Rogala, Jarosław Żygierewicz
This study investigates the impact of the quantity and diversity of data on the performance of various machine-learning models for detecting general EEG pathology. We utilized an EEG dataset of 2993 recordings from Temple University Hospital and a dataset of 55,787 recordings from Elmiko Biosignals sp. z o.o. The latter contains data from 39 hospitals and a diverse patient set with varied conditions. Thus, we introduce the Elmiko dataset, the largest publicly available EEG corpus. Our findings show that small and consistent datasets enable a wide range of models to achieve high accuracy; however, variations in pathological conditions, recording protocols, and labeling standards lead to significant performance degradation. Nonetheless, increasing the number of available recordings improves predictive accuracy and may even compensate for data diversity, particularly in neural networks based on attention mechanisms or transformer architectures. A meta-model that combined these networks with a gradient-boosting approach using handcrafted features demonstrated superior performance across varied datasets.
Citations: 0
Revisiting DIRE: towards universal AI-generated image detection
Neural Networks Pub Date: 2025-09-06 DOI: 10.1016/j.neunet.2025.108084
Huanqi Lin, Jinghui Qin, Xiaoqi Wu, Tianshui Chen, Zhijing Yang
The rapid development of generative models has improved image quality and made image synthesis widely accessible, raising concerns about content credibility. To address this issue, we propose a method called Universal Reconstruction Residual Analysis (UR²EA) for detecting synthetic images. Our study reveals that, when GAN- and diffusion-generated images are reconstructed by pre-trained diffusion models, they exhibit significant differences in reconstruction error compared to real images: GAN-generated images show lower reconstruction quality than real images, whereas diffusion-generated images are more accurately reconstructed. We leverage these residual maps as a universal prior for training a synthetic-image detector. In addition, we introduce a Multi-scale Channel and Window Attention (MCWA) module to extract fine-grained features from residual maps across multiple scales, capturing both local and global details. To facilitate the exploration of diverse detection methods, we constructed a new UniversalForensics dataset, which includes various representations of synthetic images generated by 30 different models. Compared to the best-performing baselines, our method improves average accuracy by 3.3% and precision by 1.6%, achieving state-of-the-art results.
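The reconstruction-residual idea at the heart of this line of work can be sketched generically: compute the per-pixel absolute error between an image and its round-trip reconstruction, and feed that map to the detector. The stand-in `fake_reconstruct` below is a simple blur used purely for illustration; the paper uses a pre-trained diffusion model for the round trip:

```python
import numpy as np

def reconstruction_residual(image: np.ndarray, reconstruct) -> np.ndarray:
    """Per-pixel absolute residual between an image and its reconstruction;
    such maps differ systematically between real and generated images."""
    return np.abs(image - reconstruct(image))

def fake_reconstruct(img: np.ndarray) -> np.ndarray:
    """Illustrative stand-in for a diffusion round trip: a vertical blur
    that preserves coarse structure but loses high-frequency detail."""
    out = img.copy()
    out[1:-1] = (img[:-2] + img[1:-1] + img[2:]) / 3.0
    return out

img = np.random.default_rng(0).uniform(0, 1, (8, 8))
residual = reconstruction_residual(img, fake_reconstruct)
```

In the real pipeline the residual map, rather than the raw image, becomes the input representation the detector is trained on.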
Citations: 0
Lightweight local and global granularity selection optimization network for single image super-resolution
Neural Networks Pub Date: 2025-09-05 DOI: 10.1016/j.neunet.2025.108085
Zhihao Peng, Mang Hu, Xinyuan Qi, Sheng Wu, Qianqian Xia, Jianga Shang, Linquan Yang
Recently, neural networks that combine local and global granularity features have made significant progress in single image super-resolution (SISR). However, when dealing with local granularity, these models often fuse features from coarse to fine in a linear manner, which leads to redundant feature representations and inefficient information extraction. Additionally, global granularity feature extraction is often compromised by the interference of irrelevant features, which reduces the model's ability to capture global dependencies and ultimately affects reconstruction quality. In this paper, a lightweight local and global granularity selection optimization network (LGGSONet) is proposed to enhance feature extraction. First, we present a local granularity selection module (LGSM), which applies a novel nonlinear convolution method to dynamically fuse multi-scale features and adaptively select effective information. Next, we design a global granularity optimization module (GGOM), which uses global transposed attention for feature extraction while dynamically filtering out irrelevant spatial fine-grained features. Then, we construct a mixed granularity transformer block (MGTB) combining LGSM and GGOM. Finally, MGTB is integrated into the mixed granularity residual transformer group (MGRTG) to simplify network training. Extensive experiments show that LGGSONet based on MGRTG achieves a PSNR improvement of 0.30 dB over other advanced lightweight methods while maintaining fewer parameters and lower computational costs.
Citations: 0
Dual-driven optimization of collaborative multi-agent via case learning and curiosity
Neural Networks Pub Date: 2025-09-05 DOI: 10.1016/j.neunet.2025.108083
Ruizhu Chen, Rong Fei, Junhuai Li, Aimin Li, Yalin Miao, Lili Wu, Zhiming Chen
Multi-Agent Deep Reinforcement Learning (MADRL) faces significant challenges in the exploration-exploitation trade-off during training, particularly when learning collaborative behaviors through continuous environment interactions. Current exploration methods generally rely on unbiased randomized policies, which leaves the policy-optimization process without goal direction; as a result, a large number of low signal-to-noise-ratio transitions accumulate in the experience replay buffer, seriously affecting the learning efficiency and convergence stability of MADRL. To address these challenges, we propose Case-Enhanced Random Network Distillation Exploration for Centralized Training and Decentralized Execution (CERE-CTDE). Our innovation lies in the novel integration of Random Network Distillation (RND) and Case-Based Reasoning (CBR): RND provides intrinsic motivation to enhance exploration and overcome sparse rewards, while CBR enables goal-directed exploitation by leveraging historical cases to guide agent action selection. This dual mechanism creates a dynamic equilibrium between exploring novel policies and exploiting proven cases, effectively preventing premature convergence. We incorporate CERE into two categories of MADRL methods based on the CTDE paradigm, and assess our approach against two exploration-focused methods on 13 confrontation scenarios in the StarCraft Multi-Agent Challenge (SMAC). The experimental results demonstrate: a 17.97% statistically significant improvement in win rate on complex battlefields relative to baseline performance in simple scenarios; effective balancing of policy exploration and exploitation, and mitigation of partial sparse-reward problems, through intrinsic motivation and CBR-guided action sampling; and a superior capability to escape local optima while maintaining learning efficiency. The framework's robustness is further validated by its consistent performance across SMAC scenarios of varying difficulty.
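The RND half of the dual mechanism is standard enough to sketch: a frozen random target network defines features, a predictor is trained to match them, and the prediction error serves as an intrinsic novelty reward that shrinks for familiar states. This linear toy version illustrates the principle only; the paper's networks and the CBR component are not reproduced here:

```python
import numpy as np

class RND:
    """Random Network Distillation: intrinsic reward is the predictor's
    error at matching a fixed random target network on each observation."""

    def __init__(self, obs_dim: int, feat_dim: int = 8, lr: float = 0.01, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.target = rng.standard_normal((obs_dim, feat_dim))  # frozen
        self.pred = np.zeros((obs_dim, feat_dim))               # trained
        self.lr = lr

    def intrinsic_reward(self, obs: np.ndarray) -> float:
        err = obs @ self.pred - obs @ self.target
        return float((err ** 2).mean())

    def update(self, obs: np.ndarray) -> None:
        # Gradient step on the squared prediction error.
        err = obs @ self.pred - obs @ self.target
        self.pred -= self.lr * np.outer(obs, err)

rnd = RND(obs_dim=4)
obs = np.array([1.0, 0.5, -0.3, 0.2])
r_before = rnd.intrinsic_reward(obs)   # novel state: high reward
for _ in range(200):
    rnd.update(obs)                    # state becomes familiar
r_after = rnd.intrinsic_reward(obs)    # familiarity: reward decays
```

Because the target is never updated, the reward cannot be "gamed": it decays only where the predictor has actually seen data, which is exactly the novelty signal the exploration bonus needs.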
Citations: 0
GoFormer: A GoLPP inspired transformer for functional brain graph learning and classification
Neural Networks Pub Date: 2025-09-05 DOI: 10.1016/j.neunet.2025.108081
Mengxue Pang, Lina Zhou, Xueying Yao, Jun Yang, Jinshan Zhang, Yining Zhang, Limei Zhang, Lishan Qiao
Graphs have great potential for modelling complex relationships among data, and learning a high-quality graph usually plays a critical role in many downstream tasks. In 2010, we proposed graph-optimized locality preserving projections (GoLPP), the first work to learn graphs adaptively along with the dimensionality-reduction task, which exhibited better performance than methods based on predefined graphs. Recently, graph learning has been re-highlighted, partly due to the popularity of the Transformer, which leverages the self-attention mechanism to model the relationships between tokens via an updatable graph. Despite its great success, the Transformer has a weak inductive bias and needs to be trained on large-scale datasets. In some practical scenarios, such as intelligent medicine, however, it is difficult to collect sufficient data to support Transformer training. By revisiting GoLPP, we made an interesting finding: its iterative process between the graph and the projection matrix corresponds precisely to the working mechanism of self-attention modules in the Transformer, which inspired us to design a novel method, GoFormer, to get the best of both worlds. Specifically, GoFormer not only inherits the power of the Transformer for handling sequence data end to end, but also honors the principle of parsimony by integrating the parameter updating and sharing mechanism implicitly involved in GoLPP. Compared with the Transformer, GoFormer can mitigate the risk of overfitting and offers better interpretability for medical applications. To evaluate its effectiveness, we use GoFormer to learn and classify brain graphs based on functional magnetic resonance imaging (fMRI) data for the early diagnosis of neurological disorders. Experimental results demonstrate that GoFormer outperforms the baseline and state-of-the-art methods.
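The correspondence this abstract draws, between an adaptively learned graph and self-attention, can be made concrete: one attention step computes a row-stochastic matrix A over tokens (a learned graph) and then propagates features over it. This is a generic single-head self-attention sketch, not GoFormer itself:

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(-1, keepdims=True)

def attention_graph_step(X, Wq, Wk, Wv):
    """One self-attention step read as graph learning: the attention
    matrix A is an adaptively learned, row-stochastic graph over the
    tokens, and A @ (X Wv) propagates features along that graph."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]))   # the learned graph
    return A, A @ V                              # graph + propagated features

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))                  # 5 tokens / graph nodes
Wq, Wk, Wv = (rng.standard_normal((4, 4)) for _ in range(3))
A, H = attention_graph_step(X, Wq, Wk, Wv)
```

Iterating such steps alternates between re-estimating the graph from the current features and re-projecting the features through the graph, which is the GoLPP-style loop the authors identify inside the Transformer.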
Citations: 0
From external to internal: Step-wise feature enhancement network for image-text retrieval
Neural Networks Pub Date: 2025-09-05 DOI: 10.1016/j.neunet.2025.108072
Jingyao Wang, Zheng Liu, Shanshan Gao, Junhao Xu, Changhao Li
Image-Text Retrieval (ITR) is challenging due to the inherent inconsistency in feature representations across modalities, commonly referred to as the "heterogeneity gap". To bridge this gap, an effective approach is to establish stronger associations between images and texts by capturing semantic cues as comprehensively as possible. However, existing ITR methods cannot fully capture semantic cues derived from a large-scale image-text corpus beyond a single image-text pair. Therefore, we propose a two-layer Step-wise Feature Enhancement (SFE) network that establishes a semantic propagation pathway, guiding semantic information progressively from the external layer to the internal layer. In Step 1, External Semantic Cues (ESC) are captured from visual and textual semantic concepts based on patch-level, instance-level, and neighbor-level co-occurrences within an image-text corpus; visual and textual features are then enhanced in the external layer with ESC by mining co-occurrences at these three levels. Note that instance-level and neighbor-level co-occurrences constitute cross-modal ESC, which significantly facilitates modality interaction in the external layer. In Step 2, SFE first fuses the semantic information propagated from Step 1, then enhances visual and textual features in the internal layer by mining Internal Semantic Cues (ISC) through cross-modal context. Specifically, visual and textual features are concatenated with their corresponding cross-modal contextual features to further enhance modality interaction within the internal layer. Experimental results demonstrate the superiority of the proposed SFE network over state-of-the-art ITR methods.
Citations: 0