Neurocomputing | Pub Date: 2025-07-18 | DOI: 10.1016/j.neucom.2025.131049
Pan Li, Xiaofang Yuan, Haozhi Xu, Jinlei Wang, Yaonan Wang
Title: Focus DETR: Focus detection transformer for ship wall-climbing robot real-time object detection (Neurocomputing, vol. 651, Article 131049)

Abstract: With the growing use of wall-climbing robots for ship paint removal in repair yards, detecting these robots in real time and positioning them accurately has become an important task. Because of changes in the external environment and the diversity of the robot's working postures, maintaining stable detection accuracy and real-time performance is challenging. To address this issue, a focus detection transformer (Focus-DETR) architecture is proposed. First, a spatial attention recursive gated convolution (Sa-Gn) module is employed in the final stage of the backbone to extract high-level features, achieving high accuracy while maintaining real-time speed. Second, to improve the extraction of key features of the wall-climbing robot, a hybrid encoder is introduced to integrate features from adjacent stages of the neck. Furthermore, a focal EIoU loss function in the detection head optimizes the width and height errors of the detection box and adjusts their loss weights; this improves the alignment of the predicted bounding-box center and suits lightweight deployment. Experimental results show that, compared with RT-DETR-R50, Focus DETR-S improves detection mAP by 6.0% on the wall-climbing robot dataset with inference speed very close to that of RT-DETR-R50, and it achieves a similar improvement on the UAVDT dataset.
Neurocomputing | Pub Date: 2025-07-18 | DOI: 10.1016/j.neucom.2025.130857
Jingming Hou, Nazlia Omar, Sabrina Tiun, Saidah Saad, Qian He
Title: Text-centric disentangled representation interaction network for Multimodal Sentiment Analysis (Neurocomputing, vol. 651, Article 130857)

Abstract: With the rise of short video content, Multimodal Sentiment Analysis (MSA) has gained significant attention as a research hotspot. However, heterogeneity among the three modalities has emerged as a major challenge in fusing them. While some recent studies have attempted to reduce this heterogeneity by disentangling the modalities, they overlook two critical issues. First, they treat all three modalities equally during disentanglement, ignoring the central role of the text modality in MSA: as the primary carrier of semantic and emotional information, text serves as the backbone for sentiment interpretation and multimodal fusion. Second, after disentangling the modalities, they do not effectively leverage the unique features of each modality, relying instead on simple concatenation and a Transformer to combine similar and dissimilar features. To fully harness the potential of the text modality and the dissimilar features between modalities, we propose a Text-centric Disentangled Representation Interaction Network (TDRIN), consisting of two main modules. In the Disentangled Representation Learning (DRL) module, we decompose representations from different modalities into separate subspaces centered around the text modality, aiming to capture similar and dissimilar features among the modalities, and we apply various constraints to learn better features and improve predictions. To more effectively balance the similar and dissimilar features, we design the Disentangled Representation Fusion Network (DRFN) module, which fuses disentangled representations with the text modality at the center, fully exploiting the correlations among disentangled representations. Extensive experiments on the CMU-MOSI, CMU-MOSEI, and CH-SIMS datasets demonstrate that TDRIN outperforms state-of-the-art methods across various metrics; the F1 score surpasses the best-performing baseline by 3.19%, 0.96%, and 1.43% on the three datasets, respectively. Ablation studies further confirm the effectiveness of each module. TDRIN therefore effectively reduces the heterogeneity between modalities, improving performance on MSA tasks.
Neurocomputing | Pub Date: 2025-07-18 | DOI: 10.1016/j.neucom.2025.130882
Tianlun Luo, Qiao Yuan, Boxuan Zhu, Steven Guan, Rui Yang, Jeremy S. Smith, Eng Gee Lim
Title: Exploring interaction concepts for human–object-interaction detection via global- and local-scale enhancing (Neurocomputing, vol. 651, Article 130882)

Abstract: Understanding the interactions between human–object (HO) pairs is the key to the human–object interaction (HOI) detection task. Visual understanding research has been significantly influenced by recent advances in linguistic-visual contrastive learning. In HOI detection studies, linguistic and visual features usually must be aligned when linguistic knowledge is used for enhancement, which typically demands extra training data or extended training time. In this study, an effective approach is proposed for utilizing multimodal knowledge to enhance HOI learning at global and instance scales. Performance on rare HOI categories is markedly improved by projection guided by linguistic knowledge at the global scale and by merging multimodal features at the instance scale. The proposed model achieves state-of-the-art performance on the HICO-Det benchmark, validating the effectiveness of the global- and local-scale multimodal learning approach.
Neurocomputing | Pub Date: 2025-07-18 | DOI: 10.1016/j.neucom.2025.130949
Utku Erdoğan, Şahin Işık, Yıldıray Anagün, Gabriel Lord
Title: ExpTamed: An exponential tamed optimizer based on Langevin SDEs (Neurocomputing, vol. 651, Article 130949)

Abstract: This study presents a new optimizer that regularizes gradients in deep learning using a novel taming strategy, originally developed to control the growth of numerical solutions of stochastic differential equations. The method, ExpTamed, enhances stability and reduces the mean-square error over a short time horizon compared with existing techniques. Its practical effectiveness is rigorously evaluated on CIFAR-10, Tiny-ImageNet, and Caltech256 across diverse architectures. In direct comparisons with prominent optimizers such as Adam, ExpTamed demonstrates significant performance gains: it achieved increases in best top-1 test accuracy ranging from 0.86 to 2.76 percentage points on CIFAR-10, and up to 4.46 percentage points on Tiny-ImageNet (without a learning-rate schedule). On Caltech256, ExpTamed also yielded superior accuracy, precision, and Kappa metrics. These results quantify ExpTamed's capability to deliver enhanced performance in practical deep learning applications.
Neurocomputing | Pub Date: 2025-07-18 | DOI: 10.1016/j.neucom.2025.130979
Ran Song, Shengxiang Gao, Xiaofei Gao, Cunli Mao, Zhengtao Yu
Title: MKE-PLLM: A benchmark for multilingual knowledge editing on pretrained large language model (Neurocomputing, vol. 651, Article 130979)

Abstract: Multilingual large language models (mLLMs) have demonstrated remarkable performance across various downstream tasks but are still plagued by factuality errors. Knowledge editing aims to correct these errors by modifying the internal knowledge of pretrained models. However, current knowledge editing methods focus primarily on monolingual settings, neglecting the complexities and interdependencies of multilingual scenarios, and benchmarks designed specifically for multilingual knowledge editing remain scarce. To address this gap, this paper constructs a novel multilingual knowledge editing benchmark that comprehensively evaluates methods for mLLMs in terms of accuracy, reliability, generalization, and consistency. To ensure the robustness and usability of the benchmark, we conducted detailed analysis and validation. We also propose a baseline method that adapts existing monolingual knowledge editing techniques to the multilingual environment. Extensive experimental results demonstrate the effectiveness of the constructed benchmark for evaluating multilingual knowledge editing.
Neurocomputing | Pub Date: 2025-07-18 | DOI: 10.1016/j.neucom.2025.131050
Gang Xu, Ao Shen, Yuchen Yang, Xiantong Zhen, Wei Chen, Jun Xu
Title: Joint super-resolution and inverse tone-mapping: A feature decomposition aggregation network and a new benchmark (Neurocomputing, vol. 651, Article 131050)

Abstract: Joint Super-Resolution and Inverse Tone-Mapping (joint SR-ITM) aims to increase the resolution and dynamic range of low-resolution, standard-dynamic-range images. Recent networks mainly resort to image decomposition techniques with complex multi-branch architectures, but fixed decomposition techniques largely restrict their power on diverse images. To exploit the potential of the decomposition mechanism, this paper generalizes it from the image domain to the broader feature domain. To this end, we propose a lightweight Feature Decomposition Aggregation Network (FDAN). In particular, we design a Feature Decomposition Block (FDB) that learns to separate detail and base feature maps, and we build a Hierarchical Feature Decomposition Group by cascading FDBs for powerful multi-level feature decomposition. Moreover, for better evaluation, we collect a large-scale dataset for joint SR-ITM, i.e., SRITM-4K, which provides versatile scenarios for robust model training and evaluation. Experimental results on two benchmark datasets demonstrate that FDAN is efficient and outperforms state-of-the-art methods on joint SR-ITM. The code for FDAN and the SRITM-4K dataset are available at https://github.com/CS-GangXu/FDAN.
Neurocomputing | Pub Date: 2025-07-17 | DOI: 10.1016/j.neucom.2025.130951
Zhihao Zhou, Li Zhang, Qile Liu, Gan Huang, Zhuliang Yu, Zhen Liang
Title: Emotion agent: Unsupervised deep reinforcement learning with distribution-prototype reward for continuous emotional EEG analysis (Neurocomputing, vol. 652, Article 130951)

Abstract: Continuous electroencephalography (EEG) signals are widely employed in affective brain-computer interface (aBCI) applications. However, only a subset of the continuously acquired EEG data is truly relevant to emotional processing, while the remainder is often noisy or unrelated. Manual annotation of these key emotional segments is impractical due to their dynamic and individualized nature. To address this challenge, we propose a novel unsupervised deep reinforcement learning framework, termed Emotion Agent, which automatically identifies and extracts the most informative emotional segments from continuous EEG signals. Emotion Agent initially utilizes a heuristic algorithm to perform a global search and generate prototype representations of the EEG signals. These prototypes guide the exploration of the signal space and highlight regions of interest. Furthermore, we design a distribution-prototype-based reward function that evaluates the interaction between samples and prototypes to ensure that the selected segments are both representative and relevant to the underlying emotional states. Finally, the framework is trained using Proximal Policy Optimization (PPO) to achieve stable and efficient convergence. Experimental results on three widely used datasets (covering both discrete and dimensional emotion recognition) show an average improvement of 13.46% when using the proposed Emotion Agent, demonstrating its significant enhancement of accuracy and robustness in downstream aBCI tasks.
Neurocomputing | Pub Date: 2025-07-17 | DOI: 10.1016/j.neucom.2025.130950
Bo Xu, Guoxu Li, Jie Wang, Zheng Wang, Jianfu Cao, Rong Wang, Feiping Nie
Title: Dynamic T-distributed stochastic neighbor graph convolutional networks for multi-modal contrastive fusion (Neurocomputing, vol. 652, Article 130950)

Abstract: As data acquisition technologies continue to advance, multi-modal data have become a prominent focus across many domains. This paper tackles critical challenges in multi-modal fusion, namely representation learning, learning modality-consistent (invariant) features, and learning diverse, complementary features, by employing graph convolutional networks and contrastive learning. Current GCN-based methods generally depend on predefined graphs for representation learning, limiting their capacity to capture local and global information effectively; moreover, some existing models do not adequately contrast consistent and diverse representations across modalities during fusion. To address these challenges, we propose a novel T-distributed Stochastic Neighbor Contrastive Graph Convolutional Network (TSNGCN), consisting of an adaptive static graph learning module, a multi-modal representation learning module, and a multi-modal contrastive fusion module. The adaptive static graph learning module constructs graphs without relying on predefined distance metrics, creating a pairwise graph adaptively to preserve the local structure of the data. A loss function based on t-distributed stochastic neighbor embedding is designed to learn the transformation between the embeddings and the original data, facilitating the discovery of more discriminative information within the learned subspace. In addition, the multi-modal contrastive fusion module maximizes the similarity of the same samples across different modalities while keeping dissimilar samples apart, reinforcing the model's consistency objective. Extensive experiments on several multi-modal benchmark datasets demonstrate the superiority and effectiveness of TSNGCN over existing methods.
Neurocomputing | Pub Date: 2025-07-17 | DOI: 10.1016/j.neucom.2025.130999
Mengxuan Sun, Xuebing Yang, Jiayi Geng, Jinghao Niu, Chutong Wang, Chang Cui, Xiuyuan Chen, Wen Tang, Wensheng Zhang
Title: CTMEG: A continuous-time medical event generation model for clinical prediction of long-term disease progression (Neurocomputing, vol. 651, Article 130999)

Abstract: Long-term health monitoring reflects a patient's disease progression and is critical for improving quality of life and supporting physicians' decision-making. Predictive models based on Electronic Health Records (EHRs) can offer substantial clinical support by alerting clinicians to subsequent disease-associated adverse events. Effective disease progression modeling involves two subtasks: (1) estimating the occurrence times of disease-associated events, and (2) classifying the types of events that occur. Recent time-aware disease predictive models, mainly based on recurrent neural networks or attention networks, specialize in predicting future disease types by accounting for the temporal irregularities in EHRs. This paper focuses on multi-step continuous-time disease prediction, which is more challenging because predictive models can easily fall into conflicts between the two subtasks. We propose a multi-task disentangled Continuous-Time Medical Event Generation (CTMEG) model that tackles both subtasks simultaneously. Unlike conventional continuous-time models, CTMEG encodes multi-view historical medical events and then simultaneously predicts multi-step disease types and occurrence times. First, a discrete Conditional Intensity Function (CIF) is designed to better estimate disease occurrence times from limited available data. Second, to reduce task conflicts, a gated network disentangles the rough patient representation into task-specific representations. Finally, a tailored CIF attention module reduces error accumulation during the prediction process. Extensive experiments on the eICU and BFH databases demonstrate that CTMEG outperforms twelve competing models in long-term disease progression prediction. Our code is available on GitHub.
Neurocomputing | Pub Date: 2025-07-17 | DOI: 10.1016/j.neucom.2025.130848
Nayiri Galestian Pour, Soudabeh Shemehsavar
Title: DCB-VIM: An ensemble learning based filter method for feature selection with imbalanced class distribution (Neurocomputing, vol. 651, Article 130848)

Abstract: Feature selection aims to improve predictive performance and interpretability when analyzing datasets with high-dimensional feature spaces. An imbalanced class distribution can make feature selection considerably more difficult, so robust methodologies are essential for this case. We therefore present a filter method based on ensemble learning in which each classifier is built on a randomly selected subspace of features. A variable importance measure is computed class-wise within each classifier, and a feature weighting procedure is then applied; the performance of each classifier is taken into account in the combination phase of the ensemble. The effects of the hyperparameters, namely the subspace size and the number of classification trees, on predictive performance are investigated through simulation studies. The efficiency of the proposed method is evaluated with respect to predictive performance under different selection strategies through real data analysis in the presence of class imbalance.