Neurocomputing: Latest Articles

Towards explainable trajectory classification: A segment-based perturbation approach

IF 6.5 | Zone 2, Computer Science

Neurocomputing Pub Date : 2025-10-07 DOI: 10.1016/j.neucom.2025.131691

Le Xuan Tung , Bui Dang Phuc , Vo Nguyen Le Duy

Citations: 0

VLPRSDet: A vision–language pretrained model for remote sensing object detection

IF 6.5 | Zone 2, Computer Science

Neurocomputing Pub Date : 2025-10-04 DOI: 10.1016/j.neucom.2025.131712

Dongyang Liu , Xuejian Liang , Yunxiao Qi , Yunqiao Xi , Jing Jin , Junping Zhang

Volume 658, Article 131712

Abstract: Recently, numerous excellent vision-language models have emerged in the field of computer vision. These models have demonstrated strong zero-shot detection capabilities and better accuracy after fine-tuning on new datasets in the field of object detection. However, when these models are directly applied to the field of remote sensing, their performance is less than satisfactory. To address this problem, a novel vision-language pretrained model specifically tailored for remote sensing object detection task is proposed. Firstly, we create a new dataset composed of object-text pairs by collecting a large amount of remote sensing image object detection data to train the proposed model. Then, by integrating the CLIP model in the field of remote sensing with the YOLO detector, we propose a vision-language pretrained model for remote sensing object detection (VLPRSDet). VLPRSDet achieves enhanced fusion of visual and textual features through a vision language path aggregation network, and then aligns visual embeddings and textual embeddings through Region Text Matching to achieve the alignment between object regions and text. Experimental results indicate that the proposed VLPRSDet exhibits robust zero-shot capabilities in the field of remote sensing object detection, and can achieve superior detection accuracy after fine-tuning on specific datasets. Specifically, after fine-tuning, VLPRSDet can achieve 76.2% mAP on the DIOR dataset and 94.2% mAP on the HRRSD dataset. The code and dataset will be released at https://github.com/dyl96/VLPRSDet.

Citations: 0

Unsupervised temporal action segmentation with sample discrimination training and alignment-based boundary refinement

IF 6.5 | Zone 2, Computer Science

Neurocomputing Pub Date : 2025-10-03 DOI: 10.1016/j.neucom.2025.131636

Feng Huang , Xiao-Diao Chen , Hongyu Chen , Haichuan Song

Volume 658, Article 131636

Abstract: Unsupervised temporal action segmentation (UTAS) addresses the task of partitioning untrimmed videos into coherent action segments without manual annotations. While boundary-detection-based approaches have demonstrated superior performance, they exhibit two critical limitations. First, these methods often uniformly treat all frames during training, resulting in over-segmentation and suboptimal performance. Second, they primarily rely on intra-video features while neglecting potentially valuable inter-video correlations within the dataset. To address these challenges, we present a comprehensive UTAS framework with three key innovations: (1) A discriminative training mechanism that differentiates between boundary/non-boundary frames in the temporal domain and motion/background pixels in the spatial domain, employing weighted training strategies alongside multiple temporal-scale modeling. (2) A self-validation mechanism for cross-verifying predictions across different input sequences. (3) A boundary refinement approach based on video alignment, which constructs reference video sets according to feature distributions and establishes inter-video correspondences to improve boundary localization. Extensive evaluations on three benchmark datasets, i.e., the Breakfast, the 50Salads, and the YouTube Instructions, demonstrate that our approach achieves state-of-the-art performance, with quantitative results showing significant improvements over existing methods.

Citations: 0

Auto-weighted graph tensor and rank-constrained bipartite graph fusion for multi-view clustering

IF 6.5 | Zone 2, Computer Science

Neurocomputing Pub Date : 2025-10-02 DOI: 10.1016/j.neucom.2025.131575

Jie Zhang , Xiaoqian Zhang , Jinghao Li , Yongyi Yang , Zhenwen Ren , Rong Tang , Dong Wang

Volume 658, Article 131575

Abstract: Tensor multi-view clustering generally outperforms non-tensor counterparts, as the tensor structure can effectively capture the higher-order correlations of data. Although the t-SVD-based tensor nuclear norm has shown remarkable performance, it treats the similar information across all views equally, overlooking the higher-order similarities between similar graphs. To address this issue, we propose a Pearson Correlation Coefficient-based Auto-weighted Graph Tensor and Rank-constrained Bipartite Graph Fusion (AGTRBGF) approach for multi-view clustering. Specifically, the P-AGT learning method breaks free from the constraints of predefined weights, automatically assigning optimal weight values for each similarity graph by leveraging the higher-order similarities among the similar graphs of different views. Additionally, the Laplace rank is utilized to constrain the adaptive graph fusion, endowing learned consensus graph with strong diagonal structure and enhancing the model’s robustness. Experiments conducted on distinct datasets validate the effectiveness and superior clustering performance of AGTRBGF.
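
The abstract above mentions weighting each view's similarity graph by Pearson correlation rather than by predefined weights. As a rough illustration of that general idea only, the sketch below scores each view by its mean Pearson correlation with the other views and normalizes the scores. The function names `pearson` and `auto_weights`, the clipping of negative scores, and the toy graphs are all assumptions made for this example; the listing does not give the paper's actual P-AGT update rule, and this is not it.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant graph carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def auto_weights(graphs):
    """Hypothetical rule: weight each view by its mean correlation with the others.

    Expects at least two similarity graphs of identical shape.
    """
    flat = [[v for row in g for v in row] for g in graphs]
    scores = []
    for i, gi in enumerate(flat):
        others = [pearson(gi, gj) for j, gj in enumerate(flat) if j != i]
        # Clip negative agreement to zero so weights stay non-negative.
        scores.append(max(sum(others) / len(others), 0.0))
    total = sum(scores)
    if total == 0:
        return [1.0 / len(graphs)] * len(graphs)  # fall back to uniform weights
    return [s / total for s in scores]


# Toy similarity graphs: views A and B agree, view C disagrees with both.
view_a = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]]
view_b = [[1.0, 0.8, 0.2], [0.8, 1.0, 0.1], [0.2, 0.1, 1.0]]
view_c = [[1.0, 0.1, 0.9], [0.1, 1.0, 0.8], [0.9, 0.8, 1.0]]

weights = auto_weights([view_a, view_b, view_c])
print(weights)
```

In this toy case the disagreeing view receives the smallest weight, which is the qualitative behavior one would want from any correlation-based weighting.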

Citations: 0

IntSTR: An integrated spatio-temporal relation transformer for video object detection

IF 6.5 | Zone 2, Computer Science

Neurocomputing Pub Date : 2025-10-02 DOI: 10.1016/j.neucom.2025.131704

Wentao Zheng , Hong Zheng , Yuquan Sun , Ying Jing

Volume 658, Article 131704

Abstract: In recent years, Transformer-based video object detection (VOD) methods have achieved remarkable progress by replacing the hand-crafted components traditionally used in CNN-based detectors. However, many existing approaches rely on staged spatio-temporal modeling strategies, which increase model complexity and restrict early interaction between spatial and temporal information. To overcome these limitations, we propose IntSTR, a novel framework for unified spatio-temporal modeling. At its core, the spatio-temporal relation encoder (STRE) integrates spatio-temporal feature processing within a single encoder through cascaded attention modules. To strengthen temporal consistency, the temporal query relation (TQR) module explicitly captures geometric relations between object queries across adjacent frames with minimal computational overhead. In addition, the Temporal Feature Memory (TFM) maintains a dynamic memory bank that caches temporal contexts, enabling effective feature aggregation and efficient online processing. Extensive experiments on the ImageNet VID dataset validate the effectiveness of our approach. IntSTR achieves an excellent trade-off between accuracy and efficiency, reaching a competitive 87.2% mAP₅₀ with the ResNet-101 backbone while maintaining real-time performance at 33.4 FPS.

Citations: 0

DJIST: Decoupled joint image and sequence training framework for sequential visual place recognition

IF 6.5 | Zone 2, Computer Science

Neurocomputing Pub Date : 2025-09-30 DOI: 10.1016/j.neucom.2025.131622

Shanshan Wan , Lai Kang , Yingmei Wei , Tianrui Shen , Haixuan Wang , Chao Zuo

Volume 658, Article 131622

Abstract: Traditional image-to-image (im2im) visual place recognition (VPR) involves matching a single query image to stored geo-tagged database images. In real-time robotic and autonomous applications, while a continuous stream of frames naturally leads to a simpler sequence-to-sequence (seq2seq) VPR problem, the challenges remain since labeled sequential data is much scarcer than labeled individual images. A recent work addressed this by using a unified network optimized for both seq2seq and im2im tasks, but the resulting sequential descriptors are heavily dependent on the individual descriptors trained on the im2im task. This paper proposes a decoupled joint image and sequence training (DJIST) framework, using a frozen backbone and two independent sequential branches, where one branch is supervised by both im2im and seq2seq losses and the other solely by the seq2seq loss. The feature reduction procedures for generating individual descriptors and sequential descriptors are further separated in the former branch. An attention separation loss is employed between the two branches, which forces them to focus on different parts of the images to produce more informative sequential descriptors. We retrain various existing seq2seq methods using the same backbone and two types of joint training strategies for a fair comparison. Extensive experimental results demonstrate that our proposed DJIST outperforms its original counterpart JIST by 3.9% to 18.8% across four benchmark test cases and achieves state-of-the-art Recall@1 scores against retrained baselines on three key benchmarks with robust cross-dataset generalization, negligible degradation under dimensionality reduction, and superior robustness against varying test-time sequence lengths. Code will be available at https://github.com/shuimushan/DJIST.
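
The abstract above reports Recall@1 scores. For readers unfamiliar with the metric, the sketch below shows how Recall@1 is commonly computed in descriptor-based place recognition: each query descriptor is matched to its nearest database descriptor, and the query counts as a hit if that match is among its ground-truth positives. The function name `recall_at_1`, the squared Euclidean distance, and the toy descriptors are assumptions for illustration; the paper's evaluation protocol (distance measure, positive radius, sequence handling) is not stated in this listing.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def recall_at_1(query_descriptors, database_descriptors, positives):
    """Fraction of queries whose nearest database entry is a true positive.

    positives[i] is the set of database indices considered correct for query i.
    """
    hits = 0
    for query, correct in zip(query_descriptors, positives):
        nearest = min(
            range(len(database_descriptors)),
            key=lambda i: squared_distance(query, database_descriptors[i]),
        )
        if nearest in correct:
            hits += 1
    return hits / len(query_descriptors)


# Toy 2-D descriptors: the third query is closest to database entry 2,
# which is not in its positive set, so it counts as a miss.
database = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
queries = [[0.1, 0.0], [0.9, 0.1], [0.4, 0.6]]
ground_truth = [{0}, {1}, {0}]

score = recall_at_1(queries, database, ground_truth)
print(score)
```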

Citations: 0

SiTrEx: Siamese transformer for feedback and posture correction on workout exercises

IF 6.5 | Zone 2, Computer Science

Neurocomputing Pub Date : 2025-09-30 DOI: 10.1016/j.neucom.2025.131703

Abdellah Sellam, Dounya Kassimi, Abdelhadi Djebana, Sara Mokhtari

Volume 658, Article 131703

Abstract: Applying Machine Learning and Deep Learning techniques to sequences of Human Pose Landmarks to recognize workout exercises and count repetitions is widely studied in the computer vision literature. However, existing approaches suffer from two major problems. The first issue is that they lack the ability to provide detailed feedback on the postures performed by the athletes or provide feedback for a limited range of exercises using hand-designed rules and algorithms. The second problem is that these approaches consider only a predefined set of exercises and do not generalize to exercises outside their training data, which limits their usability. In this paper, we aim to address these two shortcomings by proposing a one-shot learning approach that utilizes Siamese Transformers to provide detailed feedback on individual human joints and can generalize to new exercises that are not present in the used dataset. The proposed configuration of the Siamese Transformer model deviates from its standard use in that it outputs a vector of similarity indicators rather than a single similarity score. Additionally, an accompanying binary classification Transformer model is used to assess the usefulness of different parts of the human pose for the input exercise without prior knowledge of the exercise itself. These properties allow the proposed approach to be used in general-purpose fitness applications and coach/athlete training platforms. The proposed approach achieved a 5-fold cross-validation test accuracy of 94.4% ± 0.8 on the collected dataset.

Citations: 0

Self-supervised multi-blind network for real image denoising via multivariate Gaussian-poisson noise

IF 6.5 | Zone 2, Computer Science

Neurocomputing Pub Date : 2025-09-30 DOI: 10.1016/j.neucom.2025.131557

Hang Zhao, Zitong Wang, Xiaoli Zhang, Zhaojun Liu

Volume 658, Article 131557

Abstract: The noise in real images exhibits more complex distributions than the synthetic noise and distinguishes across different scenarios. Furthermore, the scarcity of "clean-to-noisy" paired image datasets makes the current models difficult to denoise successfully. To address these challenges, we propose MGP-MBFM²ANet, a self-supervised multi-blind feature multi-modulation attention network based on multivariate Gaussian-Poisson noise prior for real image denoising. Firstly, we propose a multivariate Gaussian-Poisson distribution to construct noisy images that contain more complex pixel spatial positions and intensity correlations, which expand the training domain and improve the model’s ability to generalize across diverse real noisy images. Building on this, we implement a random sampling mechanism based on four-neighborhood similarity to construct "noise-noise" training pairs, effectively exploiting the statistical properties of local structures in noisy images, without relying on any clean reference image. During the network design phase, a multi-blind feature multi-modulation attention module successfully enhances the representation of local features, which introduces multi-masked strategy to force network to learn more information to address the challenge of feature identity mapping. Experimental results demonstrate that the proposed method effectively suppresses noise and recovers high-frequency details within an unsupervised learning paradigm, achieving superior performance in both objective evaluation metrics and subjective visual quality across multiple real-world datasets.
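
The abstract above builds on a Gaussian-Poisson noise prior. For orientation, the sketch below synthesizes the standard per-pixel Poisson-Gaussian model, in which a signal-dependent Poisson term is scaled by a gain and an independent Gaussian term is added. This is the plain, spatially independent baseline, not the multivariate distribution with spatial and intensity correlations that the paper proposes. The function names and the `gain`, `sigma`, and `seed` values are assumptions chosen for this example.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson sample with Knuth's multiplication method.

    Adequate for small to moderate lam; exp(-lam) underflows for very large lam.
    """
    threshold = math.exp(-lam)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return count
        count += 1


def add_poisson_gaussian_noise(image, gain=0.01, sigma=0.02, seed=0):
    """Return a noisy copy of image (rows of intensities in [0, 1]).

    Each pixel x becomes gain * Poisson(x / gain) + Normal(0, sigma), so the
    noise variance grows with intensity (shot noise) on top of a constant
    floor (read noise), while the expected value stays equal to x.
    """
    rng = random.Random(seed)
    noisy = []
    for row in image:
        noisy.append(
            [gain * sample_poisson(x / gain, rng) + rng.gauss(0.0, sigma) for x in row]
        )
    return noisy


# Flat grey test image: the noisy mean should stay close to the clean value.
clean = [[0.5] * 50 for _ in range(40)]
noisy = add_poisson_gaussian_noise(clean)
mean_noisy = sum(sum(row) for row in noisy) / (40 * 50)
print(round(mean_noisy, 3))
```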

Citations: 0

MPFBL: Modal pairing-based cross-fusion bootstrap learning for multimodal emotion recognition

IF 6.5 | Zone 2, Computer Science

Neurocomputing Pub Date : 2025-09-30 DOI: 10.1016/j.neucom.2025.131577

Yong Zhang , Yongqing Liu , HongKai Li , Cheng Cheng , Ziyu Jia

Volume 658, Article 131577

Abstract: Multimodal emotion recognition (MER), a key technology in human-computer interaction, deciphers complex emotional states by integrating heterogeneous data sources such as text, audio, and video. However, previous works either retained only private information or focused solely on public information, resulting in a conflict between the strategies used in each approach. Existing methods often lose critical modality-specific attributes during feature extraction or struggle to align semantically divergent representations across modalities during fusion, resulting in incomplete emotional context modeling. To address these challenges, we propose the Modal Pairing-based Cross-Fusion Bootstrap Learning (MPFBL) framework, which integrates modal feature extraction, cross-modal bootstrap learning, and multi-modal cross-fusion into a unified approach. Firstly, the feature extraction module employs a Uni-Modal Transformer (UMT) and a Multi-Modal Transformer (MMT) to jointly capture modality-specific and modality-invariant information, addressing feature degradation in single-encoder paradigms, while alleviating inter-modal heterogeneity by explicitly distinguishing between modality-specific and shared representations. Subsequently, cross-modal bootstrap learning employs attention-guided optimization to align heterogeneous modalities and refine modality-specific representations, enhancing semantic consistency. Finally, a multi-modal cross-fusion network integrates convolutional mapping and adaptive attention to dynamically weight cross-modal dependencies, mitigating spatial-semantic misalignment induced by inter-modal heterogeneity in fusion processes. Extensive experimental results on CMU-MOSEI and CMU-MOSI demonstrate that MPFBL outperforms state-of-the-art methods, while ablation studies further confirm its effectiveness.

Citations: 0

A small-sample cross-domain bearing fault diagnosis method based on knowledge-enhanced domain adversarial learning

IF 6.5 | Zone 2, Computer Science

Neurocomputing Pub Date : 2025-09-30 DOI: 10.1016/j.neucom.2025.131699

Peiming Shi , Yan Zhao , Xuefang Xu , Dongying Han

Citations: 0