Neurocomputing — Latest Articles

Distributed event-triggered sliding-mode control of second-order multi-UAV system with a dynamic leader
IF 5.5 · CAS Q2 · Computer Science
Neurocomputing Pub Date : 2025-04-13 DOI: 10.1016/j.neucom.2025.130189
Fuyin Yao , Hongwei Ren
This article addresses formation control for multi-UAV systems with dynamic inputs. It introduces a distributed event-triggered sliding-mode control design for second-order multi-UAV systems, grounded in leader-following principles. First, sliding-mode control is employed to handle disturbances and uncertainties within the formation system, ensuring robustness against external interference. Second, event-triggered strategies reduce unnecessary control actions and conserve energy, lowering the network communication load and improving system response time. Finally, a consensus algorithm designed on the basis of event-triggered control is proposed to guarantee consistent tracking by each UAV. The controller's stability is established via Lyapunov stability theory, and the triggering scheme is shown to reduce state updates efficiently while excluding Zeno behavior. Simulations of a formation problem involving six quadrotor UAVs confirm the practicality of the theoretical findings.

Neurocomputing, Volume 638, Article 130189 · Cited: 0
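The triggering idea in the abstract — recompute the control input only when the deviation from the last-transmitted state crosses a threshold — can be sketched for a single double-integrator agent. This is a toy stand-in, not the paper's controller: the gains, sliding variable, and trigger condition below are illustrative assumptions.

```python
import numpy as np

# Toy event-triggered sliding-mode loop for one double-integrator agent.
def simulate(T=10.0, dt=0.01, sigma=0.05, c=2.0):
    x = np.array([1.0, 0.0])      # [position, velocity] error w.r.t. the leader
    x_hat = np.zeros(2)           # last-transmitted state (forces a trigger at t=0)
    u, events = 0.0, 0
    for _ in range(int(T / dt)):
        # trigger: measurement error exceeds a state-dependent threshold
        if np.linalg.norm(x - x_hat) > sigma * np.linalg.norm(x) + 1e-3:
            x_hat = x.copy()                            # sample/transmit the state
            s = x_hat[1] + c * x_hat[0]                 # sliding variable s = v + c*p
            u = -c * x_hat[1] - 5.0 * np.tanh(5.0 * s)  # smoothed reaching law
            events += 1
        x = x + dt * np.array([x[1], u])                # double-integrator dynamics
    return x, events

x_final, n_events = simulate()    # the error state shrinks toward zero
```

The additive constant `1e-3` in the threshold is one common way to exclude Zeno behavior (infinitely many triggers in finite time); the paper's own scheme may differ.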
Feature selections integrating algebraic and information perspectives in weighted incomplete neighborhood rough sets
IF 5.5 · CAS Q2 · Computer Science
Neurocomputing Pub Date : 2025-04-12 DOI: 10.1016/j.neucom.2025.130164
Shan Zhang, Jiucheng Xu, Qing Bai
In data-driven research and practical applications, data incompleteness and uncertainty are widespread issues that restrict the accuracy of data analysis and the reliability of decision-making. Existing incomplete rough set models predominantly focus on adjusting uncertainty measures while overlooking feature weighting and neighborhood-relation construction. To address this, the paper proposes feature selection methods based on a weighted incomplete neighborhood rough set framework that integrates algebraic and information perspectives. First, a weighted tolerance neighborhood relation is introduced to better quantify uncertainty, improving adaptability in classification and feature selection tasks. Second, from the algebraic perspective, three weighted measures are developed: weighted approximation accuracy, weighted information granularity, and weighted approximation precision based on information granularity. These are combined with information-theoretic metrics — mutual information, complementary mutual information, and self-information — to form nine fusion measures. Finally, a unified feature selection framework is designed to evaluate feature importance comprehensively. Experiments show that the proposed methods significantly improve classification accuracy across 12 datasets; notably, under a 10% incompleteness rate, the GASI-FS, GMI-FS, and AMI-FS algorithms achieve accuracies of 87.31%, 85.87%, and 86.79% on KNN, CART, and SVM classifiers, respectively, outperforming the other methods. These findings provide a robust theoretical foundation and practical tools for analyzing incomplete data in complex scenarios.

Neurocomputing, Volume 639, Article 130164 · Cited: 0
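As a rough illustration of the ingredients named above, a tolerance neighborhood for incomplete data and an approximation-accuracy measure can be sketched as follows. These definitions are simplified guesses at the general style of such constructions, not the paper's formulas; `np.nan` marks a missing value and `w` holds assumed feature weights.

```python
import numpy as np

def tolerance_neighbors(X, w, delta=0.3):
    # Weighted tolerance relation: objects are neighbors when their weighted
    # distance over mutually observed features is within delta.
    neighbors = []
    for i in range(len(X)):
        nb = set()
        for j in range(len(X)):
            mask = ~np.isnan(X[i]) & ~np.isnan(X[j])   # features seen in both
            if not mask.any():
                nb.add(j)          # incomparable objects tolerate each other
                continue
            d = np.sqrt(np.sum(w[mask] * (X[i, mask] - X[j, mask]) ** 2))
            if d <= delta:
                nb.add(j)
        neighbors.append(nb)
    return neighbors

def approximation_accuracy(neighbors, y):
    # Ratio of lower to upper approximation over all decision classes.
    lower = sum(1 for i, nb in enumerate(neighbors)
                if all(y[j] == y[i] for j in nb))
    upper = sum(1 for i, nb in enumerate(neighbors)
                if any(y[j] == y[i] for j in nb))
    return lower / upper if upper else 0.0
```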
Goal-driven navigation via variational sparse Q network and transfer learning
IF 5.5 · CAS Q2 · Computer Science
Neurocomputing Pub Date : 2025-04-12 DOI: 10.1016/j.neucom.2025.130191
Jiacheng Yao, Li He, Hongwei Wang, Wendong Xiao, Yangjun Du, Zhaoqing Lu, Yuxin Liao
Compared with traditional map-based methods, deep reinforcement learning (DRL)-based goal-driven navigation for mobile robots does not rely on prior map information; it enables autonomous decision-making through continuous interaction with the environment. However, DRL-based goal-driven navigation suffers from low generalization ability and learning inefficiency. This paper proposes a DRL approach, the variational sparse Q network (VSQN), which leverages variational inference and transfer learning for efficient goal-driven navigation. The variational inference framework models weight uncertainty within the network, enhancing the agent's generalization capability. A hierarchical learning framework is adopted, and transfer learning incorporates prior knowledge from a pre-trained model into new navigation tasks, allowing the agent to adapt rapidly to novel tasks without fine-tuning after selecting an optimal sub-goal. This improves the agent's initial performance in previously unseen navigation tasks. Experiments report a success rate (SR) of 76% and a success weighted by inverse path length (SPL) of 0.52 on previously unencountered environments and target locations in the grid environment, and an SR of 81% with an SPL of 0.30 in the AI2-THOR environment. These findings demonstrate that the method substantially enhances the agent's generalization capability.

Neurocomputing, Volume 638, Article 130191 · Cited: 0
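The "weight uncertainty" idea can be sketched with a Bayesian linear layer whose weights are drawn from a learned Gaussian posterior via the reparameterization trick. The sparsity mechanism and Q-learning integration of VSQN are not reproduced here; shapes and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def variational_linear(x, mu, rho):
    # q(w) = N(mu, sigma^2) with sigma = softplus(rho) to keep it positive;
    # each forward pass samples fresh weights (reparameterization trick).
    sigma = np.log1p(np.exp(rho))
    w = mu + sigma * rng.standard_normal(mu.shape)
    return x @ w

# A 4-in / 2-out layer with zero-mean weights and small learned noise.
mu = np.zeros((4, 2))
rho = np.full((4, 2), -3.0)
x = np.ones((1, 4))
samples = np.stack([variational_linear(x, mu, rho) for _ in range(2000)])
```

Repeated forward passes give a distribution over outputs (here, Q-values), which is what lets such an agent express uncertainty rather than a point estimate.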
Modality-aware contrast and fusion for multi-modal summarization
IF 5.5 · CAS Q2 · Computer Science
Neurocomputing Pub Date : 2025-04-12 DOI: 10.1016/j.neucom.2025.130094
Lixin Dai , Tingting Han , Zhou Yu , Jun Yu , Min Tan , Yang Liu
Multimodal Summarization with Multi-modal Output (MSMO) is an emerging field focused on generating reliable, high-quality summaries by integrating media types such as text and video. Current methods concentrate on integrating features from different modalities but often overlook further enhancement and optimization of the fused features, which can reduce the representational capacity of the fusion and diminish overall performance. To address these challenges, a novel Modality-aware Contrast and Fusion (MCF) network is proposed. It leverages contrastive learning to preserve the integrity of modality-specific semantics while promoting complementary integration of different media types. The Multi-Modal Attention (MMA) module captures temporal dependencies and learns discriminative semantics for individual media types through uni-modal semantic attention, while aligning and integrating semantics from multiple sources via cross-modal semantic attention. The Uni-Cross Contrastive Learning (UCC) module minimizes modality-aware contrastive losses to sharpen semantic representations. The Modality-Aware Fusion (MAF) module dynamically adjusts the contributions of uni-modal and cross-modal outputs during summarization, optimizing the integration based on the strengths of each modality. Extensive validation on the Bliss, Daily Mail, and CNN datasets demonstrates state-of-the-art performance and confirms the effectiveness of each component.

Neurocomputing, Volume 639, Article 130094 · Cited: 0
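Contrastive objectives of the kind the UCC module builds on are commonly instances of InfoNCE: matched pairs of embeddings (e.g., a text feature and its video feature) are pulled together while mismatched pairs are pushed apart. The module's exact modality-aware losses are not specified in the abstract; this is the generic form.

```python
import numpy as np

def info_nce(z_a, z_b, tau=0.1):
    # InfoNCE between two batches of embeddings; row i of z_a and row i of
    # z_b form the positive pair, every other row is a negative.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / tau                       # cosine similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))               # positives on the diagonal
```

Aligned embeddings give a near-zero loss; shuffling one side (breaking the pairing) drives it up.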
Bias-variance decomposition knowledge distillation for medical image segmentation
IF 5.5 · CAS Q2 · Computer Science
Neurocomputing Pub Date : 2025-04-12 DOI: 10.1016/j.neucom.2025.130230
Xiangchun Yu , Longxiang Teng , Zhongjian Duan , Dingwen Zhang , Wei Pang , Miaomiao Liang , Jian Zheng , Liujin Qiu , Qing Xu
Knowledge distillation essentially maximizes the mutual information between teacher and student networks; typically, a variational distribution is introduced to maximize the variational lower bound. However, the heteroscedastic noises derived from this distribution are often unstable, leading to unreliable data-uncertainty modeling. This work identifies bias-variance coupling in knowledge distillation as the cause of this instability and proposes the Bias-variance dEcomposition kNowledge dIstillatioN (BENIN) approach. First, bias-variance decomposition decouples the two components. A lightweight Feature Frequency Expectation Estimation Module (FF-EEM) then estimates the student's prediction expectation, from which bias and variance are computed. Variance learning measures the data uncertainty in the teacher's prediction, and a balance factor addresses the bias-variance dilemma. Finally, the bias-variance decomposition distillation loss lets the student learn valuable knowledge while suppressing noise. Experiments on the Synapse and LiTS17 medical-image-segmentation datasets validate BENIN's effectiveness; FF-EEM also mitigates high-frequency noise from high mask rates, improving data-uncertainty estimation and visualization. Code is available at https://github.com/duanzhongjian/BENIN.

Neurocomputing, Volume 638, Article 130230 · Cited: 0
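The decomposition the method is named after is the classical identity E[(ŷ − t)²] = (E[ŷ] − t)² + Var[ŷ]: expected squared error splits exactly into squared bias plus variance. It is easy to verify numerically (the scenario below — a biased, noisy student prediction against a teacher target — is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
t = 1.0                                                # teacher target (one pixel)
preds = t + 0.3 + 0.1 * rng.standard_normal(100_000)   # biased, noisy student

mse = np.mean((preds - t) ** 2)
bias_sq = (np.mean(preds) - t) ** 2                    # ≈ 0.3**2
var = np.var(preds)                                    # ≈ 0.1**2
# mse == bias_sq + var, exactly, up to floating-point error
```

Decoupling the two terms, as the abstract describes, lets a distillation loss weight systematic teacher-student disagreement (bias) separately from data uncertainty (variance).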
Dual filtration subdomain adaptation network for cross-subject EEG emotion recognition
IF 5.5 · CAS Q2 · Computer Science
Neurocomputing Pub Date : 2025-04-12 DOI: 10.1016/j.neucom.2025.130254
Qingshan She , Yipeng Li , Yun Chen , Ming Meng , Su Liu , Yingchun Zhang
Emotion recognition based on electroencephalogram (EEG) data is pivotal for advancing affective brain-computer interfaces. In cross-subject scenarios, however, negative transfer is likely because of individual differences and the inherent temporal variability of EEG. To solve these issues, this study proposes a novel domain adaptation architecture, the dual filtration subdomain adaptation network (DFSAN), which mitigates negative transfer and aligns subdomain features at a fine-grained category level. First, the transferability of each subject is assessed to select high-transferability subjects as source domains. Then, with feature alignment through subdomain metric learning, transferable features are obtained by the dual filtration network. Finally, dual classifiers mitigate misclassifications near the decision boundary and output the recognition results. Multi-source cross-subject experiments on the SEED, SEED-IV, DEAP, and SEED-V datasets achieve recognition accuracies of 88.68%, 67.61%, 65.33%, and 65.57%, respectively. Compared with other state-of-the-art domain adaptation methods, the proposed method achieves better results on cross-subject emotion recognition tasks, demonstrating the effectiveness and feasibility of DFSAN in handling negative transfer under multi-source transfer.

Neurocomputing, Volume 639, Article 130254 · Cited: 0
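Fine-grained, category-level alignment is commonly implemented as a class-wise ("local") maximum mean discrepancy, where source and target features are matched one emotion class at a time. A minimal sketch under that assumption (not DFSAN's exact loss; in practice the target labels would be pseudo-labels):

```python
import numpy as np

def rbf_mmd2(X, Y, gamma=0.5):
    # Biased estimate of squared MMD with an RBF kernel.
    def k(A, B):
        d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d)
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

def local_mmd(Xs, ys, Xt, yt, classes):
    # Subdomain alignment: average the per-class MMD between source
    # features (Xs, ys) and target features (Xt, yt).
    return float(np.mean([rbf_mmd2(Xs[ys == c], Xt[yt == c]) for c in classes]))
```

When source and target classes are actually matched, the class-wise MMD is near zero; pairing a class with the wrong one inflates it, which is the signal a training loss would minimize.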
Physiological navigation amplifier for remote extracting PPG signals from face video clips
IF 5.5 · CAS Q2 · Computer Science
Neurocomputing Pub Date : 2025-04-11 DOI: 10.1016/j.neucom.2025.130228
Bin Li , Wei Zhang , Hong Fu
Remote photoplethysmography (rPPG) measurement from face video clips is an essential method for non-contact monitoring of human heart health. However, the subtle face-color variations associated with the rPPG signal are easily contaminated by external noise, limiting the cross-domain generalization of existing models. In short face video clips, factors such as background and illumination cause noise to vary significantly between samples, while noise within a sample typically exhibits similar patterns; these distinctive features can be used to model noise variation. This paper presents a physiological navigation amplifier with a self-adaptive differential feature representation structure for rPPG measurement in challenging video clips. First, a noise representation encoder performs self-adaptive differential feature representation. Second, a physiological navigation amplifier extracts the subtle rPPG signal from facial feature sets by disentangling rPPG and noise features in the spatiotemporal space. Finally, to counter measured-signal degradation caused by drastic external disturbances, a learnable signal rectification matrix reconstructs the measured rPPG signal. Experiments on publicly available intra-dataset and cross-dataset validations show that the proposed method reduces the mean absolute error (MAE) of rPPG signal and heart rate measurements by more than 17% (OBF), 13% (COHFACE), and 24% (UBFC) compared with state-of-the-art methods.

Neurocomputing, Volume 639, Article 130228 · Cited: 0
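Downstream of the extraction problem the paper tackles, heart rate is typically read off a recovered rPPG trace as the dominant spectral peak in a physiologically plausible band. A minimal sketch of that final step (the band limits and the synthetic trace are illustrative assumptions):

```python
import numpy as np

def heart_rate_bpm(trace, fs):
    # Dominant frequency in 0.7-4 Hz (42-240 bpm), converted to beats/min.
    trace = trace - trace.mean()
    spec = np.abs(np.fft.rfft(trace))
    freqs = np.fft.rfftfreq(len(trace), d=1.0 / fs)
    band = (freqs >= 0.7) & (freqs <= 4.0)
    return 60.0 * freqs[band][np.argmax(spec[band])]

fs = 30.0                        # typical camera frame rate
t = np.arange(0, 10, 1 / fs)     # a 10-second clip
# synthetic 1.2 Hz (72 bpm) pulse plus measurement noise
trace = np.sin(2 * np.pi * 1.2 * t) \
    + 0.3 * np.random.default_rng(0).standard_normal(t.size)
```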
Recursively summarizing enables long-term dialogue memory in large language models
IF 5.5 · CAS Q2 · Computer Science
Neurocomputing Pub Date : 2025-04-11 DOI: 10.1016/j.neucom.2025.130193
Qingyue Wang , Yanhe Fu , Yanan Cao , Shuai Wang , Zhiliang Tian , Liang Ding
Large language models (LLMs) such as GPT-4 demonstrate remarkable conversational abilities, engaging in dynamic and contextually relevant dialogues across a wide range of topics. In long-term conversations, however, these chatbots fail to recall appropriate information from the past, resulting in inconsistent responses. To address this, we propose recursively generating summaries (memory) with the LLM itself to enhance its long-term dialogue ability. Specifically, the method first prompts the LLM to memorize small dialogue contexts; the LLM then recursively produces new memory from the previous memory and the subsequent contexts; finally, the chatbot generates a response based on the latest memory. Experiments on widely used LLMs show that the method produces more consistent responses in long-term conversations and can be significantly enhanced with just two or three dialogue illustrations. The strategy also complements both large context windows (e.g., 8K and 16K) and retrieval-enhanced LLMs, bringing further gains in long-term dialogue performance; notably, it is a potential solution for letting an LLM model extremely long dialogue contexts. The code will be released later.

Neurocomputing, Volume 639, Article 130193 · Cited: 0
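The recursive memory update described above can be sketched as a loop around any chat-completion call. The prompt wording, chunk size, and `llm` interface here are illustrative assumptions, not the paper's prompts.

```python
def update_memory(llm, memory: str, new_turns: list) -> str:
    # Recursion step: new memory = f(old memory, latest dialogue chunk).
    prompt = (
        "Previous memory:\n" + memory + "\n\n"
        "New dialogue:\n" + "\n".join(new_turns) + "\n\n"
        "Rewrite the memory so it stays short but keeps every fact "
        "needed for future turns."
    )
    return llm(prompt)

def respond(llm, memory: str, user_msg: str) -> str:
    # Responses are conditioned on the latest memory, not the full history.
    return llm("Memory:\n" + memory + "\nUser: " + user_msg + "\nAssistant:")

def chat(llm, turns_per_chunk=4):
    memory, history = "", []
    def on_user(msg):
        nonlocal memory
        reply = respond(llm, memory, msg)
        history.extend([f"User: {msg}", f"Assistant: {reply}"])
        if len(history) >= 2 * turns_per_chunk:          # chunk is full
            memory = update_memory(llm, memory, history) # fold it into memory
            history.clear()                              # raw context is freed
        return reply
    return on_user
```

Because only the fixed-size memory plus the current chunk ever enter the prompt, the dialogue can grow far beyond the model's context window.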
Remaining useful life prediction with uncertainty quantification for rotating machinery: A method based on explainable variational deep gaussian process
IF 5.5 · CAS Q2 · Computer Science
Neurocomputing Pub Date : 2025-04-11 DOI: 10.1016/j.neucom.2025.130232
Xiuli Liu , Shuo Cui , Wan Qiao , Jianyu Liu , Guoxin Wu
Remaining useful life (RUL) prediction is a key technology for ensuring the safety and reliability of mechanical equipment. To address low prediction accuracy and insufficient uncertainty quantification in rotating-machinery monitoring, this paper proposes an innovative Variational Deep Gaussian Process (VDGP) method. It adopts an Adaptive Inter-layer Variational Inference (AIVI) strategy that integrates inter-layer dependency modeling, adaptive inducing-point optimization, and a hierarchical variational lower bound, enhancing information transfer and feature extraction in multi-layer structures while reducing redundant computation and improving training efficiency. Through Shapley Additive exPlanations (SHAP) analysis and correlation analysis between input features and hidden-layer node outputs, the key factors in RUL prediction are explored in depth; low-contribution features are identified and removed, significantly reducing the number of required sensors and deployment costs. The VDGP method is validated on the C-MAPSS dataset and a wind-turbine planetary gearbox dataset and is thoroughly compared with classical methods and state-of-the-art methods from the past three years. Experimental results show that VDGP not only achieves superior prediction accuracy but also effectively quantifies prediction uncertainty, improving computational efficiency and real-time performance and providing strong technical support for equipment health evaluation, sensor-deployment optimization in industrial applications, cost reduction, and more efficient real-time monitoring and decision-making.

Neurocomputing, Volume 638, Article 130232 · Cited: 0
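The building block underneath a deep GP is ordinary Gaussian-process regression, which returns a predictive mean together with a predictive standard deviation — the uncertainty quantification the abstract emphasizes. A single-layer sketch with an RBF kernel (the VDGP's stacked layers, inducing points, and AIVI training are not reproduced; hyperparameters are illustrative):

```python
import numpy as np

def gp_predict(X, y, Xs, ell=1.0, sf=1.0, noise=1e-2):
    # Exact GP regression on 1-D inputs: X, y are training data,
    # Xs are query points; returns predictive mean and std.
    def k(A, B):
        return sf**2 * np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / ell**2)
    K = k(X, X) + noise * np.eye(len(X))
    Ks, Kss = k(X, Xs), k(Xs, Xs)
    mean = Ks.T @ np.linalg.solve(K, y)
    cov = Kss - Ks.T @ np.linalg.solve(K, Ks)
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))
```

Inside the training range the predictive std is small; far outside it the model reports high uncertainty instead of a confident wrong answer, which is exactly the behavior wanted for RUL estimates.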
End-to-end video object detection based on dynamic anchor box spatiotemporal decoder and hybrid matching
IF 5.5 · CAS Q2 · Computer Science
Neurocomputing Pub Date : 2025-04-11 DOI: 10.1016/j.neucom.2025.130177
Zhe Fu , Yuan Shuo , Pengjun Cao , Jing Wei , Heng Wang , Gaoxiang Zhang
Despite significant progress in object detection on single-frame images, relying solely on individual frames in continuously changing video scenes fails to exploit the dynamic continuity and consistency of the temporal dimension, and designing a model that captures local details while understanding the global spatiotemporal context remains challenging. To address this, the paper proposes an end-to-end video object detection algorithm based on a dynamic anchor box spatiotemporal decoder and hybrid matching (DAHM-Net). In the dynamic anchor box spatiotemporal decoder, each decoder layer iteratively updates position priors that account for object scales, enabling the model to learn object features better and improve detection performance. Additionally, a hybrid matching training strategy combines one-to-one and one-to-many matching to enhance performance and accelerate convergence while preserving end-to-end detection. Experimental results show that the proposed method outperforms TransVOD by 0.7, 1.0, and 0.5 AP50 on three public datasets and also surpasses recent state-of-the-art approaches.

Neurocomputing, Volume 639, Article 130177 · Cited: 0
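Hybrid matching can be sketched as a one-to-one assignment (which keeps the detector end-to-end, no NMS needed) plus a top-k one-to-many assignment that gives extra positive supervision to speed up convergence. This is a toy version over a query-to-ground-truth cost matrix; DAHM-Net's exact costs and k are not specified in the abstract.

```python
import numpy as np
from itertools import permutations

def one_to_one(cost):
    # Exact minimum-cost one-to-one assignment by brute force (toy sizes;
    # real detectors use the Hungarian algorithm).
    nq, ng = cost.shape
    best_c, best = float("inf"), None
    for perm in permutations(range(nq), ng):      # one query per GT box
        c = sum(cost[q, g] for g, q in enumerate(perm))
        if c < best_c:
            best_c, best = c, [(q, g) for g, q in enumerate(perm)]
    return best

def one_to_many(cost, k=2):
    # Auxiliary branch: the k lowest-cost queries per GT box all count
    # as positives during training.
    return [(int(q), g) for g in range(cost.shape[1])
            for q in np.argsort(cost[:, g])[:k]]
```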