Neurocomputing Latest Articles

ExpTamed: An exponential tamed optimizer based on Langevin SDEs
IF 5.5 · Computer Science (CAS Tier 2)
Neurocomputing · Pub Date: 2025-07-18 · DOI: 10.1016/j.neucom.2025.130949
Utku Erdoğan , Şahin Işık , Yıldıray Anagün , Gabriel Lord
Abstract: This study presents a new method to improve optimization by regularizing the gradients in deep learning methods, based on a novel taming strategy that regulates the growth of numerical solutions of stochastic differential equations. The method, ExpTamed, enhances stability and reduces the mean-square error over a short time horizon compared to existing techniques. The practical effectiveness of ExpTamed is rigorously evaluated on CIFAR-10, Tiny-ImageNet, and Caltech256 across diverse architectures. In direct comparisons with prominent optimizers such as Adam, ExpTamed demonstrates significant performance gains: it achieved increases in best top-1 test accuracy ranging from 0.86 to 2.76 percentage points on CIFAR-10, and up to 4.46 percentage points on Tiny-ImageNet (without a learning-rate schedule). On Caltech256, ExpTamed also yielded superior accuracy, precision, and Kappa metrics. These results quantify ExpTamed's capability to deliver enhanced performance in practical deep learning applications.
Citations: 0
MKE-PLLM: A benchmark for multilingual knowledge editing on pretrained large language model
IF 5.5 · Computer Science (CAS Tier 2)
Neurocomputing · Pub Date: 2025-07-18 · DOI: 10.1016/j.neucom.2025.130979
Ran Song , Shengxiang Gao , Xiaofei Gao , Cunli Mao , Zhengtao Yu
Abstract: Multilingual large language models (mLLMs) have demonstrated remarkable performance across various downstream tasks but are still plagued by factuality errors. Knowledge editing aims to correct these errors by modifying the internal knowledge of pre-trained models. However, current knowledge editing methods primarily focus on monolingual settings, neglecting the complexities and interdependencies of multilingual scenarios. Furthermore, benchmarks specifically designed for multilingual knowledge editing are relatively scarce. To address this gap, this paper constructs a novel multilingual knowledge editing benchmark that comprehensively evaluates methods for mLLMs on accuracy, reliability, generalization, and consistency. To ensure the robustness and usability of the benchmark, we conducted detailed analysis and validation. Concurrently, we propose a baseline method that adapts existing monolingual knowledge editing techniques to the multilingual environment. Extensive experimental results demonstrate the effectiveness of our benchmark in evaluating multilingual knowledge editing.
Citations: 0
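The evaluation criteria named in the abstract (reliability, generalization, consistency) can be made concrete with a small scoring sketch: reliability checks the edit in its source language, generalization checks paraphrases, and consistency checks agreement across languages. The record schema and field names below are invented for illustration; the benchmark's actual format is not specified in the abstract.

```python
def edit_metrics(records):
    """Each record is a dict (hypothetical schema):
      'reliability'     -> bool, edit recalled for the original prompt
      'paraphrase_hits' -> list of bools, one per paraphrased prompt
      'lang_answers'    -> {lang: answer} after editing
      'target'          -> the edited target answer
    """
    n = len(records)
    reliability = sum(r['reliability'] for r in records) / n
    generalization = sum(
        sum(r['paraphrase_hits']) / len(r['paraphrase_hits'])
        for r in records) / n
    # Consistency: fraction of languages whose post-edit answer matches the target.
    consistency = sum(
        sum(a == r['target'] for a in r['lang_answers'].values())
        / len(r['lang_answers'])
        for r in records) / n
    return {'reliability': reliability,
            'generalization': generalization,
            'consistency': consistency}
```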
Joint super-resolution and inverse tone-mapping: A feature decomposition aggregation network and a new benchmark
IF 5.5 · Computer Science (CAS Tier 2)
Neurocomputing · Pub Date: 2025-07-18 · DOI: 10.1016/j.neucom.2025.131050
Gang Xu , Ao Shen , Yuchen Yang , Xiantong Zhen , Wei Chen , Jun Xu
Abstract: Joint Super-Resolution and Inverse Tone-Mapping (joint SR-ITM) aims to increase the resolution and dynamic range of low-resolution, standard-dynamic-range images. Recent networks mainly resort to image decomposition techniques with complex multi-branch architectures, but fixed decomposition techniques largely restrict their power on versatile images. To exploit the potential of the decomposition mechanism, in this paper we generalize it from the image domain to the broader feature domain. To this end, we propose a lightweight Feature Decomposition Aggregation Network (FDAN). In particular, we design a Feature Decomposition Block (FDB) that achieves learnable separation of detail and base feature maps, and develop a Hierarchical Feature Decomposition Group by cascading FDBs for powerful multi-level feature decomposition. Moreover, for better evaluation, we collect a large-scale dataset for joint SR-ITM, i.e., SRITM-4K, which provides versatile scenarios for robust model training and evaluation. Experimental results on two benchmark datasets demonstrate that FDAN is efficient and outperforms state-of-the-art methods on joint SR-ITM. The code of FDAN and the SRITM-4K dataset are available at https://github.com/CS-GangXu/FDAN.
Citations: 0
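FDB learns the detail/base separation end to end; a fixed-filter analogue makes the idea tangible: take a local mean as the "base" component and the residual as the "detail". The sketch below is that simplification, not the paper's learnable block.

```python
import numpy as np

def decompose(feat, k=3):
    """Split a 2D feature map into a smooth 'base' and a residual 'detail'
    via k-by-k mean filtering. By construction, base + detail == feat, so
    the decomposition is lossless -- a fixed-filter stand-in for what an
    FDB would learn with trainable weights."""
    pad = k // 2
    padded = np.pad(feat, pad, mode='edge')
    H, W = feat.shape
    base = np.zeros((H, W), dtype=float)
    for i in range(H):
        for j in range(W):
            base[i, j] = padded[i:i + k, j:j + k].mean()
    return base, feat - base
```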
Emotion agent: Unsupervised deep reinforcement learning with distribution-prototype reward for continuous emotional EEG analysis
IF 5.5 · Computer Science (CAS Tier 2)
Neurocomputing · Pub Date: 2025-07-17 · DOI: 10.1016/j.neucom.2025.130951
Zhihao Zhou , Li Zhang , Qile Liu , Gan Huang , Zhuliang Yu , Zhen Liang
Abstract: Continuous electroencephalography (EEG) signals are widely employed in affective brain-computer interface (aBCI) applications. However, only a subset of the continuously acquired EEG data is truly relevant to emotional processing, while the remainder is often noisy or unrelated. Manual annotation of these key emotional segments is impractical due to their dynamic and individualized nature. To address this challenge, we propose a novel unsupervised deep reinforcement learning framework, termed Emotion Agent, which automatically identifies and extracts the most informative emotional segments from continuous EEG signals. Emotion Agent first uses a heuristic algorithm to perform a global search and generate prototype representations of the EEG signals; these prototypes guide exploration of the signal space and highlight regions of interest. Furthermore, we design a distribution-prototype-based reward function that evaluates the interaction between samples and prototypes, ensuring that the selected segments are both representative and relevant to the underlying emotional states. Finally, the framework is trained with Proximal Policy Optimization (PPO) for stable and efficient convergence. Experimental results on three widely used datasets (covering both discrete and dimensional emotion recognition) show an average improvement of 13.46% with Emotion Agent, demonstrating a significant enhancement in accuracy and robustness for downstream aBCI tasks.
Citations: 0
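The distribution-prototype reward is not given in closed form here; one plausible reading is that a segment embedding earns reward according to its proximity to the nearest learned emotion prototype. The RBF form below is an assumption for illustration, not the paper's function.

```python
import numpy as np

def prototype_reward(sample, prototypes, sigma=1.0):
    # Hypothetical distribution-prototype reward: a segment embedding close
    # to any emotion prototype earns a reward near 1; distant (noisy or
    # emotion-irrelevant) segments earn a reward near 0.
    d2 = np.min(np.sum((prototypes - sample) ** 2, axis=1))
    return float(np.exp(-d2 / (2 * sigma ** 2)))
```

In an RL loop this scalar would be the per-step reward the PPO agent receives for keeping a segment.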
Dynamic T-distributed stochastic neighbor graph convolutional networks for multi-modal contrastive fusion
IF 5.5 · Computer Science (CAS Tier 2)
Neurocomputing · Pub Date: 2025-07-17 · DOI: 10.1016/j.neucom.2025.130950
Bo Xu , Guoxu Li , Jie Wang , Zheng Wang , Jianfu Cao , Rong Wang , Feiping Nie
Abstract: As data acquisition technologies continue to advance, multi-modal data have become a prominent focus across many domains. This paper tackles critical challenges in the multi-modal fusion process (representation learning, modal-consistency invariance learning, and modal-diversity complementarity learning) by employing graph convolutional networks and contrastive learning. Current GCN-based methods generally depend on predefined graphs for representation learning, limiting their capacity to capture local and global information effectively. Furthermore, some current models do not adequately contrast the consistent and diverse representations of different modalities during fusion. To address these challenges, we propose a novel T-distributed Stochastic Neighbor Contrastive Graph Convolutional Network (TSNGCN), consisting of an adaptive static graph learning module, a multi-modal representation learning module, and a multi-modal contrastive fusion module. The adaptive static graph learning module constructs graphs without relying on predefined distance metrics, adaptively creating a pairwise graph that preserves the local structure of the data. Moreover, a loss function based on t-distributed stochastic neighbor embedding is designed to learn the transformation between the embeddings and the original data, facilitating the discovery of more discriminative information within the learned subspace. In addition, the proposed multi-modal contrastive fusion module maximizes the similarity of the same samples across different modalities while keeping dissimilar samples apart, enhancing the model's consistency objective. Extensive experiments on several multi-modal benchmark datasets demonstrate the superiority and effectiveness of TSNGCN over existing methods.
Citations: 0
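The contrastive fusion objective described above (pull the same sample together across modalities, push different samples apart) is commonly realized as an InfoNCE-style loss. The sketch below is that generic form, offered as a stand-in rather than TSNGCN's exact term.

```python
import numpy as np

def cross_modal_nce(za, zb, tau=0.5):
    """InfoNCE-style loss between two modality embeddings (rows = samples):
    the i-th sample of modality A should match the i-th sample of modality B
    and repel all other samples."""
    za = za / np.linalg.norm(za, axis=1, keepdims=True)
    zb = zb / np.linalg.norm(zb, axis=1, keepdims=True)
    logits = za @ zb.T / tau                       # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))     # -log p(positive pair)
```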
CTMEG: A continuous-time medical event generation model for clinical prediction of long-term disease progression
IF 5.5 · Computer Science (CAS Tier 2)
Neurocomputing · Pub Date: 2025-07-17 · DOI: 10.1016/j.neucom.2025.130999
Mengxuan Sun , Xuebing Yang , Jiayi Geng , Jinghao Niu , Chutong Wang , Chang Cui , Xiuyuan Chen , Wen Tang , Wensheng Zhang
Abstract: Long-term health monitoring tracks a patient's disease progression, which is critical for improving quality of life and supporting physicians' decision-making. Predictive models based on Electronic Health Records (EHRs) can offer substantial clinical support by alerting clinicians to subsequent disease-associated adverse events. Effective disease progression modeling involves two subtasks: (1) estimating the occurrence times of disease-associated events, and (2) classifying the types of the events that occur. Recent time-aware disease predictive models, mainly based on recurrent neural networks or attention networks, specialize in future disease-type prediction by accounting for the temporal irregularities in EHRs. This paper focuses on multi-step continuous-time disease prediction, which is more challenging because predictive models can easily fall into conflicts between the two subtasks. We propose a multi-task disentangled Continuous-Time Medical Event Generation (CTMEG) model to tackle both subtasks simultaneously. Unlike conventional continuous-time models, CTMEG encodes multi-view historical medical events and then simultaneously predicts multi-step disease types and occurrence times. First, a discrete Conditional Intensity Function (CIF) is designed to better estimate disease occurrence times with limited available data. Second, to reduce task conflicts, a gated network disentangles the coarse patient representation into task-specific representations. Finally, a tailored CIF attention module reduces error accumulation during prediction. Extensive experiments on the eICU and BFH databases demonstrate that CTMEG outperforms twelve competing models in long-term disease progression prediction. Our code is available on GitHub.
Citations: 0
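A discrete conditional intensity function can be read as a per-step hazard, and chaining the hazards yields a proper distribution over event times. The construction below is the generic discrete-time survival recipe, offered as background for how a discrete CIF supports occurrence-time estimation, not as CTMEG's exact formulation.

```python
import numpy as np

def event_time_pmf(hazard):
    """Convert per-step discrete hazards h_k = P(event at step k | no event
    yet) into the pmf over occurrence times:
        p_k = h_k * prod_{j < k} (1 - h_j)
    If the final hazard is 1, the pmf sums to 1 over the horizon."""
    hazard = np.asarray(hazard, dtype=float)
    survival = np.cumprod(1.0 - hazard)                    # P(no event by step k)
    prev_survival = np.concatenate(([1.0], survival[:-1])) # P(no event before k)
    return hazard * prev_survival
```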
DCB-VIM: An ensemble learning based filter method for feature selection with imbalanced class distribution
IF 5.5 · Computer Science (CAS Tier 2)
Neurocomputing · Pub Date: 2025-07-17 · DOI: 10.1016/j.neucom.2025.130848
Nayiri Galestian Pour , Soudabeh Shemehsavar
Abstract: Feature selection aims to improve predictive performance and interpretability when analyzing datasets with high-dimensional feature spaces. Imbalanced class distributions make feature selection even more difficult, so robust methodologies are essential for this case. We therefore present a filter method based on ensemble learning, in which each classifier is built on a randomly selected subspace of features. A variable importance measure is computed class-wise within each classifier, and a feature weighting procedure is then applied. The performance of the classifiers is taken into account in the combination phase of the ensemble. The effects of the hyperparameters (the subspace size and the number of classification trees) on predictive performance are investigated through simulation studies. The efficiency of the proposed method is evaluated, in terms of predictive performance under different selection strategies, on real data in the presence of class imbalance.
Citations: 0
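The combination phase, where each subspace classifier's variable importances are merged with its performance taken into account, can be sketched as a weighted aggregation. The triple format and the averaging scheme below are assumptions for illustration; the paper's exact weighting is not given in the abstract.

```python
import numpy as np

def aggregate_importance(n_features, ensemble):
    """ensemble: list of (subspace_idx, importances, score) triples, one per
    base classifier built on a random feature subspace. Weighting each
    classifier's importances by its performance `score` mirrors the idea of
    using classifier performance in the combination phase (details assumed)."""
    total = np.zeros(n_features)
    counts = np.zeros(n_features)
    for idx, imp, score in ensemble:
        idx = np.asarray(idx)
        total[idx] += score * np.asarray(imp, dtype=float)
        counts[idx] += 1
    # Average over the classifiers that actually saw each feature.
    return np.divide(total, counts, out=np.zeros(n_features), where=counts > 0)
```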
SCR: A completion-then-reasoning framework for multi-hop question answering over incomplete knowledge graph
IF 5.5 · Computer Science (CAS Tier 2)
Neurocomputing · Pub Date: 2025-07-16 · DOI: 10.1016/j.neucom.2025.131027
Ridong Han , Jia Liu , Haijia Bi , Tao Peng , Lu Liu
Abstract: Reinforcement learning has become a widely adopted technique for the multi-hop knowledge graph question answering task thanks to the excellent interpretability of its reasoning process. However, it is severely affected by the incompleteness of knowledge graphs and by the sparse rewards caused by weak supervision. In this paper, we propose a completion-then-reasoning framework, called SCR, to address these two issues. To handle knowledge graph incompleteness, we first extract a subgraph from the given knowledge graph for a given question and use a knowledge graph embedding model to predict and complete missing triples, followed by reinforcement learning for answer reasoning on the completed subgraph. To alleviate sparse rewards in reinforcement learning, we introduce a semantic reward based on the semantic similarity between the original question and the full relational path, enabling the model to receive partial rewards for partially correct paths instead of zero reward. Detailed experiments on the PQ, PQL, MetaQA, and WebQSP datasets demonstrate that SCR effectively improves multi-hop knowledge graph question answering. In particular, under a sparse-KG setting, SCR outperforms baselines by a large margin, highlighting the effectiveness of the completion-then-reasoning framework in mitigating knowledge graph incompleteness.
Citations: 0
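The semantic reward can be sketched as cosine similarity between the question embedding and an embedding of the traversed relational path, so a partially correct path earns a partial reward rather than zero. Mean-pooling the relation embeddings and clipping negative similarity at zero are illustrative choices, not necessarily SCR's.

```python
import numpy as np

def semantic_reward(q_emb, rel_embs, hit):
    # Terminal reward for one reasoning episode: 1 for a correct answer;
    # otherwise a partial reward from the cosine similarity between the
    # question embedding and the mean embedding of the traversed relations
    # (all embeddings assumed precomputed by an upstream encoder).
    if hit:
        return 1.0
    path = np.mean(rel_embs, axis=0)
    cos = np.dot(q_emb, path) / (np.linalg.norm(q_emb) * np.linalg.norm(path) + 1e-12)
    return float(max(cos, 0.0))
```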
MarIns3D: An open-vocabulary 3D instance segmentation model with mask refinement
IF 5.5 · Computer Science (CAS Tier 2)
Neurocomputing · Pub Date: 2025-07-16 · DOI: 10.1016/j.neucom.2025.131018
Haiyang Li, Jinhe Su, Dong Zhou, Mengyun Cao
Abstract: Open-vocabulary 3D instance segmentation has gained significant attention due to its potential role in scene perception. Existing methods typically involve two stages: generating class-agnostic 3D instance masks with segmentation models, followed by semantic classification of these masks. However, poor classification performance often stems from low-quality masks produced in the first stage. This paper proposes two key components to optimize mask generation: a dynamic offset module and a projection consistency loss. By dynamically adjusting sampling-point positions, query points can capture key scene features to generate high-quality masks. The projection consistency loss then compares the 3D instance masks with the ground truth in 2D projections to refine them, improving segmentation accuracy. Experimental results on the ScanNetV2 validation set show that MarIns3D outperforms SOLE on zero-shot segmentation, with improvements of 1.8% in AP25 and 1.7% in AP50, and also demonstrates enhanced open-set segmentation capabilities. These results confirm the model's superior mask quality and segmentation performance. Furthermore, ablation studies verify that the synergy between the dynamic offset module and the projection consistency loss is crucial to these gains.
Citations: 0
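The projection consistency idea, checking a predicted 3D instance mask against 2D ground truth after projection, can be sketched with a pinhole camera model: project the masked 3D points into the image and measure how many land inside the ground-truth 2D mask. The score below is a simplified, non-differentiable stand-in for the paper's loss.

```python
import numpy as np

def projection_score(points, mask2d, K):
    """Project 3D mask points (N, 3) with intrinsic matrix K into the image
    and return the fraction that fall inside the ground-truth 2D mask.
    A simplified stand-in for a projection consistency check."""
    uvw = (K @ points.T).T
    uv = np.round(uvw[:, :2] / uvw[:, 2:3]).astype(int)   # perspective divide
    H, W = mask2d.shape
    valid = (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)
    hits = mask2d[uv[valid, 1], uv[valid, 0]].sum()       # rows = y, cols = x
    return hits / max(len(points), 1)
```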
M3DP: Optimizing 2D vision tasks with minimal 3D object information
IF 6.5 · Computer Science (CAS Tier 2)
Neurocomputing · Pub Date: 2025-07-16 · DOI: 10.1016/j.neucom.2025.130905
Ziming Wang , Yanjing Li , Linlin Yang , Xinkai Liang , Xianbin Cao , Qi Wang , Baochang Zhang
Abstract: The availability of 3D data (such as LiDAR, radar, and other point clouds) invites the exploration of 3D priors for 2D downstream tasks. However, existing methods require 3D data annotations aligned with the 2D data, which is time-consuming and labor-intensive. To reduce this reliance, we propose the Minimal 3D object information Prior (M3DP) for 2D vision feature learning using unaligned 3D data as a prior. Specifically, M3DP requires only 3D objects and their classification labels within the same dataset to learn 3D priors, greatly saving time and labor. Moreover, we introduce multiview rotation augmentation (MRA) and two alignments (K-Best-of-N alignment and instance alignment) to encourage 2D representation learning from 3D representations in a unified 2D-3D space, helping the model learn geometric properties. This strategy fully leverages the multi-view geometric information of 3D objects, enabling precise localization and matching in 2D images. Through instance alignment, our method also efficiently facilitates information transfer across different categories of 3D objects, effectively enhancing the performance of 2D tasks by learning the geometric properties of 3D objects. Extensive experiments on three datasets demonstrate the superiority of our approach over prior state-of-the-art methods.
Citations: 0