Coherence-aware and snap-triggered: A novel mechanism for audio-visual cooperative tasks

IF 7.5 | Zone 1 (Computer Science) | JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Expert Systems with Applications Pub Date : 2026-06-01 Epub Date: 2026-02-07 DOI:10.1016/j.eswa.2026.131559

Cunhan Guo, Heyan Huang, Ruiqi Hu, Danjie Han

Expert Systems with Applications, volume 313, Article 131559. Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S0957417426004720

Citations: 0

Abstract

Audio-Visual Cooperative (AVC) tasks underpin multimodal scene understanding and compel models to reconcile continuous temporal evolution with abrupt sensory transitions. We propose the Coherence-Aware and Snap-Triggered (CAST) mechanism, a plug-in temporal refinement layer that neither perturbs backbone parameters nor demands additional modalities. The Exponential Memory based Coherence-Aware module attenuates distant frame contributions through an exponentially decaying weight envelope, thereby preventing the persistent influence of obsolete disruptions. Complementarily, the Optical Flow based Snap-Triggered module registers instantaneous motion discontinuities and reallocates attention toward nascent events. Operating in concert, these modules yield a representation that remains coherent across smooth transitions yet responsive to sudden perturbations. Empirical evaluation across multiple AVC benchmarks demonstrates consistent superiority over established baselines, corroborating that CAST enhances temporal fidelity and, by extension, the reliability of downstream multimodal decisions.
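The abstract gives no equations, so the following is only a minimal sketch of the two ideas it names, not the authors' implementation. It assumes per-frame feature vectors and a per-frame mean optical-flow magnitude that has already been computed by some flow estimator. The function names, the `decay` and `threshold` parameters, and the choice to reset the memory when a snap is detected are all hypothetical illustrations.

```python
import math
from typing import List, Sequence


def snap_gate(flow_mag: Sequence[float], threshold: float) -> List[bool]:
    """Flag frames whose mean flow magnitude jumps by more than `threshold`
    relative to the previous frame (assumed proxy for a motion discontinuity)."""
    gates = [False]  # the first frame has no predecessor to compare against
    for prev, cur in zip(flow_mag, flow_mag[1:]):
        gates.append(abs(cur - prev) > threshold)
    return gates


def refine(features: Sequence[Sequence[float]],
           flow_mag: Sequence[float],
           decay: float = 0.5,
           threshold: float = 1.0) -> List[List[float]]:
    """Causal exponentially weighted average of frame features.

    A frame k steps in the past gets weight proportional to exp(-decay * k).
    When a snap is flagged, the memory is reset so the new event dominates
    (an assumed design choice, not taken from the paper).
    """
    alpha = math.exp(-decay)  # per-step attenuation of older frames
    gates = snap_gate(flow_mag, threshold)
    out: List[List[float]] = []
    num: List[float] = []
    den = 0.0
    for frame, snapped in zip(features, gates):
        if not num or snapped:
            num, den = list(frame), 1.0  # start or restart the memory
        else:
            num = [alpha * n + x for n, x in zip(num, frame)]
            den = alpha * den + 1.0
        out.append([n / den for n in num])
    return out


if __name__ == "__main__":
    feats = [[0.0], [0.0], [10.0], [10.0]]
    flow = [0.1, 0.1, 5.0, 5.0]  # abrupt motion change at frame 2
    print(refine(feats, flow, threshold=1.0))    # snap detected at frame 2
    print(refine(feats, flow, threshold=100.0))  # no snap: frame 2 is smoothed
```

With a low threshold, the refined feature at frame 2 tracks the new value immediately; with a threshold too high to fire, the same frame is blended with the older frames and lags behind.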


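Source journal

Expert Systems with Applications (Engineering & Technology - Engineering: Electrical & Electronic)

CiteScore: 13.80

Self-citation rate: 10.60%

Articles published: 2045

Review time: 8.7 months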

About the journal: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.