Optimal features driven hybrid attention network for effective video summarization

Impact factor: 8.0 | Zone 2 (Computer Science) | JCR Q1 (Automation & Control Systems)

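Engineering Applications of Artificial Intelligence, Volume 158, Article 111211
Pub Date: 2025-06-25
DOI: 10.1016/j.engappai.2025.111211
URL: https://www.sciencedirect.com/science/article/pii/S0952197625012126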

Habib Khan, Samee Ullah Khan, Waseem Ullah, Sung Wook Baik

Citations: 0

Abstract

Video summarization (VS) has emerged as an effective method for extracting meaningful content from large video repositories. Recent advances in visual intelligence have significantly improved the ability to condense lengthy, raw videos into concise yet representative content. However, existing VS methods extract features from a fixed GoogLeNet Pool5 layer without empirically analyzing that choice at an early stage. Moreover, these approaches often lack mechanisms to jointly refine channel-wise and spatial feature interactions, which leads to inadequate learning of complex visual semantics and, ultimately, suboptimal summarization performance. To address these limitations, we propose the Hybrid-Attention VS Network (HAVSNet), which formulates VS as a keyframe selection task. Our method integrates representative intermediate features early in the network, significantly enhancing feature representation compared to conventional techniques. Furthermore, HAVSNet incorporates a hybrid-attention mechanism for feature refinement: channel attention highlights the most discriminative feature maps, while an optimized spatial attention module captures and refines spatial dependencies, enabling the network to focus on the most informative and visually salient regions. Additionally, explainable AI (XAI) heatmap visualizations enhance interpretability by revealing how the model prioritizes salient regions, offering insight into the model's focus and into optimal feature selection. Extensive experiments, supported by comprehensive quantitative and qualitative analyses, demonstrate that the network outperforms state-of-the-art methods. Moreover, HAVSNet is evaluated under three training configurations (canonical, augmented, and transfer), showing strong generalization and adaptability.
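
The channel-then-spatial refinement and the keyframe-selection framing described in the abstract can be illustrated with a small sketch. This is not HAVSNet: the abstract gives no architectural details, so the learned attention layers are replaced here by parameter-free sigmoid gates, the frame score is simply the mean refined activation, and every function name (`channel_attention`, `spatial_attention`, `frame_score`, `select_keyframes`, `constant_frame`) is hypothetical. It only shows the general shape of the idea under those assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap):
    """Scale each channel by a gate derived from its global average.

    fmap is a list of C channels, each an H x W list of lists.
    Assumption: a parameter-free gate stands in for learned layers.
    """
    means = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]
    centre = sum(means) / len(means)
    gates = [sigmoid(m - centre) for m in means]
    return [[[v * g for v in row] for row in ch] for ch, g in zip(fmap, gates)]


def spatial_attention(fmap):
    """Scale each location by a gate derived from its cross-channel mean."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    pooled = [[sum(fmap[k][i][j] for k in range(c)) / c for j in range(w)]
              for i in range(h)]
    centre = sum(map(sum, pooled)) / (h * w)
    gates = [[sigmoid(p - centre) for p in row] for row in pooled]
    return [[[fmap[k][i][j] * gates[i][j] for j in range(w)]
             for i in range(h)] for k in range(c)]


def hybrid_attention(fmap):
    """Apply channel attention first, then spatial attention."""
    return spatial_attention(channel_attention(fmap))


def frame_score(fmap):
    """Hypothetical importance score: mean activation after refinement."""
    refined = hybrid_attention(fmap)
    values = [v for ch in refined for row in ch for v in row]
    return sum(values) / len(values)


def select_keyframes(frames, k):
    """Return the indices of the k highest-scoring frames, in temporal order."""
    scores = [frame_score(f) for f in frames]
    ranked = sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def constant_frame(value, c=2, h=2, w=2):
    """Toy feature map filled with a single value (illustration only)."""
    return [[[float(value)] * w for _ in range(h)] for _ in range(c)]


# Three toy "frames" with increasing activation; keep the two strongest.
frames = [constant_frame(0), constant_frame(1), constant_frame(4)]
print(select_keyframes(frames, 2))  # prints [1, 2]
```

In a trained model the gates would come from learned layers and the score from a dedicated prediction head; the sketch keeps only the order of operations (channel gating, spatial gating, scoring, selection).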

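Source journal

Engineering Applications of Artificial Intelligence (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 9.60
Self-citation rate: 10.00%
Articles published: 505
Review time: 68 days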

About the journal: Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, with remarkable advances across a range of machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research on the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI used in real-world engineering applications, validated on publicly available datasets to ensure that the results can be replicated. Join us in exploring the transformative potential of AI in engineering.