Optimal features driven hybrid attention network for effective video summarization
Authors: Habib Khan, Samee Ullah Khan, Waseem Ullah, Sung Wook Baik
Journal: Engineering Applications of Artificial Intelligence, Volume 158, Article 111211 (JCR Q1, Automation & Control Systems; impact factor 8.0)
Published: 2025-06-25
DOI: 10.1016/j.engappai.2025.111211
URL: https://www.sciencedirect.com/science/article/pii/S0952197625012126
Video summarization (VS) has emerged as an effective method for extracting meaningful content from large video repositories. Recent advances in visual intelligence have significantly improved the ability to condense lengthy, raw videos into concise yet representative summaries. However, existing VS methods typically take their features from a single fixed layer, GoogLeNet pool5, without first analyzing empirically which features suit the task. Moreover, these approaches often lack mechanisms to jointly refine channel-wise and spatial-wise feature interactions, which leads to inadequate learning of complex visual semantics and, ultimately, suboptimal summarization performance. To address these limitations, we propose the Hybrid-Attention VS Network (HAVSNet), which formulates VS as a keyframe selection task. Our method integrates representative intermediate features early in the network, enhancing feature representation compared to conventional techniques. Furthermore, HAVSNet incorporates a hybrid-attention mechanism for feature refinement: channel attention highlights the most discriminative feature maps, while an optimized spatial attention module captures and refines spatial dependencies. This enables the network to focus on the most informative and visually salient regions. Additionally, explainable AI (XAI) heatmap visualizations improve interpretability by revealing which regions the model prioritizes, offering insight into the model's focus and into the choice of features. Extensive experiments demonstrate that our network outperforms state-of-the-art methods, and comprehensive quantitative and qualitative analyses further confirm its effectiveness. Finally, HAVSNet is evaluated under three training configurations (canonical, augmented, and transfer), showing strong generalization and adaptability.
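The abstract describes the hybrid-attention idea only in outline, so the following is a minimal NumPy sketch of a generic channel-then-spatial attention block, not HAVSNet itself. The function names, the pooled descriptors (average and max), the two-layer gating MLP, the sigmoid gates, and the odd-sized convolution kernel are all assumptions made for illustration; the paper's actual layers and its "optimized" spatial module may differ.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, used here to produce gates in the range 0..1."""
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """Reweight the channels of a (C, H, W) feature map.

    Assumed design: average- and max-pool over the spatial dimensions,
    pass both descriptors through a shared two-layer MLP (w1, w2),
    and gate each channel with the sigmoid of the summed outputs.
    """
    avg = feat.mean(axis=(1, 2))  # (C,)
    mx = feat.max(axis=(1, 2))    # (C,)

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # ReLU hidden layer

    gate = sigmoid(mlp(avg) + mlp(mx))        # (C,)
    return feat * gate[:, None, None]


def spatial_attention(feat, kernel):
    """Reweight the spatial locations of a (C, H, W) feature map.

    Assumed design: pool across channels (mean and max), slide a
    (2, k, k) kernel with odd k over the zero-padded 2-channel map,
    and gate each location with the sigmoid of the response.
    """
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    height, width = feat.shape[1:]
    logits = np.zeros((height, width))
    for i in range(height):
        for j in range(width):
            logits[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return feat * sigmoid(logits)[None, :, :]


def hybrid_attention(feat, w1, w2, kernel):
    """Apply channel attention first, then spatial attention."""
    return spatial_attention(channel_attention(feat, w1, w2), kernel)


# Usage with random (untrained) weights, purely to show the shapes involved.
rng = np.random.default_rng(0)
channels, reduction = 8, 4
feat = rng.standard_normal((channels, 7, 7))
w1 = rng.standard_normal((channels // reduction, channels))
w2 = rng.standard_normal((channels, channels // reduction))
kernel = rng.standard_normal((2, 3, 3))
out = hybrid_attention(feat, w1, w2, kernel)
print(out.shape)
```

Because both gates lie between 0 and 1, the block can only attenuate activations, never amplify them; in a trained network the weights would be learned so that uninformative channels and regions are suppressed before frames are scored for keyframe selection.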
About the journal:
Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.