Title: Unsupervised Video Summarization Based on Spatiotemporal Semantic Graph and Enhanced Attention Mechanism
Authors: Xin Cheng; Lei Yang; Rui Li
Journal: IEEE Transactions on Computational Social Systems, vol. 12, no. 5, pp. 3751-3764
Publication date: 2025-07-10
Publication type: Journal Article
DOI: 10.1109/TCSS.2025.3579570
URL: https://ieeexplore.ieee.org/document/11077719/
Impact factor: 4.5 (JCR Q1, Computer Science, Cybernetics)
Citations: 0
Abstract
Among unsupervised approaches, generative adversarial networks (GANs) have shown potential for improving keyframe selection and video reconstruction through adversarial training. Nevertheless, GANs struggle to capture the intricate spatiotemporal dynamics of videos, which are essential for producing coherent and informative summaries. To address this limitation, we introduce an unsupervised video summarization framework that integrates temporal–spatial semantic graphs (TSSGraphs) with a bilinear additive attention (BAA) mechanism. TSSGraphs model temporal and spatial relationships among video frames by combining temporal convolution with dynamic edge convolution, thereby extracting salient features while reducing model complexity. The BAA mechanism strengthens the framework's ability to capture critical motion information by addressing feature sparsity and eliminating redundant parameters, keeping attention focused on significant motion dynamics. Experiments on the SumMe and TVSum benchmark datasets show that our method improves F-score by up to 4.0% and 3.3%, respectively, over existing methods. Moreover, our system incurs lower parameter overhead during both training and inference, and it performs particularly well on videos with significant motion content.
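The abstract reports its results as F-score on SumMe and TVSum but does not spell out the evaluation protocol. As a rough illustration only, the sketch below computes the standard F-score (harmonic mean of precision and recall) between a predicted summary and one reference summary, assuming both are given as binary per-frame selections of equal length. The function name `summary_f_score` and the toy inputs are hypothetical, and how scores are aggregated across multiple annotators is not covered here.

```python
from typing import Sequence


def summary_f_score(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """F-score between two binary per-frame selections (1 = frame kept).

    Assumption: both sequences cover the same frames of the same video.
    """
    if len(predicted) != len(reference):
        raise ValueError("selections must cover the same number of frames")

    # Frames selected by both the predicted and the reference summary.
    overlap = sum(1 for p, r in zip(predicted, reference) if p and r)
    n_pred = sum(1 for p in predicted if p)
    n_ref = sum(1 for r in reference if r)

    # With no overlap (or an empty summary) the F-score is defined as 0 here.
    if overlap == 0 or n_pred == 0 or n_ref == 0:
        return 0.0

    precision = overlap / n_pred
    recall = overlap / n_ref
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy example: 8 frames, 3 selected in each summary, 2 in common.
    predicted = [1, 1, 0, 0, 1, 0, 0, 0]
    reference = [1, 0, 0, 0, 1, 1, 0, 0]
    print(f"F-score: {summary_f_score(predicted, reference):.3f}")
```

In the toy example, precision and recall are both 2/3, so the F-score is also 2/3.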
About the journal:
IEEE Transactions on Computational Social Systems focuses on topics such as the modeling, simulation, analysis, and understanding of social systems from a quantitative and/or computational perspective. "Systems" include man-man, man-machine, and machine-machine organizations and adversarial situations, as well as social media structures and their dynamics. More specifically, the Transactions publishes articles on modeling the dynamics of social systems, methodologies for incorporating and representing socio-cultural and behavioral aspects in computational modeling, analysis of social system behavior and structure, and paradigms for social systems modeling and simulation. The journal also features articles on social network dynamics, social intelligence and cognition, social systems design and architectures, socio-cultural modeling and representation, and computational behavior modeling, together with their applications.