Graph convolutional network for fast video summarization in compressed domain

IF 5.5 · CAS Zone 2 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence)

Neurocomputing · Pub Date: 2024-11-20 · DOI: 10.1016/j.neucom.2024.128945

Chia-Hung Yeh, Chih-Ming Lien, Zhi-Xiang Zhan, Feng-Hsu Tsai, Mei-Juan Chen

Volume 617, Article 128945 · Journal Article · https://www.sciencedirect.com/science/article/pii/S0925231224017168

Citations: 0

Abstract

Video summarization is the process of generating a concise and representative summary of a video by selecting its most important frames. It plays a vital role in the video streaming industry, allowing users to quickly understand the overall content of a video without watching it in its entirety. Most existing video summarization methods require fully decoding the video stream and extracting features with a pre-trained deep learning model in the pixel domain, which is time-consuming and computationally expensive. To address this issue, this paper proposes a novel method called Graph Convolutional Network-based Compressed-domain Video Summarization (GCNCVS), which directly exploits compressed-domain information and leverages a graph convolutional network to learn temporal relationships between frames, thereby enhancing its ability to capture contextual and valuable information when generating summarized videos. To evaluate the performance of GCNCVS, we conduct experiments on two benchmark datasets, SumMe and TVSum. Experimental results demonstrate that our method outperforms existing methods, achieving an average F-score of 53.5% on the SumMe dataset and 72.3% on the TVSum dataset. Additionally, the proposed method achieves a Kendall's τ correlation coefficient of 0.157 and a Spearman's ρ correlation coefficient of 0.205 on the TVSum dataset. Our method also significantly reduces computation time, which enhances the feasibility of video summarization in video streaming environments.
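
The abstract does not describe how the frame graph is built or how many layers the network uses, so the following is only a minimal sketch of the general idea: a graph convolution that propagates per-frame features along temporal edges and outputs one importance score per frame. The sliding-window adjacency, the two-layer depth, the symmetric normalization, and all function and parameter names (`temporal_adjacency`, `window`, `frame_scores`, and so on) are assumptions made for illustration, not details taken from GCNCVS. The feature matrix stands in for whatever compressed-domain descriptors the method actually extracts.

```python
import numpy as np


def temporal_adjacency(num_frames, window=2):
    """Connect each frame to neighbours within `window` steps (assumed graph).

    The diagonal is included, so every node has a self-loop.
    """
    idx = np.arange(num_frames)
    return (np.abs(idx[:, None] - idx[None, :]) <= window).astype(np.float64)


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2} of an adjacency matrix."""
    deg = adj.sum(axis=1)  # at least 1 for every node because of self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(features, adj_norm, weight):
    """One graph convolution followed by ReLU: relu(A_norm @ X @ W)."""
    return np.maximum(adj_norm @ features @ weight, 0.0)


def frame_scores(features, adj_norm, w_hidden, w_out):
    """Two-layer sketch that maps frame features to scores in (0, 1)."""
    hidden = gcn_layer(features, adj_norm, w_hidden)
    logits = adj_norm @ hidden @ w_out  # shape (num_frames, 1)
    return (1.0 / (1.0 + np.exp(-logits))).ravel()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_frames, feat_dim, hidden_dim = 8, 16, 4
    # Placeholder for compressed-domain features (random, untrained weights).
    feats = rng.normal(size=(num_frames, feat_dim))
    adj_norm = normalize_adjacency(temporal_adjacency(num_frames, window=2))
    w_hidden = rng.normal(scale=0.1, size=(feat_dim, hidden_dim))
    w_out = rng.normal(scale=0.1, size=(hidden_dim, 1))
    print(frame_scores(feats, adj_norm, w_hidden, w_out))
```

With untrained random weights the scores carry no meaning; the sketch only shows how each frame's score comes to depend on its temporal neighbours through the normalized adjacency matrix.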

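Source journal

Neurocomputing (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 13.10

Self-citation rate: 10.00%

Articles published: 1382

Review time: 70 days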

Journal description: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.