基于语义奖励的视频摘要深度强化学习

2022 IEEE 22nd International Conference on Software Quality, Reliability, and Security Companion (QRS-C) Pub Date : 2022-12-01 DOI:10.1109/QRS-C57518.2022.00119

Haoran Sun, Xiaolong Zhu, Conghua Zhou

{"title":"基于语义奖励的视频摘要深度强化学习","authors":"Haoran Sun, Xiaolong Zhu, Conghua Zhou","doi":"10.1109/QRS-C57518.2022.00119","DOIUrl":null,"url":null,"abstract":"Video summarization aims to improve the efficiency of large-scale video browsing through producting concise summaries. It has been popular among many scenarios such as video surveillance, video review and data annotation. Traditional video summarization techniques focus on filtration in image features dimension or image semantics dimension. However, such techniques can make a large amount of possible useful information lost, especially for many videos with rich text semantics like interviews, teaching videos, in that only the information relevant to the image dimension will be retained. In order to solve the above problem, this paper considers video summarization as a continuous multi-dimensional decision-making process. Specifically, the summarization model predicts a probability for each frame and its corresponding text, and then we designs reward methods for each of them. Finally, comprehensive summaries in two dimensions, i.e. images and semantics, is generated. This approach is not only unsupervised and does not rely on labels and user interaction, but also decouples the semantic and image summarization models to provide more usable interfaces for subsequent engineering use.","PeriodicalId":183728,"journal":{"name":"2022 IEEE 22nd International Conference on Software Quality, Reliability, and Security Companion (QRS-C)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Deep Reinforcement Learning for Video Summarization with Semantic Reward\",\"authors\":\"Haoran Sun, Xiaolong Zhu, Conghua Zhou\",\"doi\":\"10.1109/QRS-C57518.2022.00119\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Video summarization aims to improve the efficiency of large-scale video browsing through producting concise summaries. It has been popular among many scenarios such as video surveillance, video review and data annotation. Traditional video summarization techniques focus on filtration in image features dimension or image semantics dimension. However, such techniques can make a large amount of possible useful information lost, especially for many videos with rich text semantics like interviews, teaching videos, in that only the information relevant to the image dimension will be retained. In order to solve the above problem, this paper considers video summarization as a continuous multi-dimensional decision-making process. Specifically, the summarization model predicts a probability for each frame and its corresponding text, and then we designs reward methods for each of them. Finally, comprehensive summaries in two dimensions, i.e. images and semantics, is generated. This approach is not only unsupervised and does not rely on labels and user interaction, but also decouples the semantic and image summarization models to provide more usable interfaces for subsequent engineering use.\",\"PeriodicalId\":183728,\"journal\":{\"name\":\"2022 IEEE 22nd International Conference on Software Quality, Reliability, and Security Companion (QRS-C)\",\"volume\":\"56 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE 22nd International Conference on Software Quality, Reliability, and Security Companion (QRS-C)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/QRS-C57518.2022.00119\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 22nd International Conference on Software Quality, Reliability, and Security Companion (QRS-C)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/QRS-C57518.2022.00119","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

视频摘要旨在通过生成简洁的摘要来提高大规模视频浏览的效率。在视频监控、视频点评、数据标注等场景中得到广泛应用。传统的视频摘要技术侧重于图像特征维度或图像语义维度的过滤。但是，这种技术会使大量可能有用的信息丢失，特别是对于许多具有丰富文本语义的视频，如访谈、教学视频，只保留与图像维度相关的信息。为了解决上述问题，本文将视频摘要视为一个连续的多维决策过程。具体来说，摘要模型预测了每一帧及其对应文本的概率，然后为每一帧及其对应文本设计奖励方法。最后，生成图像和语义两个维度的综合总结。这种方法不仅是无监督的，不依赖于标签和用户交互，而且还解耦了语义和图像摘要模型，为后续工程使用提供了更多可用的接口。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Deep Reinforcement Learning for Video Summarization with Semantic Reward

Video summarization aims to improve the efficiency of large-scale video browsing through producting concise summaries. It has been popular among many scenarios such as video surveillance, video review and data annotation. Traditional video summarization techniques focus on filtration in image features dimension or image semantics dimension. However, such techniques can make a large amount of possible useful information lost, especially for many videos with rich text semantics like interviews, teaching videos, in that only the information relevant to the image dimension will be retained. In order to solve the above problem, this paper considers video summarization as a continuous multi-dimensional decision-making process. Specifically, the summarization model predicts a probability for each frame and its corresponding text, and then we designs reward methods for each of them. Finally, comprehensive summaries in two dimensions, i.e. images and semantics, is generated. This approach is not only unsupervised and does not rely on labels and user interaction, but also decouples the semantic and image summarization models to provide more usable interfaces for subsequent engineering use.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2022 IEEE 22nd International Conference on Software Quality, Reliability, and Security Companion (QRS-C)

自引率

0.00%

发文量